What Is Cross-Entropy Loss Function?

Machine learning, deep learning, and AI are becoming an increasingly important part of our lives. Whether for business strategy or technological advancement, these techniques help us improve our decision-making and future planning. But for that to happen, our models must first be accurate. That’s why loss functions are perhaps the most essential part of training a model: they quantify how far its predictions are from the desired outputs, and training works by reducing that number. The cross-entropy loss function is among the most important of them.

Aside from cross-entropy loss, there are different types of loss functions, such as L2-norm loss. It’s not, however, a one-size-fits-all situation because not all functions will be compatible with your model. Choosing the right one is vital for training. This article addresses the cross-entropy loss function. We’ll explain what it is, outline the subtypes, and provide a practical example to better understand the fundamentals.

What Is Cross-Entropy Loss?
Types of Cross-Entropy Loss Function
How to Apply the Cross-Entropy Loss Function: A Practical Example
Cross-Entropy Loss Function: Next Steps

What Is Cross-Entropy Loss?

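Cross-entropy loss measures the difference between two probability distributions over the same set of outcomes: typically, the distribution a model predicts and the true distribution given by the labels. The more the predicted distribution departs from the true one, the larger the value.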

Before going into detail, however, let’s briefly discuss loss functions. We separate them into two categories based on their outputs:

Regression loss functions
Classification loss functions

Regression and classification are the two main kinds of supervised learning tasks. While the outputs in regression tasks are numbers, the outputs in classification tasks are categories, like cats and dogs.

We use cross-entropy loss in classification tasks to measure how well our machine learning or deep learning model performs by quantifying the difference between the estimated probabilities and our desired outcome; it’s the most popular loss function in such cases.

The cross-entropy loss function maps the model’s predicted probabilities and the targets to a single real number, the ’loss’. The greater the difference between the predictions and the targets, the higher the loss.

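We define the cross-entropy loss formula in the following way, where y is the vector of predicted probabilities, t is the target vector, and the sum runs over the classes i: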

\[\text{Cross-Entropy} = L\left ( \mathbf{y}, \mathbf{t} \right ) = -\sum_{i} \: \mathbf{t}_{i}\; \ln\mathbf{y}_{i}\]
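As a minimal sketch, the formula can be written in a few lines of Python. The function name `cross_entropy` and the sample vectors are illustrative assumptions, not part of any library. Terms with a zero target are skipped, because they contribute nothing to the sum.

```python
import math

def cross_entropy(y, t):
    """Return -sum_i t_i * ln(y_i) for predicted probabilities y and targets t."""
    if len(y) != len(t):
        raise ValueError("y and t must have the same length")
    loss = 0.0
    for y_i, t_i in zip(y, t):
        if t_i == 0:
            continue  # a zero target contributes nothing to the sum
        loss -= t_i * math.log(y_i)
    return loss

# Hypothetical predictions for a three-class problem whose correct class is the first.
good = cross_entropy([0.9, 0.05, 0.05], [1, 0, 0])  # confident and correct
bad = cross_entropy([0.05, 0.9, 0.05], [1, 0, 0])   # confident and wrong
print(round(good, 3), round(bad, 3))
```

The confident correct prediction produces a much smaller loss than the confident wrong one, which is the behavior we want from a loss function.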

Types of Cross-Entropy Loss Function

We recognize two primary types of the cross-entropy loss function in machine learning and deep learning classification tasks, namely:

Binary cross-entropy loss
Categorical cross-entropy loss

Let’s discover what each loss function entails.

Binary Cross-Entropy Loss

Binary cross-entropy loss is used in binary classification tasks, with only two possible classes or labels: positive and negative or true and false.

This type of cross-entropy loss measures the dissimilarity between the predicted probabilities and the true binary labels. It encourages the model to output high probabilities for examples that belong to the positive class and low probabilities for examples that belong to the negative class.

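We calculate the binary cross-entropy loss function with the following formula, where N is the number of examples, y_i is the true label (0 or 1) of example i, and p_i is the predicted probability that example i belongs to the positive class:

\[L = -\frac{1}{N} \sum_{i=1}^{N} \left( y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \right)\]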

The binary cross-entropy loss is commonly used in neural networks with a sigmoid activation function in the output layer. It trains models to successfully distinguish two classes by minimizing the dissimilarity between predicted probabilities and true labels.
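To make this concrete, here is a small sketch of binary cross-entropy computed as the negative mean log-likelihood over a batch, with a sigmoid turning raw scores into probabilities. The function names, the clipping constant `eps`, and the sample scores are assumptions made for illustration.

```python
import math

def sigmoid(z):
    """Map a raw score (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def binary_cross_entropy(labels, probs, eps=1e-12):
    """Mean binary cross-entropy over N examples.

    labels: true labels, each 0 or 1
    probs:  predicted probabilities of the positive class
    eps:    clipping constant (an assumption of this sketch) that avoids log(0)
    """
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(labels)

logits = [2.0, -1.0, 0.5]  # hypothetical raw model outputs
labels = [1, 0, 1]
probs = [sigmoid(z) for z in logits]
loss = binary_cross_entropy(labels, probs)
print(round(loss, 4))
```

Each example contributes only one of the two log terms: the first when the label is 1, the second when the label is 0.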

Categorical Cross-Entropy Loss

We use categorical cross-entropy loss in multi-class classification tasks with more than two mutually exclusive classes. Like its binary counterpart, this type of cross-entropy loss function quantifies the dissimilarity between the predicted probabilities and the true categorical labels.

And here’s how we represent the categorical cross-entropy loss formula:

\[L = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{C} y_{ij} \log(p_{ij})\]

The categorical cross-entropy loss function is commonly used in neural networks with softmax activation in the output layer for multi-class classification tasks. By minimizing loss, the model learns to assign higher probabilities to the correct class while reducing the probabilities for incorrect classes, improving accuracy.
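The pairing of softmax and categorical cross-entropy can be sketched as follows. This is an illustrative example under stated assumptions: the function names and the sample scores are hypothetical, and the targets are assumed to be one-hot vectors.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def categorical_cross_entropy(targets, probs):
    """Mean over N examples of -sum_j y_ij * log(p_ij), with one-hot targets."""
    total = 0.0
    for t_row, p_row in zip(targets, probs):
        total -= sum(t * math.log(p) for t, p in zip(t_row, p_row) if t != 0)
    return total / len(targets)

logits = [[2.0, 1.0, 0.1], [0.5, 0.5, 3.0]]  # hypothetical scores for two examples
targets = [[1, 0, 0], [0, 0, 1]]             # one-hot labels
probs = [softmax(row) for row in logits]
loss = categorical_cross_entropy(targets, probs)
print(round(loss, 4))
```

Because softmax forces the probabilities to sum to 1, raising the probability of the correct class necessarily lowers the probabilities of the others.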

How to Apply the Cross-Entropy Loss Function: A Practical Example

Now that you know what the cross-entropy loss function is, let’s learn how it works and how to apply it in practice. We’ll illustrate this through a practical classification example.

Let’s consider three categories:

Cats
Dogs
Horses

Each image is labeled, and the label serves as the target. But how does it look in numerical terms? With the classes ordered cat, dog, horse, the target vector t for a photo of a dog would be 0,1,0.

The 0s mean it is not a cat or a horse, while the 1 shows it is, indeed, a dog. If we were to examine a picture of a horse, the target vector would be 0,0,1.
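Target vectors of this kind are often called one-hot vectors. A minimal sketch of how a label could be turned into one is shown here; the class order and the helper name `one_hot` are assumptions for illustration.

```python
CLASSES = ["cat", "dog", "horse"]  # order assumed from the list of categories

def one_hot(label, classes=CLASSES):
    """Return a target vector with 1 at the label's position and 0 elsewhere."""
    if label not in classes:
        raise ValueError(f"unknown label: {label!r}")
    return [1 if c == label else 0 for c in classes]

print(one_hot("dog"))    # [0, 1, 0]
print(one_hot("horse"))  # [0, 0, 1]
```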

Imagine the outputs of our model for these two images are 0.4, 0.4, 0.2 for the first image and 0.1, 0.2, 0.7 for the second.

After some machine learning transformations, these vectors show the probabilities that each photo is of a cat, a dog, or a horse.

The first vector shows that, according to our algorithm, there is a 0.4 (or 40%) chance that the first photo is of a cat. Additionally, there’s a 40% chance the photo is of a dog and a 20% chance it is of a horse.

What about the cross-entropy loss for each photo? Let’s calculate it.

The loss for the first image would be:

\[L(y,t) = -0 \times \ln 0.4 -1 \times \ln 0.4 -0 \times \ln 0.2 \approx 0.92\]

Meanwhile, the cross-entropy loss for the second image is:

\[L(y,t) = -0 \times \ln 0.1 -0 \times \ln 0.2 -1 \times \ln 0.7 \approx 0.36\]

As we already know, the lower the loss, the better the prediction.

So, what’s the meaning of these two cross-entropies? The second loss is lower; therefore, the second prediction is better. For the first image, the model was unsure whether the photo was of a dog or a cat: it gave both options an equal 40% probability, so the correct class received only 0.4. Contrast this with the second photo, where the model was 70% sure it was a horse, which is why the cross-entropy was lower.

An important note is that, with classification, our target vectors consist of 0s and a single 1, which indicates the correct category. Every term multiplied by 0 vanishes, so we can simplify the above formulas to minus the log of the predicted probability for the correct class.

Here is how our initial formulas would change:

For the image of the dog, we would have:

\[L(y, t) = -1 \times \ln 0.4\]

Meanwhile, the horse image would become:

\[L(y, t) = -1 \times \ln 0.7\]
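The worked example can be checked with a short script. This is a sketch that assumes the class order cat, dog, horse; the helper names `full_loss` and `shortcut_loss` are hypothetical. It computes the loss both as the full sum and as minus the log of the probability assigned to the correct class.

```python
import math

CLASSES = ["cat", "dog", "horse"]  # assumed class order

def full_loss(y, t):
    """Cross-entropy as the full sum over all classes."""
    return -sum(t_i * math.log(y_i) for y_i, t_i in zip(y, t))

def shortcut_loss(y, correct_class):
    """Cross-entropy for a one-hot target: minus the log of the correct class's probability."""
    return -math.log(y[CLASSES.index(correct_class)])

dog_pred = [0.4, 0.4, 0.2]    # model output for the dog photo
horse_pred = [0.1, 0.2, 0.7]  # model output for the horse photo

dog_loss = shortcut_loss(dog_pred, "dog")
horse_loss = shortcut_loss(horse_pred, "horse")
print(round(dog_loss, 2), round(horse_loss, 2))  # 0.92 0.36

# The shortcut agrees with the full sum for one-hot targets.
print(math.isclose(dog_loss, full_loss(dog_pred, [0, 1, 0])))
print(math.isclose(horse_loss, full_loss(horse_pred, [0, 0, 1])))
```

In practice, this is why many implementations accept the index of the correct class instead of a full target vector: only one term of the sum is ever non-zero.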

Cross-Entropy Loss Function FAQs

Is cross-entropy loss a loss function?

Yes, cross-entropy loss is a loss function used in classification tasks when training a supervised learning algorithm. It’s the most popular loss function for machine learning or deep learning classification. This loss function is also found in linear classification models like the logistic regression algorithm. We divide the cross-entropy loss function into two subtypes based on what tasks we need it to complete:
• Binary cross-entropy function—used in neural networks with a sigmoid activation function in the output layer
• Categorical cross-entropy function—used in neural networks with softmax activation in the output layer for multi-class classification tasks

Why choose the cross-entropy loss function?

You should choose the popular cross-entropy loss function because it’s easy to implement and lets you optimize your algorithms efficiently. The loss function measures the dissimilarity between predicted probabilities and true labels and encourages probabilistic predictions, enabling nuanced outputs that reflect the model's confidence. Moreover, the cross-entropy loss can be adapted to class imbalance by weighting each class's contribution, which assigns more importance to underrepresented classes. And finally, its two subtypes (binary and categorical) pair naturally with neural network activation functions like the sigmoid and softmax, so the model's outputs can be read directly as class probabilities.
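One common way to assign more importance to underrepresented classes is to multiply each example's loss by a weight for its true class. The sketch below is an assumption-laden illustration rather than a reference implementation: the function name, the weights, and the choice to normalize by the sum of the weights are all choices made for this example.

```python
import math

def weighted_cross_entropy(probs, class_indices, class_weights):
    """Weighted mean of -ln(p_correct), normalized by the sum of the weights used."""
    weighted_sum = 0.0
    weight_total = 0.0
    for p_row, c in zip(probs, class_indices):
        w = class_weights[c]
        weighted_sum += -w * math.log(p_row[c])
        weight_total += w
    return weighted_sum / weight_total

# Hypothetical batch: two examples of class 0 and one of the rarer class 1.
probs = [[0.8, 0.2], [0.7, 0.3], [0.6, 0.4]]
class_indices = [0, 0, 1]

uniform = weighted_cross_entropy(probs, class_indices, [1.0, 1.0])
rare_up = weighted_cross_entropy(probs, class_indices, [1.0, 3.0])
print(round(uniform, 4), round(rare_up, 4))
```

With equal weights, the function reduces to the ordinary mean loss. Giving the rare class a weight of 3 makes its poorly predicted example count for more, so the overall loss rises and training is pushed harder to fix it.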

Cross-Entropy Loss Function: Next Steps

Unsurprisingly, cross-entropy loss is the most popular function used in machine learning or deep learning classification. After all, it scores a model’s predicted probabilities against targets encoded as 0s and 1s, giving a single number that shows how close the predictions are to the correct classes.

As mentioned, other loss functions can help us resolve a problem. We must emphasize that any function with the fundamental property of being higher for worse results and lower for better results can be a loss function.

Loss functions are essential because they help us improve the accuracy of our models significantly. If you want a better understanding of the cross-entropy loss function and other types, we can help you on your machine learning and deep learning journey. Start with 365 Data Science—your gateway into the world of data, machine learning, and AI.