Resolved: Couldn't we use L2-norm also for classification?
Good morning. Taking the example of the dog you showed, where t = (0, 1, 0) and y = (0.4, 0.4, 0.2), why do we use cross-entropy rather than the L2-norm for classification as well? In this case the L2-norm would give (0-0.4)^2 + (1-0.4)^2 + (0-0.2)^2 = 0.56, while in the example of the horse it would give (0-0.1)^2 + (0-0.2)^2 + (1-0.7)^2 = 0.14, which is smaller than the previous one, as expected. So why do we need cross-entropy if the L2-norm can apparently perform the same task for classification? Thank you
Hi,
Let me start by posing a question of my own: why do cargo trains exist if trucks can do the same job, and more (they are not limited by tracks)?
In theory, the L2 loss function can be applied to classification, yes; as your calculation shows, it still ranks the two examples correctly. However, it was not created with this problem in mind. Cross-entropy was developed specifically for classification and probability problems: it penalizes a confident wrong prediction far more heavily than the L2-norm does (the loss grows without bound as the predicted probability of the true class approaches zero), and when paired with a softmax output it produces larger, better-behaved gradients, so training converges faster. In practice, cross-entropy will outperform the L2-norm on classification.
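To make this concrete, here is a quick sketch using the numbers from your question (plain NumPy, nothing TensorFlow-specific):

```python
import numpy as np

def l2_loss(t, y):
    """Sum of squared differences between target t and prediction y."""
    return np.sum((t - y) ** 2)

def cross_entropy(t, y):
    """Categorical cross-entropy: -sum(t * log(y))."""
    return -np.sum(t * np.log(y))

# The dog and horse examples from your question.
t_dog,   y_dog   = np.array([0, 1, 0]), np.array([0.4, 0.4, 0.2])
t_horse, y_horse = np.array([0, 0, 1]), np.array([0.1, 0.2, 0.7])

print(l2_loss(t_dog, y_dog), l2_loss(t_horse, y_horse))              # 0.56, 0.14
print(cross_entropy(t_dog, y_dog), cross_entropy(t_horse, y_horse))  # ~0.92, ~0.36
```

Both losses rank the two examples in the same order here, but watch what happens if the predicted probability of the true class shrinks toward zero: the L2 loss stays bounded, while cross-entropy blows up, which is exactly the strong push away from confident mistakes that you want during training.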
Cross-entropy is simply the better tool for this job. And it is no harder to use or incorporate into our program: both loss functions can be defined simply by specifying their name in TensorFlow.
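For example, here is a minimal Keras sketch (the toy three-class architecture is hypothetical, just for illustration); note that swapping one loss for the other is a single-string change:

```python
import tensorflow as tf

# A toy three-class classifier with a softmax output.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(3, activation='softmax'),
])

# Cross-entropy, specified by name.
model.compile(optimizer='adam', loss='categorical_crossentropy')

# The L2-style alternative would be the same one-line change
# ('mean_squared_error' is the averaged L2-norm):
# model.compile(optimizer='adam', loss='mean_squared_error')
```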
Hope this clears it up!
Best,
Nikola, 365 Team