Softmax and Cross-Entropy with CNN
1. Overview
2. Description
How come the two output values add up to one?
2.1 Softmax function (normalized exponential function)
$$f_{j}(z)=\frac{e^{z_{j}}}{\sum_{k}e^{z_{k}}}$$
Normally, the dog and cat neurons would output arbitrary real values. Applying the softmax function written above brings those values to lie between zero and one and makes them add up to 1. The softmax function is a generalization of the logistic function that squashes a K-dimensional vector of arbitrary real values into a K-dimensional vector of real values in the range 0 to 1 that sum to 1.
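As a minimal sketch (assuming NumPy; the variable names are illustrative, not from the original post), softmax can be computed directly from a vector of raw scores. Subtracting the maximum score first is a common trick to avoid overflow in the exponentials and does not change the result:

```python
import numpy as np

def softmax(z):
    """Map a vector of raw scores to probabilities that sum to 1."""
    shifted = z - np.max(z)          # improves numerical stability
    exps = np.exp(shifted)
    return exps / np.sum(exps)

scores = np.array([2.0, 1.0])        # e.g. raw outputs of the dog and cat neurons
print(softmax(scores))               # ~[0.731 0.269]: between 0 and 1, sums to 1
```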
2.2 Cross-Entropy function
$$L_{i}=-\log\left( \frac{e^{f_{y_{i}}}}{\sum_{j}e^{f_{j}}} \right)$$
$$H(p,q)=-\sum_{x}p(x)\log q(x)$$
Both equations express the cross-entropy function; the results are basically the same, but the second form is easier to calculate. The mean squared error (MSE) function can still be used as the cost function in a convolutional neural network, just as in an ANN, to assess network performance, with the goal of minimizing it to optimize the network. In a CNN, however, it is the cross-entropy function, used as the loss function, that is minimized to optimize the network.
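As a small illustration of the first form (again assuming NumPy; names are illustrative), the cross-entropy loss for one example can be computed from the raw scores and the index of the correct class:

```python
import numpy as np

def cross_entropy(z, correct_class):
    """L_i = -log(softmax(z)[correct_class]) for one example."""
    shifted = z - np.max(z)
    probs = np.exp(shifted) / np.sum(np.exp(shifted))
    return -np.log(probs[correct_class])

scores = np.array([2.0, 1.0])        # raw scores for (dog, cat)
print(cross_entropy(scores, 0))      # ~0.31: small loss, the dog score already dominates
print(cross_entropy(scores, 1))      # ~1.31: large loss, cat would be the wrong guess here
```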
3. Example
In this example, NN1 outperforms NN2 across the board. So what errors can we calculate to estimate and monitor the performance of our networks? (A short numerical sketch reproducing the figures below follows section 3.3.)
3.1 Classification Error
NN1: $\frac{1}{3}=0.33$
NN2: $\frac{1}{3}=0.33$
Classification error is not a good measure, especially for the purposes of backpropagation: even though we know NN1 outperforms NN2, both networks produce the same value.
3.2 Mean squared error
NN1: 0.25
NN2: 0.71
This measure is more informative than the classification error: it shows that NN1 has a lower error than NN2.
3.3 Cross-Entropy
NN1: 0.38
NN2: 1.06
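The per-image predictions behind these numbers come from an illustration that is not reproduced in this post, so the prediction values below are an assumed reconstruction (chosen so that they reproduce the quoted 0.33, 0.25/0.71 and 0.38/1.06); treat them as illustrative. A minimal NumPy sketch of all three measures:

```python
import numpy as np

# Assumed predictions for three images labelled (dog, cat, dog);
# rows are images, columns are (dog, cat) probabilities.
labels = np.array([[1, 0], [0, 1], [1, 0]], dtype=float)
nn1    = np.array([[0.9, 0.1], [0.1, 0.9], [0.4, 0.6]])
nn2    = np.array([[0.6, 0.4], [0.3, 0.7], [0.1, 0.9]])

def classification_error(pred, y):
    return np.mean(np.argmax(pred, axis=1) != np.argmax(y, axis=1))

def mean_squared_error(pred, y):
    return np.mean(np.sum((pred - y) ** 2, axis=1))

def cross_entropy(pred, y):
    return np.mean(-np.log(np.sum(pred * y, axis=1)))

for name, pred in [("NN1", nn1), ("NN2", nn2)]:
    print(name,
          round(classification_error(pred, labels), 2),   # 0.33 for both
          round(mean_squared_error(pred, labels), 2),     # 0.25 vs 0.71
          round(cross_entropy(pred, labels), 2))          # 0.38 vs 1.06
```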
The reasons cross-entropy is preferred over MSE are as follows:
Suppose that at the start of training the network's output is very tiny, much smaller than the value you actually want. Then the squared-error gradient used in gradient descent will also be very small at that starting point, so it is hard for the network to get moving, to start adjusting the weights, and to head in the right direction. With cross-entropy, because of the logarithm, the network registers even a small output like that as a large error and can do something about it.
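A simplified way to see this (using a single sigmoid output $a=\sigma(z)$ with target $y$, rather than the full softmax case): the MSE gradient carries a factor $a(1-a)$ that vanishes when the output saturates near 0 or 1, while the cross-entropy gradient stays proportional to the error itself:

$$\frac{\partial}{\partial z}\,\frac{1}{2}(a-y)^{2}=(a-y)\,a(1-a), \qquad \frac{\partial}{\partial z}\,\bigl(-y\log a-(1-y)\log(1-a)\bigr)=a-y$$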
If the network improves slightly, the improvement shows up much more clearly in cross-entropy than in MSE; the change in MSE is so small that it won't guide your gradient descent process or your backpropagation in the right direction. Note, however, that cross-entropy is the preferred method only for classification. For regression, such as the problem we had with the artificial neural network, you would rather go with MSE; cross-entropy is better for classification, and again this has to do with the fact that we are using the softmax function.