  Softmax and Cross-Entropy with CNN
    MLAI/DeepLearning 2019. 10. 16. 15:56

    1. Overview

    2. Description

    How come the two output values add up to one?

    2.1 Softmax function (normalized exponential function)

    $$f_{j}(z)=\frac{e^{z_{j}}}{\sum_{k}e^{z_{k}}}$$

    Normally, the dog and the cat output neurons would hold arbitrary real values. Applying the softmax function written at the top brings these values into the range between zero and one and makes them add up to 1. The softmax function is a generalization of the logistic function: it squashes a K-dimensional vector of arbitrary real values into a K-dimensional vector of real values in the range (0, 1) that sum to 1.
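    As a minimal sketch of the formula above (using NumPy; the function name and the example logits are just illustrative, not taken from the original post):

```python
import numpy as np

def softmax(z):
    # Subtract the max logit for numerical stability; the shift cancels
    # in the numerator and denominator, so the result is unchanged.
    exp_z = np.exp(z - np.max(z))
    return exp_z / np.sum(exp_z)

# Hypothetical raw scores (logits) for the "dog" and "cat" output neurons.
logits = np.array([2.0, 1.0])
probs = softmax(logits)
print(probs)        # approx [0.731, 0.269] -- each value lies between 0 and 1
print(probs.sum())  # 1.0 -- the outputs add up to one
```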

    2.2 Cross-Entropy function

    $$L_{i}=-\log\left ( \frac{e^{f_{y_{i}}}}{\sum_{j}e^{f_{j}}} \right )$$

    $$H(p,q)=-\sum_{x}p(x)\log q(x)$$

    Both equations express the cross-entropy, and they give essentially the same result, but the second form is easier to calculate. The mean squared error (MSE) function can still be used as the cost function in a convolutional neural network, just as in an ANN, to assess network performance, and the goal would then be to minimize the MSE in order to optimize the network. For classification with a CNN, however, the cross-entropy function is used as the loss function, and minimizing it is what optimizes the network.
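    As a quick sketch (NumPy again; the scores and the true class are made up), the two forms above give the same number when p is the one-hot target distribution and q is the softmax output:

```python
import numpy as np

def softmax(z):
    exp_z = np.exp(z - np.max(z))
    return exp_z / np.sum(exp_z)

# Hypothetical scores f for one sample, with the true class y_i = 0 ("dog").
f = np.array([2.0, 1.0])
y_i = 0
q = softmax(f)              # predicted distribution
p = np.array([1.0, 0.0])    # one-hot target distribution

# First form: L_i = -log( e^{f_{y_i}} / sum_j e^{f_j} )
L_i = -np.log(np.exp(f[y_i]) / np.sum(np.exp(f)))

# Second form: H(p, q) = -sum_x p(x) log q(x)
H = -np.sum(p * np.log(q))

print(L_i, H)   # both print the same value (about 0.313)
```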

     

    3. Example

    Suppose two networks, NN1 and NN2, make predictions on the same set of images, with NN1 outperforming NN2 across the board.

    So which errors can we calculate to estimate and monitor the performance of our networks?

    3.1 Classification Error

    NN1: $\frac{1}{3}=0.33$

    NN2: $\frac{1}{3}=0.33$

    Classification error is not a good measure, especially for the purposes of backpropagation: even though we know that NN1 outperforms NN2, it reports the same value for both.

    3.2 Mean squared error

    NN1: 0.25

    NN2: 0.71

    This measure is more informative than the classification error: it shows that NN1 has a lower error than NN2.

    3.3 Cross-Entropy

    NN1: 0.38

    NN2: 1.06
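    As a rough sketch of how these three numbers can be reproduced, the per-image probabilities below are assumed values chosen to be consistent with the errors quoted above (two dog images and one cat image, with both networks misclassifying the third image):

```python
import numpy as np

# Assumed predicted probabilities [dog, cat] for three images; the true labels
# are dog, cat, dog. Both networks misclassify the third image, but NN1's
# wrong answer (0.4 vs 0.6) is far less confident than NN2's (0.1 vs 0.9).
targets = np.array([[1, 0], [0, 1], [1, 0]], dtype=float)
nn1 = np.array([[0.9, 0.1], [0.1, 0.9], [0.4, 0.6]])
nn2 = np.array([[0.6, 0.4], [0.3, 0.7], [0.1, 0.9]])

def classification_error(pred, target):
    # fraction of images whose most probable class is wrong
    return np.mean(pred.argmax(axis=1) != target.argmax(axis=1))

def mse(pred, target):
    # mean over images of the summed squared error per image
    return np.mean(np.sum((pred - target) ** 2, axis=1))

def cross_entropy(pred, target):
    # mean over images of -sum_x p(x) log q(x)
    return np.mean(-np.sum(target * np.log(pred), axis=1))

for name, pred in [("NN1", nn1), ("NN2", nn2)]:
    print(name,
          round(classification_error(pred, targets), 2),  # 0.33 for both
          round(mse(pred, targets), 2),                   # 0.25 vs 0.71
          round(cross_entropy(pred, targets), 2))         # 0.38 vs 1.06
```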

    The reasons cross-entropy is used over MSE are as follows:

    Suppose your back-propagated output value is very tiny, much smaller than the actual value you want. Then the gradient in your gradient descent will be very low at the starting point, and it is hard to get moving, to start adjusting the weights and heading in the right direction. When you use something like cross-entropy, the logarithm in it helps the network assess even a small error like that and do something about it.

    If there is an improvement, cross-entropy registers it as a significant change, whereas with MSE the change is very small and will not guide your gradient descent, your backpropagation, in the right direction. Cross-entropy is the preferred method only for classification, though. For regression, like we had in the artificial neural network, you would rather go with MSE; cross-entropy is better for classification, and again this has to do with the fact that we are using the softmax function.
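    The gradient argument can be made concrete with a small numeric sketch (the logits and target below are made up; the gradient formulas are the standard analytic derivatives of each loss with respect to the logits of a softmax output layer):

```python
import numpy as np

def softmax(z):
    exp_z = np.exp(z - np.max(z))
    return exp_z / np.sum(exp_z)

# A confidently wrong prediction: the target class is "dog" ([1, 0]),
# but the logits put almost all of the probability on "cat".
z = np.array([0.0, 7.0])
p = np.array([1.0, 0.0])      # one-hot target
q = softmax(z)                # approx [0.001, 0.999]

# Gradient of cross-entropy w.r.t. the logits: dL/dz = q - p
grad_ce = q - p

# Gradient of MSE (sum_j (q_j - p_j)^2) w.r.t. the logits, using the
# softmax Jacobian dq_j/dz_k = q_j * (delta_jk - q_k)
jac = np.diag(q) - np.outer(q, q)
grad_mse = jac.T @ (2 * (q - p))

print(grad_ce)    # approx [-0.999,  0.999] -> a strong, useful signal
print(grad_mse)   # approx [-0.004,  0.004] -> almost vanishes
```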

    4. Reference

    https://peterroelants.github.io/posts/cross-entropy-softmax/

    https://www.youtube.com/watch?v=mlaLLQofmR8

    https://deepnotes.io/softmax-crossentropy

    https://rdipietro.github.io/friendly-intro-to-cross-entropy-loss/

    https://en.wikipedia.org/wiki/Softmax_function

    https://www.superdatascience.com/blogs/convolutional-neural-networks-cnn-softmax-crossentropy
