  Softmax and Cross-Entropy with CNN
    MLAI/DeepLearning 2019. 10. 16. 15:56

    1. Overview

    2. Description

    How come the two output values add up to one?

    2.1 Softmax function (normalized exponential function)

    $$f_{j}(z)=\frac{e^{z_{j}}}{\sum_{k}e^{z_{k}}}$$

    Normally, the dog and the cat output neurons would hold arbitrary real values. Applying the softmax function written at the top brings these values into the range between zero and one and makes them add up to 1. The softmax function is a generalization of the logistic function: it squashes a K-dimensional vector of arbitrary real values into a K-dimensional vector of real values in the range (0, 1) that sum to 1.
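    As a minimal sketch of the formula above (using NumPy; the function name and the example logits are just illustrative, not taken from the original post):

```python
import numpy as np

def softmax(z):
    # Subtract the max logit for numerical stability; the shift cancels
    # in the numerator and denominator, so the result is unchanged.
    exp_z = np.exp(z - np.max(z))
    return exp_z / np.sum(exp_z)

# Hypothetical raw scores (logits) for the "dog" and "cat" output neurons.
logits = np.array([2.0, 1.0])
probs = softmax(logits)
print(probs)        # approx [0.731, 0.269] -- each value lies between 0 and 1
print(probs.sum())  # 1.0 -- the outputs add up to one
```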

    2.2 Cross-Entropy function

    $$L_{i}=-\log\left ( \frac{e^{f_{y_{i}}}}{\sum_{j}e^{f_{j}}} \right )$$

    $$H(p,q)=-\sum_{x}p(x)\log q(x)$$

    Both equations express the cross-entropy, and they give essentially the same result, but the second form is easier to calculate. The mean squared error (MSE) function can still be used as the cost function in a convolutional neural network, just as in an ANN, to assess network performance, and the goal would then be to minimize the MSE in order to optimize the network. For classification with a CNN, however, the cross-entropy function is used as the loss function, and minimizing it is what optimizes the network.
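    As a quick sketch (NumPy again; the scores and the true class are made up), the two forms above give the same number when p is the one-hot target distribution and q is the softmax output:

```python
import numpy as np

def softmax(z):
    exp_z = np.exp(z - np.max(z))
    return exp_z / np.sum(exp_z)

# Hypothetical scores f for one sample, with the true class y_i = 0 ("dog").
f = np.array([2.0, 1.0])
y_i = 0
q = softmax(f)              # predicted distribution
p = np.array([1.0, 0.0])    # one-hot target distribution

# First form: L_i = -log( e^{f_{y_i}} / sum_j e^{f_j} )
L_i = -np.log(np.exp(f[y_i]) / np.sum(np.exp(f)))

# Second form: H(p, q) = -sum_x p(x) log q(x)
H = -np.sum(p * np.log(q))

print(L_i, H)   # both print the same value (about 0.313)
```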

     

    3. Example

    Suppose two networks, NN1 and NN2, make predictions on the same set of images, with NN1 outperforming NN2 across the board.

    So which errors can we calculate to estimate and monitor the performance of our networks?

    3.1 Classification Error

    NN1: $\frac{1}{3}=0.33$

    NN2: $\frac{1}{3}=0.33$

    Classification error is not a good measure, especially for the purposes of backpropagation: even though we know that NN1 outperforms NN2, it reports the same value for both.

    3.2 Mean squared error

    NN1: 0.25

    NN2: 0.71

    This measure is more informative than the classification error: it shows that NN1 has a lower error than NN2.

    3.3 Cross-Entropy

    NN1: 0.38

    NN2: 1.06
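    As a rough sketch of how these three numbers can be reproduced, the per-image probabilities below are assumed values chosen to be consistent with the errors quoted above (two dog images and one cat image, with both networks misclassifying the third image):

```python
import numpy as np

# Assumed predicted probabilities [dog, cat] for three images; the true labels
# are dog, cat, dog. Both networks misclassify the third image, but NN1's
# wrong answer (0.4 vs 0.6) is far less confident than NN2's (0.1 vs 0.9).
targets = np.array([[1, 0], [0, 1], [1, 0]], dtype=float)
nn1 = np.array([[0.9, 0.1], [0.1, 0.9], [0.4, 0.6]])
nn2 = np.array([[0.6, 0.4], [0.3, 0.7], [0.1, 0.9]])

def classification_error(pred, target):
    # fraction of images whose most probable class is wrong
    return np.mean(pred.argmax(axis=1) != target.argmax(axis=1))

def mse(pred, target):
    # mean over images of the summed squared error per image
    return np.mean(np.sum((pred - target) ** 2, axis=1))

def cross_entropy(pred, target):
    # mean over images of -sum_x p(x) log q(x)
    return np.mean(-np.sum(target * np.log(pred), axis=1))

for name, pred in [("NN1", nn1), ("NN2", nn2)]:
    print(name,
          round(classification_error(pred, targets), 2),  # 0.33 for both
          round(mse(pred, targets), 2),                   # 0.25 vs 0.71
          round(cross_entropy(pred, targets), 2))         # 0.38 vs 1.06
```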

    The reasons cross-entropy is used over MSE are as follows:

    Suppose your back-propagated output value is very tiny, much smaller than the actual value you want. Then the gradient in your gradient descent will be very low at the starting point, and it is hard to get moving, to start adjusting the weights and heading in the right direction. When you use something like cross-entropy, the logarithm in it helps the network assess even a small error like that and do something about it.

    If there is an improvement, cross-entropy registers it as a significant change, whereas with MSE the change is very small and will not guide your gradient descent, your backpropagation, in the right direction. Cross-entropy is the preferred method only for classification, though. For regression, like we had in the artificial neural network, you would rather go with MSE; cross-entropy is better for classification, and again this has to do with the fact that we are using the softmax function.
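    The gradient argument can be made concrete with a small numeric sketch (the logits and target below are made up; the gradient formulas are the standard analytic derivatives of each loss with respect to the logits of a softmax output layer):

```python
import numpy as np

def softmax(z):
    exp_z = np.exp(z - np.max(z))
    return exp_z / np.sum(exp_z)

# A confidently wrong prediction: the target class is "dog" ([1, 0]),
# but the logits put almost all of the probability on "cat".
z = np.array([0.0, 7.0])
p = np.array([1.0, 0.0])      # one-hot target
q = softmax(z)                # approx [0.001, 0.999]

# Gradient of cross-entropy w.r.t. the logits: dL/dz = q - p
grad_ce = q - p

# Gradient of MSE (sum_j (q_j - p_j)^2) w.r.t. the logits, using the
# softmax Jacobian dq_j/dz_k = q_j * (delta_jk - q_k)
jac = np.diag(q) - np.outer(q, q)
grad_mse = jac.T @ (2 * (q - p))

print(grad_ce)    # approx [-0.999,  0.999] -> a strong, useful signal
print(grad_mse)   # approx [-0.004,  0.004] -> almost vanishes
```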

    4. Reference

    https://peterroelants.github.io/posts/cross-entropy-softmax/

    https://www.youtube.com/watch?v=mlaLLQofmR8

    https://deepnotes.io/softmax-crossentropy

    https://rdipietro.github.io/friendly-intro-to-cross-entropy-loss/

    https://en.wikipedia.org/wiki/Softmax_function

    https://www.superdatascience.com/blogs/convolutional-neural-networks-cnn-softmax-crossentropy
