Cross entropy loss function nn.CrossEntropyLoss()



1. Introduction

When doing multi-class classification with the PyTorch deep learning framework, the cross entropy loss function nn.CrossEntropyLoss() is typically used.

2. Information quantity and entropy

Amount of information: it is used to measure the uncertainty of an event; the greater the probability of an event, the smaller its uncertainty and the smaller the amount of information it carries. Suppose $X$ is a discrete random variable whose value set is $\mathcal{X} = \{x_0, x_1, \dots, x_n\}$, with probability distribution function $p(x) = Pr(X = x), x \in \mathcal{X}$. Then the amount of information of the event $X = x_i$ is defined as:

I(x_i) = -log(p(x_i))

When $p(x_i) = 1$, the event is certain to happen, and its amount of information is 0.
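
The definition above can be sketched in a few lines of plain Python (the function name `information` is ours, and the natural logarithm is assumed, consistent with the worked example later in this article):

```python
import math

def information(p):
    """Amount of information (self-information) of an event with probability p, in nats."""
    return -math.log(p)

# A rarer event carries more information than a likely one,
# and a certain event (p = 1) carries no information at all.
print(information(0.1))
print(information(0.9))
print(information(1.0))
```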

Entropy is used to measure the degree of chaos in a system and represents the total amount of information in the system; the greater the entropy, the greater the uncertainty of the system.

The amount of information measures the uncertainty of a single event, while entropy measures the uncertainty of a whole system (all events).

The formula for entropy is as follows:

H(x) = -\sum_{i=1}^np(x_i)log(p(x_i))

where $p(x_i)$ is the probability of event $X = x_i$, and $-log(p(x_i))$ is the amount of information of event $X = x_i$.
It can be seen that entropy is the expected value of the amount of information: it measures the uncertainty of a random variable (a system, all possible events). The larger the entropy, the harder it is to determine the value of the random variable and the more unstable the system; the smaller the entropy, the easier it is to determine the value and the more stable the system.
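
The entropy formula can be sketched directly (the function name `entropy` is ours; terms with $p(x_i) = 0$ are skipped, since they contribute nothing by convention):

```python
import math

def entropy(probs):
    """Entropy H(X) = -sum_i p(x_i) * log(p(x_i)): the expected amount of information."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# A uniform distribution (maximally uncertain) has higher entropy
# than a sharply peaked one (nearly certain).
print(entropy([0.25, 0.25, 0.25, 0.25]))
print(entropy([0.97, 0.01, 0.01, 0.01]))
```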

Cross entropy

Cross entropy is mainly used to measure the closeness between the actual output and the expected output: the smaller the cross entropy, the closer the two probability distributions are. Suppose the probability distribution $p$ is the expected output, the probability distribution $q$ is the actual output, and $H(p, q)$ is the cross entropy.
(1) Binary classification

H(p,q) = \frac{1}{N}\sum_{i=1}^N H(p(x_i),q(x_i)) = -\frac{1}{N}\sum_{i=1}^N \left[ p(x_i)log(q(x_i)) + (1-p(x_i))log(1-q(x_i)) \right]

Among them:
N is the number of samples in a batch;
$p(x_i)$ is the label of sample $i$: 1 for the positive class, 0 for the negative class;
$q(x_i)$ is the predicted probability that sample $i$ is positive.
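
The binary formula above can be sketched in plain Python (the function name `binary_cross_entropy` is ours; predictions are assumed to lie strictly between 0 and 1 so the logarithms are defined):

```python
import math

def binary_cross_entropy(labels, preds):
    """Mean binary cross entropy over a batch.
    labels: 0/1 ground truth; preds: predicted probability of the positive class (0 < q < 1)."""
    total = 0.0
    for p, q in zip(labels, preds):
        total += -(p * math.log(q) + (1 - p) * math.log(1 - q))
    return total / len(labels)

# Confident correct predictions give a small loss.
print(binary_cross_entropy([1, 0], [0.8, 0.2]))
```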

(2) Multi-class classification

H(p,q) = \frac{1}{N}\sum_{i=1}^N H(p(x_i),q(x_i)) = -\frac{1}{N}\sum_{i=1}^N \sum_{j=1}^M p(x_{ij})log(q(x_{ij}))

Among them:
M is the number of categories;
$p(x_{ij})$ is an indicator variable (0 or 1): it is 1 if category $j$ is the true category of sample $i$, otherwise 0;
$q(x_{ij})$ is the predicted probability that sample $i$ belongs to category $j$.
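
The multi-class formula can likewise be sketched in plain Python (the function name `cross_entropy` and the one-hot list representation are ours; a framework implementation would operate on tensors instead):

```python
import math

def cross_entropy(labels, preds):
    """Mean multi-class cross entropy over a batch.
    labels: one-hot rows p(x_ij); preds: predicted probability rows q(x_ij)."""
    total = 0.0
    for p_row, q_row in zip(labels, preds):
        # Only the term for the true class (p = 1) contributes.
        total += -sum(p * math.log(q) for p, q in zip(p_row, q_row) if p > 0)
    return total / len(labels)

print(cross_entropy([[0, 0, 1]], [[0.1, 0.1, 0.8]]))
```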

For example:

x cat dog horse
Label 0 0 1
Pred 0.1 0.1 0.8

Then the loss of this sample is (using the natural logarithm):
$loss = -(0*log(0.1)+0*log(0.1)+1*log(0.8)) = -log(0.8) \approx 0.22$
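
This number can be checked directly with `math.log` (the natural logarithm):

```python
import math

# One sample: one-hot label [0, 0, 1] (horse), prediction [0.1, 0.1, 0.8].
# Only the true-class term contributes, so the loss reduces to -log(0.8).
loss = -(0 * math.log(0.1) + 0 * math.log(0.1) + 1 * math.log(0.8))
print(round(loss, 2))  # 0.22
```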

Cross entropy in PyTorch
In PyTorch, the implementation differs slightly from the cross entropy described above: nn.CrossEntropyLoss() combines nn.LogSoftmax() and nn.NLLLoss() into a single operation, so it can directly replace those two operations in a network. Let's take a look at the explanation on the official website.
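
To make the combination concrete, here is a pure-Python sketch of what nn.CrossEntropyLoss() computes for a single sample: a numerically stable log-softmax over the raw logits, followed by the negative log-likelihood of the target class. The helper names below are ours, not PyTorch's:

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax: subtract the max before exponentiating."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]

def cross_entropy_loss(logits, target):
    """One sample's loss: log-softmax of the raw scores, then the
    negative log-probability of the target class (log-softmax + NLL)."""
    return -log_softmax(logits)[target]

# Raw network outputs (logits), not probabilities, go straight into the loss.
print(cross_entropy_loss([2.0, 1.0, 0.1], target=0))
```

Note that the network's last layer should output raw scores; the softmax is applied inside the loss, which is numerically more stable than computing the probabilities first.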
