PyTorch's softmax cross entropy loss and gradient usage

Time: 2020-11-2

In PyTorch, the softmax cross entropy loss and the gradient with respect to the input can be verified easily.

For the derivation of softmax cross entropy, you can refer to here.
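In short, the result used in the checks below (a standard derivation, stated here for reference): with softmax output S = softmax(x) over the logits x and a one-hot target P,

loss = -sum(Pj*logSj)
dloss/dxi = Si - Pi
(for a batch of N samples with mean reduction, the gradient becomes (S - P) / N)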

Examples

# -*- coding: utf-8 -*-
import torch
import torch.autograd as autograd
from torch.autograd import Variable
import torch.nn.functional as F
import torch.nn as nn
import numpy as np

# input data; requires_grad=True so the gradient is computed for it during back propagation
data = Variable(torch.FloatTensor([[1.0, 2.0, 3.0], [1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]), requires_grad=True)

# multi-class labels in one-hot format
label = Variable(torch.zeros((3, 3)))
label[0, 2] = 1
label[1, 1] = 1
label[2, 0] = 1
print(label)

# for a batch: loss = mean( -sum(Pj*logSj) )
# for one sample: loss = -sum(Pj*logSj)
loss = torch.mean(-torch.sum(label * torch.log(F.softmax(data, dim=1)), dim=1))

loss.backward()
print(loss, data.grad)

Output:

tensor([[ 0., 0., 1.],
    [ 0., 1., 0.],
    [ 1., 0., 0.]])
# loss and the input's grad
tensor(1.4076) tensor([[ 0.0300, 0.0816, -0.1116],
    [ 0.0300, -0.2518, 0.2217],
    [-0.3033, 0.0816, 0.2217]])
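
As a quick sanity check (a minimal sketch reusing data and label from the example above), the gradient can be reproduced directly from the derivation: grad = (softmax - one-hot label) / batch size.

# manual gradient: (S - P) / N, with N = 3 because of the mean reduction over the batch
manual_grad = (F.softmax(data, dim=1) - label) / data.shape[0]
print(manual_grad)
# tensor([[ 0.0300, 0.0816, -0.1116],
#         [ 0.0300, -0.2518, 0.2217],
#         [-0.3033, 0.0816, 0.2217]])  # identical to data.grad above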

Note

Loss and gradient for a single input

data = Variable(torch.FloatTensor([[1.0, 2.0, 3.0]]), requires_grad=True)


label = Variable(torch.zeros((1, 3)))
# set the 1 at a different index position to get each of the three groups below
label[0, 0] = 1
# label[0, 1] = 1
# label[0, 2] = 1
print(label)

# for a batch: loss = mean( -sum(Pj*logSj) )
# for one sample: loss = -sum(Pj*logSj)
loss = torch.mean(-torch.sum(label * torch.log(F.softmax(data, dim=1)), dim=1))

loss.backward()
print(loss, data.grad)

Output for the three label choices:

#Group 1:
label: tensor([[ 1., 0., 0.]])
loss: tensor(2.4076) 
grad: tensor([[-0.9100, 0.2447, 0.6652]])

#Group 2:
label: tensor([[ 0., 1., 0.]])
loss: tensor(1.4076) 
grad: tensor([[ 0.0900, -0.7553, 0.6652]])

#Group 3:
label: tensor([[ 0., 0., 1.]])
loss: tensor(0.4076) 
grad: tensor([[ 0.0900, 0.2447, -0.3348]])

"""
Explanation:
For the input tensor([[1., 2., 3.]]), the softmax output is
tensor([[ 0.0900, 0.2447, 0.6652]])
According to the cross entropy derivation, subtracting 1 from s[0, 0], s[0, 1] and s[0, 2] respectively
gives the input gradients for the three groups of labels above, i.e. grad = softmax - one-hot label.
"""

Relation between softmax and log_softmax provided by PyTorch

# manual softmax computation, compared with the official implementation below
In[2]: import torch
 ...: import torch.autograd as autograd
 ...: from torch.autograd import Variable
 ...: import torch.nn.functional as F
 ...: import torch.nn as nn
 ...: import numpy as np
In[3]: data = Variable(torch.FloatTensor([[1.0, 2.0, 3.0]]), requires_grad=True)
In[4]: data
Out[4]: tensor([[ 1., 2., 3.]])
In[5]: e = torch.exp(data)
In[6]: e
Out[6]: tensor([[ 2.7183,  7.3891, 20.0855]])
In[7]: s = torch.sum(e, dim=1)
In[8]: s
Out[8]: tensor([ 30.1929])
In[9]: softmax = e/s
In[10]: softmax
Out[10]: tensor([[ 0.0900, 0.2447, 0.6652]])
In[11]: # equivalent to the softmax provided by PyTorch
In[12]: org_softmax = F.softmax(data, dim=1)
In[13]: org_softmax
Out[13]: tensor([[ 0.0900, 0.2447, 0.6652]])
In[14]: org_softmax == softmax # the calculation results are the same
Out[14]: tensor([[ 1, 1, 1]], dtype=torch.uint8)

# relation to log_softmax
# log_softmax = log(softmax)
In[15]: _log_softmax = torch.log(org_softmax) 
In[16]: _log_softmax
Out[16]: tensor([[-2.4076, -1.4076, -0.4076]])
In[17]: log_softmax = F.log_softmax(data, dim=1)
In[18]: log_softmax
Out[18]: tensor([[-2.4076, -1.4076, -0.4076]])
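
One practical point worth noting (not covered in the session above, but a well-known property): F.log_softmax computes the result in log space, so it is numerically more stable than taking torch.log of F.softmax. A small sketch with large logits:

big = torch.FloatTensor([[1000.0, 2000.0, 3000.0]])
print(torch.log(F.softmax(big, dim=1)))  # softmax underflows to 0 for the first two entries, so their log is -inf
print(F.log_softmax(big, dim=1))         # stays finite: [-2000., -1000., 0.]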

Results of the official softmax cross entropy implementation

# -*- coding: utf-8 -*-
import torch
import torch.autograd as autograd
from torch.autograd import Variable
import torch.nn.functional as F
import torch.nn as nn
import numpy as np

data = Variable(torch.FloatTensor([[1.0, 2.0, 3.0], [1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]), requires_grad=True)
log_softmax = F.log_softmax(data, dim=1)

label = Variable(torch.zeros((3, 3)))
label[0, 2] = 1
label[1, 1] = 1
label[2, 0] = 1
print("lable: ", label)

# cross entropy calculation method 1: NLLLoss on log_softmax
loss_fn = torch.nn.NLLLoss() # with the default mean reduction, the loss and the grad are averaged over the batch
# NLLLoss takes log_softmax as its input; the target is a class-index label, not one-hot
loss = loss_fn(log_softmax, torch.argmax(label, dim=1))
loss.backward()
print("loss: ", loss, "\ngrad: ", data.grad)

"""
# cross entropy calculation method 2
loss_fn = torch.nn.CrossEntropyLoss() # the target label is NOT one-hot
# CrossEntropyLoss for classification problems
# input: the raw network output (no softmax applied); target is a class-index label, not one-hot
loss = loss_fn(data, torch.argmax(label, dim=1))
loss.backward()
print("loss: ", loss, "\ngrad: ", data.grad)
"""
"""

Output:


label: tensor([[ 0., 0., 1.],
    [ 0., 1., 0.],
    [ 1., 0., 0.]])
loss: tensor(1.4076) 
grad: tensor([[ 0.0300, 0.0816, -0.1116],
    [ 0.0300, -0.2518, 0.2217],
    [-0.3033, 0.0816, 0.2217]])

Comparing this with the output of the first example shows that the two are identical.
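For completeness, the commented-out method 2 can also be run on its own (a minimal sketch; a fresh input tensor is used so that the gradient from method 1 does not accumulate into it). CrossEntropyLoss combines log_softmax and NLLLoss internally, so it is applied to the raw logits:

# cross entropy calculation method 2: CrossEntropyLoss on the raw logits
data2 = Variable(torch.FloatTensor([[1.0, 2.0, 3.0], [1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]), requires_grad=True)
loss_fn = torch.nn.CrossEntropyLoss() # applies log_softmax internally; the target label is NOT one-hot
loss = loss_fn(data2, torch.argmax(label, dim=1))
loss.backward()
print("loss: ", loss, "\ngrad: ", data2.grad) # same loss (1.4076) and gradient as above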

That is all about PyTorch's softmax cross entropy loss and gradient usage. I hope it gives you a useful reference, and I hope you will continue to support developeppaer.
