Pytorch learning outline


1. Creating tensors and tensor data types

1.1 What is a tensor

A tensor is a container for numerical data of various types and dimensions.

0th-order tensor: scalar (a single constant value)

1st-order tensor: vector

2nd-order tensor: matrix
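A small sketch of the three orders above, assuming only `torch` is installed; `.dim()` reports the order of each tensor.

```python
import torch

scalar = torch.tensor(3.14)               # 0th-order tensor: a single value
vector = torch.tensor([1.0, 2.0, 3.0])    # 1st-order tensor: a vector
matrix = torch.tensor([[1, 2], [3, 4]])   # 2nd-order tensor: a matrix

# .dim() returns the order (number of dimensions) of each tensor
print(scalar.dim(), vector.dim(), matrix.dim())  # 0 1 2
```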

1.2 Methods for creating tensors

torch.tensor() torch.empty() torch.ones() torch.zeros() torch.rand() torch.randint() torch.randn() torch.ones_like()
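One call per creation function listed above, as a quick reference sketch; the shapes and value ranges are arbitrary examples.

```python
import torch

t_list = torch.tensor([[1, 2], [3, 4]])   # from existing Python data
t_empty = torch.empty(2, 3)               # uninitialized memory: values are arbitrary
t_ones = torch.ones(2, 3)                 # all ones
t_zeros = torch.zeros(2, 3)               # all zeros
t_rand = torch.rand(2, 3)                 # uniform samples in [0, 1)
t_randint = torch.randint(low=0, high=10, size=(2, 3))  # random integers in [0, 10)
t_randn = torch.randn(2, 3)               # samples from the standard normal N(0, 1)
t_ones_like = torch.ones_like(t_rand)     # ones with the same shape and dtype as t_rand
```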

1.3 Tensor data types

Get the data type of tensor t1: t1.dtype

torch.float == torch.float32
torch.double == torch.float64
torch.long == torch.int64
torch.int == torch.int32
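The aliases above can be checked directly; this sketch also shows the default dtypes PyTorch picks for float and integer data.

```python
import torch

t1 = torch.tensor([1.0, 2.0])
print(t1.dtype)                           # torch.float32 (default for float data)
print(torch.tensor([1, 2]).dtype)         # torch.int64 (default for integer data)

print(torch.float == torch.float32)       # True
print(torch.double == torch.float64)      # True
print(torch.long == torch.int64)          # True
print(torch.int == torch.int32)           # True
```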

1.4 Creating tensors with built-in functions


1.5 Data type conversion
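A sketch of the common ways to convert a tensor's data type; all of these return a new tensor and leave the original unchanged.

```python
import torch

t1 = torch.tensor([1, 2, 3])                 # integer data defaults to torch.int64
t_float = t1.float()                         # shorthand conversion to float32
t_double = t1.to(torch.float64)              # .to() with an explicit dtype
t_long = t_float.type(torch.long)            # .type() also accepts a dtype
t_given = torch.tensor([1, 2, 3], dtype=torch.int32)  # choose the dtype at creation
```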


2. Tensor methods

2.1 Getting the shape of the data
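A minimal sketch: `.shape` and `.size()` return the same `torch.Size`, and `.size(dim)` returns a single dimension.

```python
import torch

t1 = torch.rand(3, 4)
print(t1.shape)     # torch.Size([3, 4])
print(t1.size())    # same result, as a method call
print(t1.size(1))   # 4: the size of dimension 1 only
```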


2.2 Getting the data type


2.3 Data type conversion


2.4 Getting the value of a single-element tensor
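`.item()` turns a tensor that holds exactly one element into a plain Python number, as this sketch shows.

```python
import torch

t1 = torch.tensor([[5.0]])   # any tensor with exactly one element
value = t1.item()            # returns a plain Python float
print(value, type(value))    # 5.0 <class 'float'>
```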


2.5 Converting a tensor to a NumPy array
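A sketch of converting in both directions, assuming NumPy is installed. For a CPU tensor, `.numpy()` shares memory with the tensor, so in-place changes show up in both; a GPU tensor would need `.cpu()` first, and a tensor that requires grad would need `.detach()`.

```python
import numpy as np
import torch

t1 = torch.ones(2, 3)
arr = t1.numpy()                     # CPU tensor -> ndarray; shares memory with t1
t1.add_(1)                           # an in-place change to the tensor ...
print(arr[0, 0])                     # 2.0: ... is visible through the array

t2 = torch.from_numpy(np.zeros(3))   # ndarray -> tensor (also shares memory)
print(t2.dtype)                      # torch.float64, inherited from the array
```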


2.6 Changing the shape
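A sketch of reshaping; the total number of elements must stay the same, and `-1` asks PyTorch to infer one dimension.

```python
import torch

t1 = torch.arange(12)          # shape [12]
t2 = t1.view(3, 4)             # view needs a compatible (contiguous) memory layout
t3 = t1.reshape(2, -1)         # -1 lets PyTorch infer this dimension (here 6)
print(t2.shape, t3.shape)      # torch.Size([3, 4]) torch.Size([2, 6])
```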


2.7 Getting the order (number of dimensions)


2.8 Evaluation


2.9 Methods with a trailing underscore

t1.add_() modifies t1 in place (an in-place operation)
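The difference between a method and its underscore version, sketched with `add` and `add_`:

```python
import torch

t1 = torch.tensor([1.0, 2.0])
t2 = t1.add(10)    # returns a new tensor; t1 is still [1., 2.] here
t1.add_(10)        # trailing underscore: modifies t1 itself
print(t1, t2)      # tensor([11., 12.]) tensor([11., 12.])
```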

2.10 Swapping axes

t1.permute(1, 0, 2)
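A sketch of what `permute(1, 0, 2)` does to a 3-D tensor: the arguments list the old dimensions in their new order. For swapping just two dimensions, `transpose` gives the same result.

```python
import torch

t1 = torch.rand(2, 3, 4)
t2 = t1.permute(1, 0, 2)     # new order: old dim 1, old dim 0, old dim 2
print(t2.shape)              # torch.Size([3, 2, 4])

t3 = t1.transpose(0, 1)      # swaps exactly two dimensions
print(torch.equal(t2, t3))   # True
```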

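3. Slicing tensors

import torch

t1 = torch.rand(3, 4)
print(t1[1, 1])  # the element in row 1, column 1

t1[1, 1] = 100  # assign through an index
print(t1[1, :])  # all of row 1
print(t1[:, :])  # all rows and columns

t2 = torch.rand(3, 4, 2)
print(t2[0, :, 0])  # index 0 of dims 0 and 2, every element of dim 1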

4. Using the GPU

Check whether CUDA is available: torch.cuda.is_available()

Create a CUDA device: device = torch.device('cuda')

Push data to the device for computation: t1.to(device)

Push data and convert its type at the same time: t1.to('cpu', torch.double)
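Putting the four steps together in a sketch that falls back to the CPU when no CUDA device is present, so it also runs on machines without a GPU:

```python
import torch

# Use the GPU if CUDA is available, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

t1 = torch.rand(3, 4)               # float32 tensor created on the CPU
t2 = t1.to(device)                  # push the data to the chosen device
t3 = t2.to('cpu', torch.double)     # move back to the CPU and convert to float64 at once

print(t2.device, t3.device, t3.dtype)
```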

5. Data loading

5.1 Environment setup

pip install tqdm

pip install matplotlib
pip install opencv-python
pip install fastapi
pip install uvicorn

5.2 Basic steps of image recognition

Prepare the data

Build the model

Train the model

Save the model

Evaluate the model
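The five steps can be sketched end to end. This assumes random stand-in data shaped like flattened 28 × 28 images instead of a real dataset, and a small fully connected model; the file name `mnist_model.pt` is hypothetical. A real evaluation would use a held-out test set rather than the training batch.

```python
import torch
from torch import nn

# 1. Prepare the data (random stand-in: 64 flattened 28x28 images, 10 classes)
x = torch.randn(64, 784)
y = torch.randint(0, 10, (64,))

# 2. Build the model
model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# 3. Train the model
model.train()
for epoch in range(5):
    optimizer.zero_grad()           # clear gradients from the previous step
    loss = criterion(model(x), y)   # forward pass and loss
    loss.backward()                 # backpropagation
    optimizer.step()                # update the parameters

# 4. Save the model (only the learned parameters; hypothetical file name)
torch.save(model.state_dict(), 'mnist_model.pt')

# 5. Evaluate the model (here on the training batch, for illustration only)
model.eval()
with torch.no_grad():
    accuracy = (model(x).argmax(dim=1) == y).float().mean().item()
print(f'loss={loss.item():.4f} accuracy={accuracy:.2f}')
```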

5.3 Types of datasets: PyTorch's built-in datasets and user-defined datasets

5.4 The Dataset class and the DataLoader class

A class that inherits from Dataset must implement two methods: __getitem__ and __len__
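A minimal user-defined dataset implementing the two required methods, wrapped in a DataLoader. `SquaresDataset` is a hypothetical toy example where sample i is the pair (i, i²).

```python
import torch
from torch.utils.data import Dataset, DataLoader


class SquaresDataset(Dataset):
    """Hypothetical toy dataset: sample i is (i, i**2)."""

    def __init__(self, n):
        self.n = n

    def __getitem__(self, index):
        # Return one (input, label) pair for the given index
        x = torch.tensor([float(index)])
        y = torch.tensor([float(index ** 2)])
        return x, y

    def __len__(self):
        # Number of samples; DataLoader uses it to build the batches
        return self.n


dataset = SquaresDataset(10)
loader = DataLoader(dataset, batch_size=4, shuffle=False)
for xb, yb in loader:
    print(xb.shape, yb.shape)   # [4, 1] for the first two batches, [2, 1] for the last
```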

5.5 MNIST parameters

Training set: 60,000 images; test set: 10,000 images; image size: 28 × 28 × 1

The data is usually divided into a training set, a validation set, and a test set
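MNIST ships only a training set and a test set, so a validation set is commonly carved out of the training set. A sketch with `random_split`, using `range(60000)` as a stand-in for the 60,000 training samples; the 50,000 / 10,000 split is an example choice.

```python
import torch
from torch.utils.data import random_split

# Stand-in for the 60,000 MNIST training samples (only their indices)
train_full = range(60000)

# Hold out 10,000 samples for validation; the seed makes the split reproducible
train_set, val_set = random_split(
    train_full, [50000, 10000], generator=torch.Generator().manual_seed(0)
)
print(len(train_set), len(val_set))   # 50000 10000
```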

5.6 DataLoader parameters

torchvision.utils.make_grid() packs a batch of images into a single grid image for display.
What does torchvision.transforms.ToTensor() do?
Images are usually read in as H × W × C, a layout PyTorch cannot process directly.
ToTensor converts the image data to C × H × W (and scales pixel values from [0, 255] to [0.0, 1.0]) so that PyTorch can process it.
However, to display the images with other libraries, they need to be converted back to H × W × C.
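A sketch of the layout change described above, done by hand with `permute` so that it runs without torchvision (it mimics what ToTensor does for a uint8 H × W × C array; it does not call ToTensor itself), followed by the most common DataLoader parameters on random stand-in data.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

# A fake 28x28 RGB image in the usual H x W x C uint8 layout
img = np.random.randint(0, 256, size=(28, 28, 3), dtype=np.uint8)

# Roughly what ToTensor() does: HWC -> CHW and scale to [0, 1]
chw = torch.from_numpy(img).permute(2, 0, 1).float() / 255
print(chw.shape)                  # torch.Size([3, 28, 28])

# For display with e.g. matplotlib, go back to H x W x C
hwc = chw.permute(1, 2, 0).numpy()
print(hwc.shape)                  # (28, 28, 3)

# Common DataLoader parameters
dataset = TensorDataset(torch.randn(100, 3, 28, 28), torch.randint(0, 10, (100,)))
loader = DataLoader(
    dataset,
    batch_size=32,    # samples per batch
    shuffle=True,     # reshuffle every epoch (typical for training)
    num_workers=0,    # loader worker processes; 0 = load in the main process
    drop_last=False,  # keep the final, smaller batch (100 = 3 * 32 + 4)
)
print(len(loader))                # 4
```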

5.7 Image augmentation

Image augmentation refers to operations such as scaling, stretching, rotating and normalizing the images.
It effectively increases the number of samples, which reduces overfitting and improves the generalization ability of the model.

from torchvision import transforms

size = (28, 28)  # target size (h, w); example value
transform = transforms.Compose([  # chain several transforms together
    transforms.ToPILImage(),  # convert to a PIL image
    transforms.Resize(size),  # resize to the target size (h, w)
    transforms.ToTensor(),  # convert to a tensor
    transforms.Normalize(mean=(0.1307,), std=(0.3081,)),  # standardize (MNIST mean and std)
])
transforms.RandomRotation  # random rotation
transforms.RandomHorizontalFlip  # random horizontal flip
transforms.RandomVerticalFlip  # random vertical flip
transforms.RandomResizedCrop  # randomly crop a region and resize it

6. Fully connected layer
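A fully connected layer connects every input feature to every output feature. In PyTorch this is `nn.Linear`, which computes y = xWᵀ + b; the sketch checks that against a manual computation (sizes are arbitrary examples).

```python
import torch
from torch import nn

fc = nn.Linear(in_features=4, out_features=2)   # weight: [2, 4], bias: [2]
x = torch.randn(8, 4)                           # a batch of 8 samples, 4 features each
y = fc(x)
print(y.shape)                                  # torch.Size([8, 2])

manual = x @ fc.weight.T + fc.bias              # the same computation by hand
print(torch.allclose(y, manual))                # True
```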



7. Loss function

7.1 Mean squared error, generally used for regression tasks


torch.nn.functional.mse_loss(input, target) 
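A sketch of `mse_loss` next to the same formula written out: the mean of the squared differences (with the default `reduction='mean'`).

```python
import torch
import torch.nn.functional as F

pred = torch.tensor([2.5, 0.0, 2.0])
target = torch.tensor([3.0, -0.5, 2.0])

loss = F.mse_loss(pred, target)           # default reduction='mean'
manual = ((pred - target) ** 2).mean()    # (0.25 + 0.25 + 0) / 3
print(loss.item())                        # about 0.1667
```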

7.2 Cross-entropy loss, used for classification tasks


import torch.nn.functional as F

input = F.log_softmax(out, dim=-1)
F.nll_loss(input, target)
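A runnable version of the two lines above, with random logits as a stand-in for `out`. `F.cross_entropy` combines `log_softmax` and `nll_loss` in one call, so both give the same loss.

```python
import torch
import torch.nn.functional as F

out = torch.randn(4, 3)               # raw model outputs (logits): 4 samples, 3 classes
target = torch.tensor([0, 2, 1, 2])   # the class index of each sample

log_probs = F.log_softmax(out, dim=-1)
loss_nll = F.nll_loss(log_probs, target)
loss_ce = F.cross_entropy(out, target)   # log_softmax + nll_loss in one call
print(torch.allclose(loss_nll, loss_ce))  # True
```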

7.3 CTCLoss: nn.CTCLoss()

7.4 Amount of information

Shannon, the founder of information theory, held that “information is something used to eliminate random uncertainty”; that is, the amount of information is measured by the degree to which it eliminates uncertainty.

“The sun rises in the east” does not reduce any uncertainty, because the sun always rises in the east. The statement tells us nothing new, so its amount of information is 0.

“The Chinese team successfully qualified for the World Cup in 2021” intuitively carries a lot of information: there is great uncertainty about whether the Chinese team will qualify, and this sentence eliminates that uncertainty, so by definition it has a large amount of information.

From the above, the amount of information decreases as the probability of the event increases: the more probable an event, the less information it carries, and the less probable, the more information it carries. If the probability of an event x is P(x), its amount of information is defined as:

$$I(x) = -\log P(x)$$

where I(x) is the amount of information and log is the natural logarithm (base e).
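A sketch of the formula in plain Python, written as ln(1/P(x)), which equals -ln P(x). The hypothetical helper `information` shows the certain event carrying 0 information and rarer events carrying more.

```python
import math


def information(p):
    """Amount of information I(x) = -ln P(x) = ln(1 / P(x)), in nats."""
    return math.log(1 / p)


print(information(1.0))    # 0.0: a certain event carries no information
print(information(0.5))    # ln 2, about 0.693
print(information(0.01))   # about 4.605: a rare event carries much more
```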

7.5 Information entropy

Information entropy, or simply entropy, is the expected value of the amount of information over all possible outcomes. An expectation is the sum, over every possible outcome, of the outcome's probability multiplied by its value. The information entropy can therefore be expressed as follows (x is a discrete random variable with n possible values):

$$H(X) = -\sum_{i=1}^{n} P(x_i)\log P(x_i)$$
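A sketch of the entropy formula in plain Python (natural log, so the result is in nats). Outcomes with probability 0 are skipped, following the usual convention that 0 · log 0 = 0.

```python
import math


def entropy(probs):
    """H(X) = -sum P(x_i) ln P(x_i); terms with P(x_i) = 0 contribute 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)


print(entropy([0.5, 0.5]))    # ln 2, about 0.693: a fair coin
print(entropy([0.99, 0.01]))  # about 0.056: a heavily biased coin is far more predictable
```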


7.6 Relative entropy (KL divergence)

If the same random variable x has two separate probability distributions P(x) and Q(x), KL divergence can be used to measure the difference between the two distributions:

$$D_{KL}(P\|Q) = \sum_{i=1}^{n} P(x_i)\log\frac{P(x_i)}{Q(x_i)}$$

In machine learning, P(x) usually represents the real distribution of the samples and Q(x) the distribution predicted by the model. The smaller the KL divergence, the closer Q(x) is to P(x). Through repeated training, Q(x) can be brought progressively closer to P(x).
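A sketch comparing the formula with `F.kl_div` on two example distributions. Note that `F.kl_div` expects the *log* of the predicted distribution Q as its first argument and P as the target.

```python
import torch
import torch.nn.functional as F

p = torch.tensor([0.5, 0.3, 0.2])   # "real" distribution P(x)
q = torch.tensor([0.4, 0.4, 0.2])   # "predicted" distribution Q(x)

kl_manual = (p * (p / q).log()).sum()
kl_torch = F.kl_div(q.log(), p, reduction='sum')   # first argument: log Q
print(torch.allclose(kl_manual, kl_torch))         # True

# Identical distributions give a KL divergence of (about) 0
print(F.kl_div(p.log(), p, reduction='sum').item())
```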

7.7 Cross entropy

First, expand the KL divergence formula:

$$D_{KL}(P\|Q) = \sum_{i=1}^{n} P(x_i)\log P(x_i) - \sum_{i=1}^{n} P(x_i)\log Q(x_i) = -H(P(x)) + \left[-\sum_{i=1}^{n} P(x_i)\log Q(x_i)\right]$$

The first term, H(P(x)), is the information entropy, and the bracketed second term is the cross entropy H(P, Q); so KL divergence = cross entropy - information entropy.

In machine learning training, the input data and labels are fixed, so the real distribution P(x) is fixed and its information entropy is a constant. KL divergence measures the difference between the real distribution P(x) and the predicted distribution Q(x): the smaller it is, the better the prediction, so training should minimize KL divergence. Because cross entropy equals KL divergence plus a constant (the information entropy), minimizing cross entropy is equivalent to minimizing KL divergence, and cross entropy is simpler to compute. This is why the cross-entropy loss function is so often used in machine learning.
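A numerical check of the decomposition above, cross entropy = KL divergence + information entropy, on two example distributions:

```python
import torch

p = torch.tensor([0.5, 0.3, 0.2])   # real distribution P(x)
q = torch.tensor([0.4, 0.4, 0.2])   # predicted distribution Q(x)

entropy = -(p * p.log()).sum()          # H(P)
cross_entropy = -(p * q.log()).sum()    # H(P, Q)
kl = (p * (p / q).log()).sum()          # D_KL(P || Q)

print(torch.allclose(cross_entropy, kl + entropy))   # True
```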




9. Convolutional network

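9.1 Differences between a traditional network and a convolutional network

  1. The shape of the input data differs: a traditional fully connected network takes a flattened one-dimensional vector, while a convolutional network keeps the image's [C, H, W] structure.
  2. The parameters differ: a fully connected layer has a separate weight for every input-output pair, while a convolutional layer shares its kernel weights across all positions and therefore needs far fewer parameters.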

9.2 Overall structure of a convolutional network

Convolution layer -> BN layer -> activation function -> pooling layer -> FC layer
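A sketch of that structure as a small `nn.Sequential`, assuming 1-channel 28 × 28 inputs and 10 output classes (example sizes only); the comments track the tensor shape through each layer.

```python
import torch
from torch import nn

model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),  # convolution: [N, 1, 28, 28] -> [N, 16, 28, 28]
    nn.BatchNorm2d(16),                          # BN layer
    nn.ReLU(),                                   # activation function
    nn.MaxPool2d(2),                             # pooling: -> [N, 16, 14, 14]
    nn.Flatten(),                                # -> [N, 16 * 14 * 14]
    nn.Linear(16 * 14 * 14, 10),                 # FC layer: one score per class
)
out = model(torch.randn(8, 1, 28, 28))
print(out.shape)   # torch.Size([8, 10])
```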

9.3 Convolutional network

1. Convolution process
2. Feature value computation
3. Parameter sharing
4. Number of convolution parameters: kernel_h * kernel_w * in_channels * out_channels + out_channels (the last term is the bias)
5. Convolution kernel
    A convolution kernel convolves every input channel. Within one kernel, the weights for a given channel are shared across all positions, while different channels have different weights.
    Within one convolution layer, all kernels have the same size; different convolution layers can use different kernel sizes.
    The size and number of convolution kernels are chosen by you.
    Each convolution kernel produces one output channel, so the number of kernels equals out_channels, which is also the number of feature maps.
6. Stride
7. Padding
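A check of the parameter-count formula against `nn.Conv2d`, using example sizes (3 input channels, 16 output channels, 3 × 3 kernels):

```python
from torch import nn

conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3)
n_params = sum(p.numel() for p in conv.parameters())
print(n_params)            # 3*3*3*16 + 16 = 448
print(conv.weight.shape)   # torch.Size([16, 3, 3, 3]): one 3x3x3 kernel per output channel
```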

9.4 Convolution process

9.4.1 Number of feature maps

9.4.2 Stride

9.4.3 Computing the size of a feature map
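For a square input of side W, kernel size K, padding P and stride S (with dilation 1), the standard output side length is ⌊(W - K + 2P) / S⌋ + 1. The sketch below checks this against `nn.Conv2d`; `conv_output_size` is a hypothetical helper and the sizes are examples.

```python
import torch
from torch import nn


def conv_output_size(w, k, s=1, p=0):
    """Output side length of a convolution: floor((W - K + 2P) / S) + 1."""
    return (w - k + 2 * p) // s + 1


x = torch.randn(1, 1, 28, 28)
conv = nn.Conv2d(1, 8, kernel_size=5, stride=2, padding=1)
print(conv(x).shape)                      # torch.Size([1, 8, 13, 13])
print(conv_output_size(28, 5, s=2, p=1))  # (28 - 5 + 2) // 2 + 1 = 13
```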

9.4.4 Convolutional layer (nn.Conv2d)
def __init__(self, in_channels, out_channels, kernel_size, stride=1,
             padding=0, dilation=1, groups=1,
             bias=True, padding_mode='zeros')

9.4.5 Max pooling
F.max_pool2d(input, self.kernel_size, self.stride,
             self.padding, self.dilation, self.ceil_mode,
             self.return_indices)
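A sketch of 2 × 2 max pooling with stride 2 on a 4 × 4 input: each non-overlapping 2 × 2 window is replaced by its maximum.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([[[[1., 2., 5., 6.],
                    [3., 4., 7., 8.],
                    [9., 10., 13., 14.],
                    [11., 12., 15., 16.]]]])    # shape [1, 1, 4, 4]

y = F.max_pool2d(x, kernel_size=2, stride=2)   # keep the max of each 2x2 window
print(y)   # values [[4, 8], [12, 16]], shape [1, 1, 2, 2]
```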

9.4.6 RNN
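A sketch of the input and output shapes of `nn.RNN` with the default `batch_first=False`; the sizes are arbitrary examples.

```python
import torch
from torch import nn

rnn = nn.RNN(input_size=10, hidden_size=20, num_layers=2)   # batch_first=False by default
x = torch.randn(5, 3, 10)        # [time_step, batch_size, input_size]
output, h_n = rnn(x)
print(output.shape)              # torch.Size([5, 3, 20]): hidden state at every time step
print(h_n.shape)                 # torch.Size([2, 3, 20]): last hidden state of each layer
```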

9.5 LSTM


9.5.1 Application scenarios of LSTM networks
9.5.2 Shapes of the LSTM input and output data

Default (batch_first=False) input and output shapes:

[time_step, batch_size, input] --> [time_step, batch_size, hidden]

With batch_first=True:

[batch_size, time_step, input] --> [batch_size, time_step, hidden]

If bidirectional=True, the output shape becomes [batch_size, time_step, hidden * 2] (with batch_first=True).

nn.LSTM(input_size=input_dim, hidden_size=hidden_dim, num_layers=layer_dim, batch_first=True, bidirectional=True)
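A sketch of the call above with example values for `input_dim`, `hidden_dim` and `layer_dim`, confirming the output shape and the shape of the final hidden state. `h_n` does not follow `batch_first` and always has the layout [num_layers * num_directions, batch_size, hidden].

```python
import torch
from torch import nn

input_dim, hidden_dim, layer_dim = 8, 16, 2   # example sizes
lstm = nn.LSTM(input_size=input_dim, hidden_size=hidden_dim, num_layers=layer_dim,
               batch_first=True, bidirectional=True)

x = torch.randn(4, 10, input_dim)   # [batch_size, time_step, input]
output, (h_n, c_n) = lstm(x)
print(output.shape)   # torch.Size([4, 10, 32]): hidden * 2 because bidirectional=True
print(h_n.shape)      # torch.Size([4, 4, 16]): [num_layers * 2, batch_size, hidden]
```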


10. Classical neural networks

10.1 Classical neural networks


10.2 Is a deeper network always better?

10.3 ResNet network details
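The core idea of ResNet is the residual (shortcut) connection: a block learns F(x) and outputs F(x) + x, which makes very deep networks easier to train. The sketch below is a simplified block that keeps the same number of channels and spatial size; it omits the downsampling and projection shortcuts a full ResNet also uses, and `BasicBlock` is an illustrative name.

```python
import torch
from torch import nn
import torch.nn.functional as F


class BasicBlock(nn.Module):
    """Simplified residual block: out = ReLU(F(x) + x), same shape in and out."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)   # shortcut: add the input back before the final activation


block = BasicBlock(16)
y = block(torch.randn(2, 16, 8, 8))
print(y.shape)   # torch.Size([2, 16, 8, 8])
```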

10.4 CTCLoss

10.4.1 CTC

CTC stands for Connectionist Temporal Classification. It mainly solves the problem of misalignment between a neural network's outputs and its labels. Its advantages are that the labels do not need to be forcibly aligned with the outputs and that label lengths can vary. It is mainly used in text recognition, speech recognition and similar tasks.

10.4.2 Create a CTCLoss object
ctc_loss = nn.CTCLoss(blank=0)
blank: the index of the blank label in the label set (code table). The default is 0; set it according to your actual label definition.
10.4.3 Call the CTCLoss object in each iteration to compute the loss
loss = ctc_loss(inputs, targets, input_lengths, target_lengths)

10.5 CTCLoss parameters

10.5.1 inputs

The model output tensor, of shape (T, N, C), where T is time_step, N is batch_size, and C is the size of the code table including blank. inputs generally need to be processed by torch.nn.functional.log_softmax before being passed to CTCLoss.

10.5.2 targets

A tensor of shape (N, S) or (sum(target_lengths)). In the first form, N is batch_size and S is the (maximum) label length; in the second form, all labels are concatenated, so the length is the sum of all label lengths. Note that targets cannot contain the blank label.

10.5.3 input_lengths

A tensor or tuple of shape (N) giving the length of each input sequence. Each length must be less than or equal to T (time_step); usually all of them equal T.

10.5.4 target_lengths

A tensor or tuple of shape (N) giving the length of each real label, not counting blank. Label lengths can vary.
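A sketch tying the four parameters together, with random stand-in model outputs. The sizes T, N, C and S are example values; drawing targets from 1..C-1 keeps the blank index 0 out of the labels, as required.

```python
import torch
from torch import nn
import torch.nn.functional as F

T, N, C = 50, 4, 20   # time_step, batch_size, code-table size including blank
S = 10                # maximum label length

ctc_loss = nn.CTCLoss(blank=0)

# inputs: [T, N, C], log_softmax over the class dimension
logits = torch.randn(T, N, C, requires_grad=True)
inputs = F.log_softmax(logits, dim=2)

# targets: padded [N, S] form; values 1..C-1, so the blank (0) never appears
targets = torch.randint(1, C, (N, S), dtype=torch.long)

input_lengths = torch.full((N,), T, dtype=torch.long)              # every sequence uses all T steps
target_lengths = torch.randint(5, S + 1, (N,), dtype=torch.long)   # real label lengths vary

loss = ctc_loss(inputs, targets, input_lengths, target_lengths)
loss.backward()
print(loss.item())
```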