# PyTorch learning outline

Date: 2022-05-14

### 1. Creation and data types of tensors

#### 1.1 what is a tensor

A tensor holds numerical data of various types.

Order-0 tensor: scalar (a constant)

Order-1 tensor: vector

Order-2 tensor: matrix

#### 1.2 creation method of tensor

```python
torch.tensor()  torch.empty()  torch.ones()  torch.zeros()
torch.rand()  torch.randint()  torch.randn()  torch.ones_like()
```
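A quick sketch of these constructors (shapes chosen arbitrarily for illustration):

```python
import torch

t = torch.tensor([[1, 2], [3, 4]])   # from nested lists
e = torch.empty(2, 3)                # uninitialized memory
o = torch.ones(2, 3)                 # all ones
z = torch.zeros(2, 3)                # all zeros
r = torch.rand(2, 3)                 # uniform in [0, 1)
ri = torch.randint(0, 10, (2, 3))    # random ints in [0, 10)
rn = torch.randn(2, 3)               # standard normal
ol = torch.ones_like(r)              # ones with r's shape and dtype

print(t.shape, o.shape, ol.shape)
```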

#### 1.3 data type of tensor

Read a tensor's data type from its `dtype` attribute (`t1.dtype`). Common dtypes:

```python
torch.float  == torch.float32
torch.double == torch.float64
torch.long   == torch.int64
torch.int    == torch.int32
torch.uint8
```

#### 1.4 creation with type-specific constructors

```python
torch.Tensor()
torch.FloatTensor()
torch.DoubleTensor()
torch.IntTensor()
```

#### 1.5 data type conversion

```python
t1.long()        # convert to int64
t1.long().dtype  # torch.int64
```
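For example, converting a float tensor to int64 and checking the result:

```python
import torch

t1 = torch.rand(2, 3)   # float32 by default
print(t1.dtype)         # torch.float32

t2 = t1.long()          # truncates toward zero, dtype becomes int64
print(t2.dtype)         # torch.int64
```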

### 2. Tensor methods

#### 2.1 getting the shape

```python
t1.shape    # shape as a torch.Size
t1.size()   # same as t1.shape
t1.size(0)  # size of dimension 0
```

#### 2.2 getting the data type

```python
t1.dtype
```

#### 2.3 data type conversion

```python
t1.float()
```

#### 2.4 getting the value of a single-element tensor

```python
t1.item()
```

#### 2.5 tensor to numpy

```python
t1.numpy()
```

#### 2.6 changing shape

```python
t1.view(3, 4)  # reshape; the total number of elements must match
```

#### 2.7 getting the order (number of dimensions)

```python
t1.dim()
```

#### 2.8 reductions and element-wise math

```python
t1.max()
t1.min()
t1.mean()
t1.std()
t1.sqrt()
t1.pow(2)
t1.sum()
```

#### 2.9 in-place methods (trailing underscore)

`t1.add_(x)` modifies `t1` in place (trailing-underscore methods mutate the tensor).

#### 2.10 axis permutation

```python
t1.permute(1, 0, 2)
```
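A small shape check of `permute` (dimensions chosen arbitrarily):

```python
import torch

t1 = torch.rand(2, 3, 4)
t2 = t1.permute(1, 0, 2)  # swap the first two axes
print(t1.shape)           # torch.Size([2, 3, 4])
print(t2.shape)           # torch.Size([3, 2, 4])
```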

### 3. Tensor slicing

```python
t1 = torch.rand(3, 4)
print(t1)
print(t1[1, 1])

t1[1, 1] = 100
print(t1[1, :])
print(t1[:, :])

t2 = torch.rand(3, 4, 2)
print(t2[0, :, 0])
```

### 4. Using the GPU

Check whether CUDA is available: `torch.cuda.is_available()`

Create a CUDA device: `device = torch.device('cuda')`

Move data to the device for computation: `t1.to(device)`

Move data and convert its type at the same time: `t1.to('cpu', torch.double)`
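The usual device-agnostic pattern, which falls back to the CPU when CUDA is absent:

```python
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

t1 = torch.rand(3, 4)
t1 = t1.to(device)                      # compute on the GPU if available
t2 = (t1 * 2).to('cpu', torch.double)   # move back and cast in one call
print(t2.device, t2.dtype)
```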

### 5. Image recognition (MNIST)

#### 5.1 environment installation

```shell
pip install -i https://mirrors.aliyun.com/pypi/simple/ tqdm

pip install matplotlib
pip install opencv-python
pip install fastapi
pip install uvicorn
```

#### 5.2 basic steps of image recognition

1. Prepare the data
2. Build the model
3. Train the model
4. Save the model
5. Evaluate the model

#### 5.3 dataset types: PyTorch built-in datasets and user-defined datasets

`torch.utils.data.Dataset`

A subclass of the `Dataset` class must implement two methods: `__getitem__` and `__len__`.
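A minimal custom dataset, assuming the samples are already held in memory (the class and variable names are illustrative):

```python
import torch
from torch.utils.data import Dataset, DataLoader

class MyDataset(Dataset):
    def __init__(self, data, labels):
        self.data = data
        self.labels = labels

    def __getitem__(self, index):
        # return one (sample, label) pair
        return self.data[index], self.labels[index]

    def __len__(self):
        # total number of samples
        return len(self.data)

ds = MyDataset(torch.rand(10, 3), torch.arange(10))
loader = DataLoader(ds, batch_size=4, shuffle=True)
for x, y in loader:
    print(x.shape, y.shape)
```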

#### 5.5 MNIST dataset parameters

Training set: 60,000 images; test set: 10,000 images; image size: 28 × 28 × 1.

The data is commonly split into training, validation, and test sets.

#### 5.6 DataLoader-related notes

`torchvision.utils.make_grid` packs a batch of images into a grid for display.

What does `torchvision.transforms.ToTensor()` do?
Images are generally read in as H × W × C, which PyTorch cannot process directly.
`ToTensor` rearranges the image data to C × H × W so that PyTorch can process it.
However, when displaying images with other libraries, the data must be converted back to H × W × C.
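The layout change can be reproduced with `permute` alone (a torch-only sketch; note that `ToTensor` additionally scales uint8 pixels into [0, 1], which is imitated here):

```python
import torch

img_hwc = torch.randint(0, 256, (28, 28, 1), dtype=torch.uint8)  # H x W x C

img_chw = img_hwc.permute(2, 0, 1).float() / 255.0  # C x H x W, scaled like ToTensor
print(img_chw.shape)                                # torch.Size([1, 28, 28])

back = img_chw.permute(1, 2, 0)  # back to H x W x C for display libraries
print(back.shape)                # torch.Size([28, 28, 1])
```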

#### 5.7 image augmentation

Image augmentation refers to operations such as scaling, stretching, rotating, and normalizing images.
It effectively increases the number of samples, which reduces overfitting and improves the model's generalization ability.

```python
from torchvision import transforms

transform = transforms.Compose([                          # chain transforms together
    transforms.ToPILImage(),                              # convert to a PIL image
    transforms.Resize(size),                              # scale to the target size (h, w)
    transforms.ToTensor(),                                # convert to a tensor
    transforms.Normalize(mean=(0.1307,), std=(0.3081,)),  # normalize
])

transforms.RandomRotation        # random rotation
transforms.RandomHorizontalFlip  # random horizontal flip
transforms.RandomVerticalFlip    # random vertical flip
transforms.RandomResizedCrop     # random crop, then resize
```

### 6. Fully connected layer

### 7. Loss function

#### 7.1 mean squared error (generally used for regression)

```python
nn.MSELoss()

torch.nn.functional.mse_loss(input, target)
```
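A worked check of the mean squared error on hand-picked values:

```python
import torch
import torch.nn.functional as F

pred = torch.tensor([1.0, 2.0])
target = torch.tensor([0.0, 0.0])

loss = F.mse_loss(pred, target)  # ((1-0)^2 + (2-0)^2) / 2 = 2.5
print(loss.item())               # 2.5
```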

#### 7.2 cross entropy loss (used for classification)

```python
nn.CrossEntropyLoss()

import torch.nn.functional as F

log_probs = F.log_softmax(out, dim=-1)
F.nll_loss(log_probs, target)
```
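`nn.CrossEntropyLoss` is equivalent to `log_softmax` followed by `nll_loss`, which this sketch verifies on random logits:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

out = torch.randn(4, 10)             # logits: batch of 4, 10 classes
target = torch.tensor([3, 1, 0, 7])  # class indices

ce = nn.CrossEntropyLoss()(out, target)
nll = F.nll_loss(F.log_softmax(out, dim=-1), target)
print(torch.allclose(ce, nll))       # True
```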

#### 7.4 information content

Shannon, the founder of information theory, held that "information is what eliminates random uncertainty": the amount of information is measured by the degree to which it eliminates uncertainty.

"The sun rises in the east" eliminates no uncertainty, because the sun always rises in the east; the statement carries zero information.

"The Chinese team qualified for the 2021 World Cup" intuitively carries a lot of information: qualification was highly uncertain, and the sentence eliminates that uncertainty, so by definition its information content is large.

To summarize: information content is inversely related to the probability of the event. The greater the probability, the smaller the information content; the smaller the probability, the greater the information content. If an event x occurs with probability P(x), its information content is:

I(x) = -log(P(x))

where I(x) is the information content and log is the natural logarithm (base e).

#### 7.5 information entropy

Information entropy, or simply entropy, is the expected information content over all outcomes. An expectation is the sum over every possible result of its probability times its value, so for a discrete random variable X the entropy is:

H(X) = -Σ P(x) log(P(x))

#### 7.6 relative entropy (KL divergence)

If the same random variable x has two separate probability distributions P(x) and Q(x), the KL divergence measures the difference between the two distributions:

D_KL(P ∥ Q) = Σ P(x) log(P(x) / Q(x))

In machine learning, P(x) usually denotes the true distribution of the samples and Q(x) the distribution predicted by the model. The smaller the KL divergence, the closer Q(x) is to P(x); by repeatedly training the model, Q(x) can be made to approximate P(x).

#### 7.7 cross entropy

First, expand the KL divergence formula:

D_KL(P ∥ Q) = Σ P(x) log(P(x)) − Σ P(x) log(Q(x)) = −H(P) + H(P, Q)

The first term is the negated information entropy H(P); the second term, H(P, Q) = −Σ P(x) log(Q(x)), is the cross entropy. So KL divergence = cross entropy − information entropy.

In training, the input data and labels are fixed, so the true distribution P(x) is fixed and the information entropy H(P) is a constant. KL divergence measures the gap between the true distribution P(x) and the predicted distribution Q(x), so the smaller it is, the better the prediction; we therefore want to minimize it. Since cross entropy equals KL divergence plus a constant (the information entropy) and is easier to compute, machine learning commonly uses the cross entropy loss function.
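A numeric check of the identity H(P, Q) = H(P) + D_KL(P ∥ Q) on a toy distribution:

```python
import math

P = [0.7, 0.2, 0.1]  # true distribution
Q = [0.5, 0.3, 0.2]  # predicted distribution

entropy = -sum(p * math.log(p) for p in P)                # H(P)
cross = -sum(p * math.log(q) for p, q in zip(P, Q))       # H(P, Q)
kl = sum(p * math.log(p / q) for p, q in zip(P, Q))       # D_KL(P || Q)

print(abs(cross - (entropy + kl)) < 1e-12)  # True
```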

### 8. Softmax

Softmax turns a vector of scores into a probability distribution:

softmax(xᵢ) = exp(xᵢ) / Σⱼ exp(xⱼ)
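A quick sketch showing that softmax outputs form a valid probability distribution:

```python
import torch
import torch.nn.functional as F

x = torch.tensor([1.0, 2.0, 3.0])
p = F.softmax(x, dim=0)
print(p.sum().item())  # 1.0 (up to float rounding)
```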

### 9. Convolutional networks

#### 9.1 differences between a traditional network and a convolutional network
1. The shape of the input data is different
2. The parameters are different

#### 9.2 overall structure of a convolutional network

Convolution layer → BN layer → activation function → pooling layer → FC layer

#### 9.3 convolutional network

1. Convolution process
2. Feature value calculation
3. Parameter sharing
4. Convolution parameter count: `kernel_size² × in_channels × out_channels + out_channels` (weights plus one bias per output channel)
5. Convolution kernel (`kernel_size`): a single kernel convolves every input channel; weights are shared across positions within a channel, while different channels use different weights. Within one convolution layer all kernels have the same size; different layers may use different sizes. The size and number of kernels are design choices. Each kernel extracts one feature map, so the number of feature maps equals `out_channels`.
6. Stride

#### 9.4 convolution process
##### 9.4.1 number of feature maps

The number of feature maps equals the number of convolution kernels (`out_channels`).
##### 9.4.2 stride
##### 9.4.3 feature map size calculation

output_size = (input_size + 2 × padding − kernel_size) / stride + 1 (floored)
##### 9.4.4 convolutional network
```python
nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0)

nn.MaxPool2d(kernel_size, stride=None)
F.max_pool2d(input, kernel_size, stride)
```
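A shape check of the size formula above, using an assumed 28 × 28 single-channel input:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.rand(1, 1, 28, 28)  # N x C x H x W
conv = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=5, stride=1, padding=0)

y = conv(x)                                   # (28 + 2*0 - 5) / 1 + 1 = 24
print(y.shape)                                # torch.Size([1, 8, 24, 24])

z = F.max_pool2d(y, kernel_size=2, stride=2)  # halves the spatial size
print(z.shape)                                # torch.Size([1, 8, 12, 12])
```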
##### 9.4.5 maximum pooling
##### 9.4.6 RNN

#### 9.5 LSTM
##### 9.5.2 shapes of LSTM input and output data

Default input shape:

```python
[time_step, batch_size, input] --> [time_step, batch_size, hidden]
```

With `batch_first=True`:

```python
[batch_size, time_step, input] --> [batch_size, time_step, hidden]
```

If `bidirectional=True`, the output shape is `[batch_size, time_step, hidden * 2]`.

```python
nn.LSTM(input_size=input_dim, hidden_size=hidden_dim, num_layers=layer_dim,
        batch_first=True, bidirectional=True)
```
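Verifying the bidirectional output shape with assumed dimensions (input size 8, hidden size 16):

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=8, hidden_size=16, num_layers=1,
               batch_first=True, bidirectional=True)

x = torch.rand(4, 10, 8)  # [batch_size, time_step, input]
out, (h, c) = lstm(x)
print(out.shape)          # torch.Size([4, 10, 32])  -> hidden * 2
```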

#### 9.6 BLSTM (bidirectional LSTM)

### 10. Classical neural networks

#### 10.1 classical neural networks

```
AlexNet
VGG16 / VGG19
ResNet
```

#### 10.3 ResNet network details

```
https://blog.csdn.net/qq_41760767/article/details/97917419
https://zhuanlan.zhihu.com/p/79378841
```

#### 10.4 CTCLoss

##### 10.4.1 CTC

CTC stands for Connectionist Temporal Classification. It mainly solves the misalignment between neural network outputs and labels: no forced alignment of the labels is required, and labels may vary in length. It is used mainly in text recognition, speech recognition, and similar tasks.

##### 10.4.2 creating a CTCLoss object

```python
ctc_loss = nn.CTCLoss(blank=0)
# blank: index of the blank label in the code table; defaults to 0
# and must be set according to the actual label definition
```
##### 10.4.3 calling the CTCLoss object in the training loop to compute the loss

```python
loss = ctc_loss(inputs, targets, input_lengths, target_lengths)
```
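A runnable sketch with assumed sizes (T = 12 time steps, N = 2 sequences, C = 5 classes including blank):

```python
import torch
import torch.nn as nn

T, N, C = 12, 2, 5
ctc_loss = nn.CTCLoss(blank=0)

inputs = torch.randn(T, N, C).log_softmax(2)           # (T, N, C) log-probabilities
targets = torch.randint(1, C, (N, 6))                  # labels, never containing blank
input_lengths = torch.full((N,), T, dtype=torch.long)  # every element equals T
target_lengths = torch.tensor([6, 4])                  # true label lengths per sequence

loss = ctc_loss(inputs, targets, input_lengths, target_lengths)
print(loss.item())  # a non-negative scalar
```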

#### 10.5 ctcloss parameter analysis

##### 10.5.1 inputs

The model output tensor of shape (T, N, C), where T is time_step, N is batch_size, and C is the length of the code table including blank. Inputs generally need to pass through `torch.nn.functional.log_softmax` before being fed to CTCLoss.

##### 10.5.2 targets

A tensor of shape (N, S) or (sum(target_lengths)). In the first form, N is batch_size and S is the label length; the second form concatenates all labels. Note that targets must not contain the blank label.

##### 10.5.3 input_lengths

A tensor or tuple of shape (N,); each element must equal T, i.e. time_step.

##### 10.5.4 target_lengths

A tensor or tuple of shape (N,) giving the length of each true label, excluding blank; label lengths may vary.

## AcWing: shortest Hamilton distance (bitmask DP)

Problem description: given a weighted undirected graph with n points labeled 0 to n−1, find the shortest Hamiltonian path from point 0 to point n−1. A Hamiltonian path from 0 to n−1 passes through every point exactly once, with no repeats and no omissions. Input format: the first line contains an integer n; each of the next n lines contains n integers, where the j-th integer on the i-th line represents […]
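A minimal bitmask-DP sketch of the problem above, where dp[mask][j] is the shortest path that visits exactly the points in mask and ends at point j (the test graph is an assumed toy instance):

```python
def shortest_hamilton(n, w):
    """Shortest Hamiltonian path from point 0 to point n-1.

    w[i][j] is the edge weight between points i and j.
    """
    INF = float('inf')
    dp = [[INF] * n for _ in range(1 << n)]
    dp[1][0] = 0  # start at point 0 with only point 0 visited
    for mask in range(1 << n):
        for j in range(n):
            if not (mask >> j) & 1 or dp[mask][j] == INF:
                continue
            for k in range(n):  # extend the path to an unvisited point k
                if (mask >> k) & 1:
                    continue
                nxt = mask | (1 << k)
                dp[nxt][k] = min(dp[nxt][k], dp[mask][j] + w[j][k])
    return dp[(1 << n) - 1][n - 1]

# chain 0-1-2-3 costs 1+1+1 = 3; every shortcut edge is expensive
w = [[0, 1, 10, 10],
     [1, 0, 1, 10],
     [10, 1, 0, 1],
     [10, 10, 1, 0]]
print(shortest_hamilton(4, w))  # 3
```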