1. Creation and data type of tensor
1.1 what is a tensor
A generalization of various kinds of numerical data
0th-order tensor: scalar (a constant)
1st-order tensor: vector
2nd-order tensor: matrix
1.2 creation method of tensor
.tensor() .empty() .ones() .zeros() .rand() .randint() .randn() .ones_like()
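A minimal sketch of these creation functions (the shapes and values here are arbitrary):

```python
import torch

t1 = torch.tensor([[1, 2], [3, 4]])             # from existing data
t2 = torch.empty(2, 3)                          # uninitialized memory
t3 = torch.ones(2, 3)                           # all ones
t4 = torch.zeros(2, 3)                          # all zeros
t5 = torch.rand(2, 3)                           # uniform on [0, 1)
t6 = torch.randint(low=0, high=10, size=(2, 3)) # random integers in [0, 10)
t7 = torch.randn(2, 3)                          # standard normal distribution
t8 = torch.ones_like(t1)                        # ones with t1's shape and dtype
print(t8.shape)                                 # torch.Size([2, 2])
```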
1.3 data type of tensor
Get a tensor's data type: t1.dtype
torch.float == torch.float32, torch.double == torch.float64, torch.long == torch.int64, torch.int == torch.int32, torch.uint8
1.4 creating tensors with type-specific constructors
torch.Tensor() torch.FloatTensor() torch.DoubleTensor() torch.IntTensor()
1.5 data type conversion
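A short sketch of common dtype conversions (the tensor values are arbitrary):

```python
import torch

t = torch.tensor([1, 2, 3])        # integer data defaults to torch.int64
print(t.dtype)                     # torch.int64

t_f = t.float()                    # -> torch.float32
t_d = t.double()                   # -> torch.float64
t_l = t_f.long()                   # back to torch.int64
t_u = t.type(torch.uint8)          # generic conversion via .type()
print(t_f.dtype, t_d.dtype, t_l.dtype, t_u.dtype)
```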
2. Method of tensor
2.1 getting the shape of the data
t1.shape t1.size() t1.shape[0] t1.size(0)
2.2 getting the data type
2.3 data type conversion
2.4 getting the value of a single-element tensor: t1.item()
2.5 tensor to numpy
2.6 changing shape
2.7 obtaining the order (number of dimensions): t1.dim()
2.8 common calculation methods
t1.max() t1.min() t1.mean() t1.std() .sqrt() .pow(2) .sum()
2.9 underscore methods
t1.add_()  # modifies the tensor in place
2.10 axis swapping
t1.permute(1, 0, 2)
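The methods from sections 2.1 through 2.10 can be sketched together (t1 and t2 are arbitrary sample tensors):

```python
import torch

t1 = torch.rand(3, 4)
print(t1.shape, t1.size(), t1.size(0))  # shape info; size(0) is the first dim
print(t1.dtype)                         # data type
print(t1.max(), t1.mean(), t1.std())    # common statistics

x = torch.tensor([3.14])
print(x.item())                         # Python float from a 1-element tensor

a = t1.numpy()                          # tensor -> numpy (shares memory on CPU)
b = t1.view(4, 3)                       # change shape; data is unchanged
print(t1.dim())                         # order (number of dimensions) -> 2

t1.add_(1)                              # underscore method: in-place add

t2 = torch.rand(3, 4, 2)
print(t2.permute(1, 0, 2).shape)        # axis swap -> torch.Size([4, 3, 2])
```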
3. Slice of tensor
t1 = torch.rand(3, 4)
print(t1)
print(t1[1, 1])       # single element
t1[1, 1] = 100        # assignment
print(t1[1, :])       # one row
print(t1[:, :])       # everything
t2 = torch.rand(3, 4, 2)
print(t2[0, :, 0])    # slicing a 3-D tensor
4. Use GPU operation
Determine whether CUDA is available
Create a CUDA device
Push data to the device for calculation
Push data and convert type at the same time
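The four steps above, as a minimal sketch:

```python
import torch

# Determine whether CUDA is available and create the device accordingly
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

t1 = torch.rand(3, 4)
t1 = t1.to(device)                           # push data to the device

# Push data and convert the type at the same time
t2 = torch.rand(3, 4).to(device, dtype=torch.float64)

result = (t1.double() + t2).cpu()            # move back to CPU, e.g. before .numpy()
```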
5. Data loading
5.1 environmental installation
pip install -i https://mirrors.aliyun.com/pypi/simple/ tqdm
pip install matplotlib
pip install opencv-python
pip install fastapi
pip install uvicorn
5.2 basic steps of image recognition
Construction of model
5.3 dataset classification: PyTorch's built-in datasets and user-defined datasets
5.4 dataset class and data loading class
Two methods must be implemented after inheriting the Dataset class: __getitem__ and __len__
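A minimal user-defined dataset implementing both required methods (the data here is made up for illustration):

```python
import torch
from torch.utils.data import Dataset, DataLoader

class MyDataset(Dataset):
    def __init__(self):
        self.data = torch.arange(10, dtype=torch.float32)
        self.labels = (self.data > 4).long()

    def __getitem__(self, index):     # required: return one sample (and label)
        return self.data[index], self.labels[index]

    def __len__(self):                # required: total number of samples
        return len(self.data)

loader = DataLoader(MyDataset(), batch_size=4, shuffle=True)
for x, y in loader:
    print(x.shape, y.shape)           # torch.Size([4]) on full batches
```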
5.5 introduction to MNIST related parameters
Training set: 60000 images; test set: 10000 images; image size: 28 * 28 * 1
Training set, validation set, test set
5.6 introduction to relevant parameters of dataloader
torchvision.utils.make_grid packs images into a grid for display
What does torchvision.transforms.ToTensor() do?
Images are generally read in as H * W * C, which PyTorch cannot process directly.
ToTensor converts the image data to C * H * W so that PyTorch can process it.
However, when displaying images with other libraries, the data needs to be converted back to H * W * C.
5.7 image enhancement
It refers to operations such as scaling, stretching, rotating, and normalizing images.
It is equivalent to increasing the number of samples, reducing overfitting and improving the generalization ability of the model.
from torchvision import transforms
transform = transforms.Compose([                          # package several transforms
    transforms.ToPILImage(),                              # convert to a PIL image
    transforms.Resize(size),                              # zoom to the target size (h, w)
    transforms.ToTensor(),                                # convert to a tensor
    transforms.Normalize(mean=(0.1307,), std=(0.3081,)),  # standardization
])
transforms.RandomRotation        # random rotation
transforms.RandomHorizontalFlip  # random horizontal flip
transforms.RandomVerticalFlip    # random vertical flip
transforms.RandomResizedCrop     # randomly crops a part and resizes it
6. Full connection layer
7. Loss function
7.1 mean square error is generally used for regression tasks
nn.MSELoss() torch.nn.functional.mse_loss(input, target)
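A small sketch of both forms (values chosen arbitrarily):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

pred = torch.tensor([2.5, 0.0, 2.0])
target = torch.tensor([3.0, -0.5, 2.0])

loss1 = nn.MSELoss()(pred, target)    # module form
loss2 = F.mse_loss(pred, target)      # functional form
print(loss1.item(), loss2.item())     # both compute mean((pred - target)**2)
```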
7.2 cross entropy loss classification task
nn.CrossEntropyLoss() import torch.nn.functional as F input = F.log_softmax(out, dim=-1) F.nll_loss(input, target)
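A sketch verifying that CrossEntropyLoss is log_softmax followed by nll_loss (random scores):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

out = torch.randn(4, 10)              # raw scores for 4 samples, 10 classes
target = torch.tensor([1, 0, 4, 9])

loss1 = nn.CrossEntropyLoss()(out, target)

# CrossEntropyLoss == log_softmax + nll_loss
loss2 = F.nll_loss(F.log_softmax(out, dim=-1), target)
print(torch.allclose(loss1, loss2))   # True
```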
7.3 CTCLoss nn.CTCLoss()
7.4 amount of information
Shannon, the founder of information theory, held that "information is what eliminates random uncertainty"; that is, the amount of information is measured by how much uncertainty it eliminates.
"The sun rises in the east" eliminates no uncertainty, because the sun always rises in the east; the statement is trivial, so its amount of information is 0.
"The Chinese team successfully entered the World Cup in 2021" intuitively carries a lot of information: it was highly uncertain whether the Chinese team would enter the World Cup, and this sentence eliminates that uncertainty, so by definition it carries a large amount of information.
From the above: the amount of information decreases as the probability of the event increases. The greater the probability, the smaller the amount of information; the smaller the probability, the greater the amount of information. Suppose the probability of event x is P(x); then its amount of information is expressed as:
I(x) = -log(P(x))
where I(x) denotes the amount of information and log is the natural logarithm (base e).
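The formula as a tiny helper (using the natural logarithm, as above):

```python
import math

def information(p):
    """Amount of information I(x) = -ln(P(x)) for an event of probability p."""
    return -math.log(p)

print(information(1.0))    # a certain event carries zero information
print(information(0.01))   # a rare event carries a lot of information
```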
7.5 information entropy
Information entropy, also called entropy, is the expectation of the information of all outcomes. The expectation is the sum, over every possible outcome of the experiment, of the outcome's value multiplied by its probability. The information entropy can therefore be expressed as (here X is a discrete random variable):
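Written out, using I(x) from above (standard definition):

```latex
H(X) = \sum_{x} P(x)\, I(x) = -\sum_{x} P(x)\,\log P(x)
```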
7.6 relative entropy (KL divergence)
If there are two separate probability distributions P(x) and Q(x) for the same random variable X, we can use KL divergence to measure the difference between the two distributions.
In machine learning, P(x) often represents the true distribution of the samples and Q(x) the distribution predicted by the model. The smaller the KL divergence, the closer Q(x) is to P(x); by repeatedly training the model, Q(x) can be made to approximate P(x).
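The KL divergence of Q from P (standard definition):

```latex
D_{KL}(P \,\|\, Q) = \sum_{x} P(x)\,\log\frac{P(x)}{Q(x)}
```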
7.7 cross entropy
First, expand the KL divergence formula:
D_KL(P || Q) = sum P(x) log P(x) - sum P(x) log Q(x) = -H(P(x)) + H(P(x), Q(x))
The former term contains the information entropy H(P(x)); the latter, H(P(x), Q(x)) = -sum P(x) log Q(x), is the cross entropy. So KL divergence = cross entropy - information entropy.
In machine learning training, the input data and labels are fixed, so the true probability distribution P(x) is fixed and the information entropy is a constant. Since the KL divergence measures the difference between the true distribution P(x) and the predicted distribution Q(x), the smaller its value, the better the prediction, so we want to minimize it. Cross entropy equals KL divergence plus a constant (the information entropy) and is easier to compute, so in machine learning the cross entropy loss function is usually used to calculate the loss.
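A numeric check of the relation cross entropy = information entropy + KL divergence (the two distributions are made up):

```python
import math

P = [0.7, 0.2, 0.1]      # "true" distribution
Q = [0.5, 0.3, 0.2]      # "predicted" distribution

entropy = -sum(p * math.log(p) for p in P)                    # H(P)
cross_entropy = -sum(p * math.log(q) for p, q in zip(P, Q))   # H(P, Q)
kl = sum(p * math.log(p / q) for p, q in zip(P, Q))           # D_KL(P || Q)

print(abs(cross_entropy - (entropy + kl)) < 1e-12)  # True: H(P,Q) = H(P) + D_KL
```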
9. Convolutional network
9.1 difference between traditional network and convolutional network
- The shape of the input data is different
- Different parameters
9.2 overall structure of convolution network
Convolution layer, BN layer, activation function, pooling layer, FC (fully connected) layer
9.3 convolutional network
1. Convolution process
2. Feature value calculation
3. Parameter sharing
4. Parameter count of a convolution layer: (kernel_h * kernel_w * in_channels + 1) * out_channels, where the +1 is the bias
5. Convolution kernel (kernel_size): a kernel convolves every input channel; within one channel the kernel weights are shared, while the weights for different channels differ. Within one convolution layer all kernels have the same size; different layers may use different sizes. The size and number of kernels are chosen by the designer. Using N kernels to extract features produces N out_channels, and the number of feature maps equals the number of output channels.
6. Stride
7. Edge padding
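The parameter-count formula in item 4 can be checked against nn.Conv2d (the layer sizes are chosen arbitrarily):

```python
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3)
n_params = sum(p.numel() for p in conv.parameters())

# (kernel_h * kernel_w * in_channels + 1) * out_channels
# = (3 * 3 * 3 + 1) * 16 = 448
print(n_params)  # 448
```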
9.4 convolution process
9.4.1 number of feature maps
9.4.2 step size
9.4.3 feature map size calculation
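The standard output-size formula (the same for the height dimension; assuming dilation = 1):

```latex
W_{out} = \left\lfloor \frac{W_{in} + 2 \cdot padding - kernel\_size}{stride} \right\rfloor + 1
```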
9.4.4 convolutional network
nn.Conv2d()
def __init__(self, in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros')
nn.MaxPool2d()
F.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode, return_indices)
9.4.5 maximum pooling
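A shape sketch for a convolution followed by max pooling (the sizes are chosen arbitrarily):

```python
import torch
import torch.nn as nn

x = torch.rand(1, 3, 28, 28)           # N, C, H, W
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3,
                 stride=1, padding=1)
pool = nn.MaxPool2d(kernel_size=2)

y = conv(x)                            # (28 + 2*1 - 3)/1 + 1 = 28
print(y.shape)                         # torch.Size([1, 16, 28, 28])

z = pool(y)                            # max pooling halves H and W
print(z.shape)                         # torch.Size([1, 16, 14, 14])
```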
9.5.1 LSTM network application scenario
9.5.2 shape of input data and output data of LSTM
Default shapes (batch_first=False):
[time_step, batch_size, input_size] --> [time_step, batch_size, hidden_size]
With batch_first=True:
[batch_size, time_step, input_size] --> [batch_size, time_step, hidden_size]
If bidirectional=True, the output shape becomes [batch_size, time_step, hidden_size * 2]
nn.LSTM(input_size = input_dim, hidden_size = hidden_dim, num_layers = layer_dim, batch_first = True, bidirectional = True)
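A shape sketch for the bidirectional LSTM above (the dimensions are chosen arbitrarily):

```python
import torch
import torch.nn as nn

input_dim, hidden_dim, layer_dim = 8, 16, 2
lstm = nn.LSTM(input_size=input_dim, hidden_size=hidden_dim,
               num_layers=layer_dim, batch_first=True, bidirectional=True)

x = torch.rand(4, 10, input_dim)       # [batch_size, time_step, input_size]
out, (h_n, c_n) = lstm(x)
print(out.shape)                       # torch.Size([4, 10, 32]) = hidden * 2
```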
10. Classical neural network
10.1 classical neural network
AlexNet VGG16 VGG19 ResNet
10.2 are more network layers always better?
10.3 RESNET network details
10.4.1 what is CTC
CTC (Connectionist Temporal Classification) mainly solves the problem of misalignment between the neural network's output and the labels. Its advantages are that it does not force alignment between labels and outputs, and that labels may vary in length. It is mainly used in text recognition, speech recognition, and so on.
10.4.2 get ctcloss() object
ctc_loss = nn.CTCLoss(blank=0)
blank: the index of the blank label in the code table. The default is 0; it needs to be set according to the actual label definition.
10.4.3 call the ctcloss() object in the iteration to calculate the loss value
loss = ctc_loss(inputs, targets, input_lengths, target_lengths)
10.5 ctcloss parameter analysis
inputs: a tensor of shape (T, N, C) output by the model, where T is time_step, N is batch_size, and C is the length of the code table including blank. inputs generally needs to be processed by torch.nn.functional.log_softmax before being passed to CTCLoss.
targets: a tensor of shape (N, S) or (sum(target_lengths),). In the first form N is batch_size and S is the label length; the second form concatenates all labels. Note that targets must not contain the blank label.
input_lengths: a tensor or tuple of shape (N,); each element must equal T, i.e. time_step.
target_lengths: a tensor or tuple of shape (N,); the length of each real label, not including blank. Label lengths may differ.
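Putting the parameters together in a minimal sketch (all sizes here are made up):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

T, N, C, S = 12, 2, 5, 4               # time steps, batch, classes incl. blank, max label length
out = torch.randn(T, N, C)
inputs = F.log_softmax(out, dim=-1)    # (T, N, C) log-probabilities

targets = torch.randint(1, C, (N, S))  # labels must not contain blank (index 0)
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.tensor([4, 3])  # real label lengths, may differ

ctc_loss = nn.CTCLoss(blank=0)
loss = ctc_loss(inputs, targets, input_lengths, target_lengths)
print(loss.item())                     # a non-negative scalar loss
```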