Convolutional neural network CNN learning 1




Ten years of sharpening a sword, the frost blade has never been tried.


Introduction: learning convolutional neural networks (CNNs).

CNN video tutorial (Chinese): Convolutional neural network working principle video – Chinese version

CNN video tutorial (English): Convolutional neural network working principle video

1、 Definition

A convolutional neural network (CNN) is a deep learning model, a kind of multilayer perceptron in the family of artificial neural networks, that is often used to analyze visual images. Its pioneer, Yann LeCun, was the first to apply convolutional neural networks to handwritten digit recognition on the MNIST dataset.

2、 What inspired CNNs?

Human vision works roughly as follows: the raw signal is taken in (the pupil takes in pixels), preliminary processing is done (certain cells in the cerebral cortex detect edges and orientations), an abstraction is made (the brain determines that the object in front of us is round), and then a further abstraction is made (the brain determines that the object is a balloon).
Example of face recognition by the human brain

For different kinds of objects, human vision recognizes them through the same layer-by-layer process, as in the following examples.

Examples of recognition by human vision

At the lowest level, the features are basically the same for all objects: they are various kinds of edges. The higher the level, the more features specific to a class of objects (wheels, eyes, torso, etc.) can be extracted. At the top level, the different high-level features are combined into the corresponding image, so that people can accurately tell different objects apart. A CNN therefore imitates this characteristic of the human brain: it builds a multi-layer neural network that recognizes primitive image features at the lower layers, composes several lower-level features into higher-level features, and finally makes the classification at the top layer through the combination of multiple levels.

3、 What problems do convolutional neural networks solve?

In short, a CNN preserves the features of the image while reducing the number of parameters, turning a complex problem into a simpler one.

Image pixel RGB

As we all know, an image is made up of pixels, and each pixel is made up of color values. Nowadays almost any picture is larger than 1000 × 1000 pixels, and each pixel needs three values (R, G and B) to represent its color.

So to process a 1000 × 1000 pixel image, we need to process 3 million parameters: 1000 × 1000 × 3 = 3,000,000.
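The arithmetic can be checked in a few lines. As a rough sketch, the snippet below also counts the weights that a single fully connected layer would need on such an input. The layer size of 1000 units is an assumption chosen only for illustration; it is not a figure from this article.

```python
# Number of input values in a 1000 x 1000 RGB image.
height, width, channels = 1000, 1000, 3
num_inputs = height * width * channels
print(num_inputs)  # 3000000

# Assumption for illustration: one fully connected layer with 1000 units,
# where every unit is connected to every input value.
hidden_units = 1000
fc_weights = num_inputs * hidden_units
print(fc_weights)  # 3000000000
```

Even one such layer would need billions of weights, which is why reducing the amount of data first matters so much.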

Processing such a large amount of data is very resource-consuming. The first problem a CNN solves is to simplify the complex problem: a large number of parameters is first reduced to a small number of parameters, and only then processed.

More importantly, in most scenarios this dimensionality reduction does not affect the result. For example, shrinking a 1000-pixel image to 200 pixels does not stop the naked eye from recognizing whether the image shows a cat or a dog, and the same holds for a machine.

The traditional way of picture digitization

A naive digitization cannot preserve image features. In the figure above, a cell that contains the circle is encoded as 1 and a cell without it as 0, so a circle at different positions produces completely different data. Visually, however, the content (the essence) of the image has not changed; only the position has. So when an object moves within the image, the parameters obtained in the traditional way differ greatly, which does not meet the requirements of image processing.

CNNs solve this problem by retaining image features in a vision-like way: imitating the principle of human vision, they build a multi-layer neural network that recognizes primitive image features at the lower layers, combines several lower-level features into higher-level features, and finally makes the classification at the top layer. As a result, similar images can still be recognized effectively when the image is flipped, rotated or shifted.
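To make the position problem concrete, here is a small NumPy sketch. It uses a 2 × 2 block of ones as a stand-in for the circle, and a hypothetical 2 × 2 filter that matches that block; both are assumptions for illustration. The raw pixel vectors of the two images differ, while the strongest response of the sliding filter is the same for both.

```python
import numpy as np

def place_blob(row, col, size=6):
    """Return a size x size image of zeros with a 2 x 2 block of ones at (row, col)."""
    img = np.zeros((size, size))
    img[row:row + 2, col:col + 2] = 1.0
    return img

def max_filter_response(img, kernel):
    """Slide the kernel over the image (no padding, stride 1) and return the largest response."""
    kh, kw = kernel.shape
    best = -np.inf
    for i in range(img.shape[0] - kh + 1):
        for j in range(img.shape[1] - kw + 1):
            best = max(best, float(np.sum(img[i:i + kh, j:j + kw] * kernel)))
    return best

a = place_blob(0, 0)  # shape in the top-left corner
b = place_blob(3, 2)  # same shape, moved elsewhere

kernel = np.ones((2, 2))  # hypothetical filter that matches the shape

# The flattened pixel vectors disagree in 8 positions although the content is the same.
pixel_mismatches = int(np.sum(a.flatten() != b.flatten()))
print(pixel_mismatches)  # 8

# The strongest filter response is identical for both positions.
print(max_filter_response(a, kernel), max_filter_response(b, kernel))  # 4.0 4.0
```

This only demonstrates tolerance to a shift; it is a toy illustration, not a full CNN.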

4、 Architecture of convolutional neural network

A typical CNN consists of three parts: a convolution layer, a pooling layer and a fully connected layer.

Typical CNN components

A more complete breakdown has five layers:
1. Data input layer
2. Convolution layer (CONV layer)
3. ReLU activation layer (ReLU layer)
4. Pooling layer
5. Fully connected layer (FC layer)
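One way to get a feel for this five-layer structure is to track how the shape of the data changes from layer to layer. The sketch below does only that shape arithmetic. All sizes (a 32 × 32 RGB input, 8 filters of 5 × 5, a 2 × 2 pooling window, 10 output classes) are assumed example values, not taken from this article.

```python
def conv_output_size(n, kernel, stride=1, padding=0):
    """Output size along one axis for a sliding window (convolution or pooling)."""
    return (n + 2 * padding - kernel) // stride + 1

# Assumed example sizes.
h = w = 32         # input image is 32 x 32
c = 3              # RGB channels
num_filters = 8    # filters in the convolution layer
num_classes = 10   # outputs of the fully connected layer

shapes = [("input", (h, w, c))]

# Convolution layer: 5 x 5 kernels, stride 1, no padding.
h, w = conv_output_size(h, 5), conv_output_size(w, 5)
shapes.append(("conv", (h, w, num_filters)))

# ReLU layer: element-wise, so the shape is unchanged.
shapes.append(("relu", (h, w, num_filters)))

# Pooling layer: 2 x 2 window, stride 2.
h, w = conv_output_size(h, 2, stride=2), conv_output_size(w, 2, stride=2)
shapes.append(("pool", (h, w, num_filters)))

# Fully connected layer: flatten everything, then map to class scores.
flat = h * w * num_filters
shapes.append(("fc", (num_classes,)))

for name, shape in shapes:
    print(name, shape)
print("flattened length before fc:", flat)  # 1568
```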

5、 Data input layer

The data input layer mainly preprocesses the raw image data, including mean removal, normalization, and PCA / whitening.
Mean removal:
Center every dimension of the input data at 0, as shown in the figure below. The purpose is to pull the center of the samples back to the origin of the coordinate system.

Normalization:
Scale the amplitudes to the same range, as shown below. This reduces the interference caused by differences in the value ranges of the individual dimensions. For example, suppose we have two features A and B, where A ranges from 0 to 10 and B ranges from 0 to 10000. Using these two features directly is problematic; a good approach is to normalize them so that both A and B fall in the range 0 to 1.
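Mean removal and normalization can be sketched with NumPy as follows. The three sample rows are made-up values for features A (0 to 10) and B (0 to 10000), and min-max scaling is used as one common way to reach the range 0 to 1.

```python
import numpy as np

# Hypothetical data: rows are samples, columns are features A and B.
X = np.array([[0.0,      0.0],
              [5.0,   2500.0],
              [10.0, 10000.0]])

# Mean removal: subtract each column's mean so every feature is centered at 0.
X_centered = X - X.mean(axis=0)

# Min-max normalization: rescale each column to the range [0, 1].
col_min = X.min(axis=0)
col_max = X.max(axis=0)
X_scaled = (X - col_min) / (col_max - col_min)

print(X_centered.mean(axis=0))  # both column means are (numerically) 0
print(X_scaled)                 # rows: [0, 0], [0.5, 0.25], [1, 1]
```

After scaling, A and B contribute on the same scale even though B was originally a thousand times larger.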
PCA / whitening:
PCA is used to reduce the dimensionality; whitening normalizes the amplitude along each feature axis of the data.
Effect of mean removal and normalization

Decorrelation and whitening effect
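The decorrelation and whitening steps can be sketched with NumPy as well. The data below is randomly generated correlated 2-D data, and `eps` is a small constant added to avoid division by zero; both are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical correlated data: 500 samples, 2 features.
mixing = np.array([[2.0, 0.0],
                   [1.5, 0.5]])
X = rng.normal(size=(500, 2)) @ mixing.T

# Center the data first.
X = X - X.mean(axis=0)

# PCA: eigen-decomposition of the covariance matrix.
cov = X.T @ X / X.shape[0]
eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order

# Decorrelate: rotate the data onto the principal axes.
X_rot = X @ eigvecs

# Dimensionality reduction: keep only the axis with the largest variance.
X_reduced = X_rot[:, -1:]

# Whiten: divide each axis by its standard deviation.
eps = 1e-5
X_white = X_rot / np.sqrt(eigvals + eps)

# The covariance of the whitened data is close to the identity matrix.
cov_white = X_white.T @ X_white / X_white.shape[0]
print(np.round(cov_white, 3))
```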

6、 Convolution computing layer

The convolution layer has two key operations. One is local connectivity: each neuron is regarded as a filter. The other is window sliding: the filter computes over one local patch of the data at a time.

The operation of the convolution layer is shown in the figure below: a convolution kernel scans the whole picture.

Animation of the convolution layer operation

The operation of the convolution layer can be understood as using a filter (the convolution kernel) to filter each small region of the image and obtain a feature value for that region. In other words, the convolution layer extracts local features of the image by filtering with convolution kernels.

Diagram of the convolution layer operation


Animation of the convolution layer calculation
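The sliding-window computation can be written out directly. The following is a minimal NumPy sketch without padding. The 5 × 5 image and the 3 × 3 vertical-edge kernel are made-up examples, and, as is usual in deep learning code, the kernel is applied without flipping (strictly speaking a cross-correlation).

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """Slide `kernel` over `image` (no padding) and return the feature map."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            patch = image[r:r + kh, c:c + kw]   # local region under the window
            out[i, j] = np.sum(patch * kernel)  # weighted sum = one feature value
    return out

# Hypothetical 5 x 5 image: two dark columns (0) followed by three bright columns (1).
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)

# Hypothetical 3 x 3 kernel that responds to dark-to-bright vertical edges.
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=float)

feature_map = conv2d(image, kernel)
print(feature_map)  # every row is [3, 3, 0]
```

The response is large where the window covers the edge and zero where the region is uniformly bright, which is what "extracting a local feature" means here.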

7、 Activation layer

The activation layer applies a nonlinear mapping to the output of the convolution layer. The activation function generally used in CNNs is ReLU (the rectified linear unit).
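ReLU itself is a one-liner: negative values become 0 and positive values pass through unchanged. A minimal sketch, using a made-up 2 × 2 convolution output:

```python
import numpy as np

def relu(x):
    """Rectified linear unit: keep positive values, replace negatives with 0."""
    return np.maximum(x, 0)

# Hypothetical convolution output containing negative responses.
feature_map = np.array([[ 3.0, -1.0],
                        [-2.5,  0.5]])
activated = relu(feature_map)
print(activated)  # rows: [3, 0] and [0, 0.5]
```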

8、 Pooling layer

The pooling layer sits between consecutive convolution layers. It compresses the amount of data and the number of parameters, and reduces overfitting. Its most important role is to compress the image and reduce the data dimension while keeping the features.

Pooling methods include max pooling and average pooling; in practice max pooling is used more often. The idea of max pooling: for each 2 × 2 window, take the maximum as the value of the corresponding element of the output matrix. For example, if the maximum in the first 2 × 2 window of the input matrix is 6, then the first element of the output matrix is 6, and so on. This keeps the features while reducing the dimension.

Max pooling diagram

Animated pooling diagram

In the animated pooling diagram, the original picture is 20 × 20. We downsample it with a sampling window (which these sizes imply is 10 × 10 if the windows do not overlap) and end up with a 2 × 2 feature map.
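Max pooling can be sketched in NumPy as below. The 4 × 4 input is a made-up matrix whose first 2 × 2 window has maximum 6, matching the example in the text. For the 20 × 20 picture, a non-overlapping 10 × 10 window is an assumption inferred from the input and output sizes.

```python
import numpy as np

def max_pool(x, window):
    """Non-overlapping max pooling: the stride equals the window size."""
    out_h, out_w = x.shape[0] // window, x.shape[1] // window
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            block = x[i * window:(i + 1) * window, j * window:(j + 1) * window]
            out[i, j] = block.max()  # keep only the strongest value in the window
    return out

# Hypothetical 4 x 4 input; the first 2 x 2 window has maximum 6.
x = np.array([[1, 1, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]], dtype=float)
pooled = max_pool(x, 2)
print(pooled)  # rows: [6, 8] and [3, 4]

# A 20 x 20 input reduced to a 2 x 2 feature map, assuming a 10 x 10 window.
big = np.arange(400, dtype=float).reshape(20, 20)
big_pooled = max_pool(big, 10)
print(big_pooled.shape)  # (2, 2)
```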

9、 Fully connected layer

The fully connected layer sits at the end of the convolutional neural network. It is the last step and is responsible for producing the output. It works the same way as in a traditional neural network.
Connection pattern of the fully connected layer
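As a rough sketch, a fully connected layer flattens the pooled feature maps into one vector and multiplies it by a weight matrix. The sizes and the random weights below are made up, and the softmax at the end is an assumption: it is a common way to turn scores into class probabilities, but this article does not specify the output function.

```python
import numpy as np

def fully_connected(features, weights, bias):
    """Flatten the feature maps and apply one dense layer: scores = W x + b."""
    x = features.reshape(-1)
    return weights @ x + bias

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = scores - scores.max()  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

rng = np.random.default_rng(0)

# Hypothetical sizes: 8 pooled feature maps of 2 x 2, and 3 output classes.
features = rng.normal(size=(8, 2, 2))
weights = rng.normal(size=(3, features.size)) * 0.1
bias = np.zeros(3)

scores = fully_connected(features, weights, bias)
probs = softmax(scores)
print(probs, probs.argmax())  # class probabilities and the predicted class index
```

In a trained network the weights would come from learning rather than from a random generator.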


10、 CNN application scenarios

Image classification and retrieval, object localization and detection, object segmentation, face recognition, skeleton recognition, etc.

CNN face recognition




