Introduction notes to Python deep learning 4 CNN

Time:2022-1-23

Convolutional neural network

The theme of this part is the convolutional neural network (CNN).

Network structure

Like the neural networks introduced before, a CNN is also built by stacking layers. However, a CNN introduces two new kinds of layer: the convolution layer and the pooling layer.

In the neural networks introduced earlier, all neurons in adjacent layers are connected, which is called fully connected. The following figure is an example of a network based on fully connected (Affine) layers:

[Figure: a network built from fully connected (Affine) layers]

The structure of a CNN is usually:

[Figure: a typical CNN structure]

In a CNN, layers are connected in the order “Convolution - ReLU - (Pooling)” (the pooling layer is sometimes omitted). This can be understood as replacing the earlier “Affine - ReLU” connection with a “Convolution - ReLU - (Pooling)” connection. However, the layers close to the output still use “Affine - ReLU”, and the output layer still uses “Affine - Softmax”.

Convolution layer

What’s wrong with the fully connected layer?

The fully connected layer has a problem: the shape of the data is “ignored”. For example, when the input is an image, the image data is usually 3D data in the height, width and channel directions. However, to feed it to a fully connected layer, the 3D data has to be flattened into 1D data.

The 3D shape of an image contains important spatial information: spatially adjacent pixels have similar values, the RGB channels are closely correlated, pixels far apart are hardly correlated, and so on. There may be essential patterns worth extracting hidden in the 3D shape. Because the fully connected layer ignores the shape and treats all input data as equivalent neurons (neurons of the same dimension), it cannot use the information related to the shape.

The convolution layer, in contrast, keeps the shape unchanged. When the input data is an image, the convolution layer receives the input as 3D data and outputs it to the next layer as 3D data as well. So a CNN has the potential to correctly understand data that has a shape, such as images.

In a CNN, the input and output data of a convolution layer are sometimes called feature maps: the input feature map and the output feature map. In what follows, “input / output data” and “feature map” are used with the same meaning.

Some terms and concepts from digital image processing will come up later. I have covered them in detail in an earlier post, Digital image processing notes.

The main difference between traditional digital image processing and image processing based on deep learning is that the filters used in traditional digital image processing are fixed and general-purpose, or have been determined by empirical research, whereas deep learning can “learn” the filters best suited to the current scene. Apart from that, the image processing operations themselves are the same.

Convolution operation

The processing performed by the convolution layer is the convolution operation. It corresponds to the “filter operation” in image processing. The “filter” is also called a “kernel”.

An example where the input size is (4, 4), the filter size is (3, 3), and the output size is (2, 2):

[Figure: convolution of a (4, 4) input with a (3, 3) filter, giving a (2, 2) output]

The convolution operation slides the filter window over the input data at a fixed interval and applies it at each position: the elements of the filter are multiplied by the corresponding elements of the input, and the products are summed. This calculation is sometimes called a multiply-accumulate operation. The result is then stored at the corresponding position of the output. Carrying out this process at all positions gives the output of the convolution operation.

[Figure: calculation order of the convolution operation]

In a CNN, the filter parameters correspond to the weights of the earlier networks. A CNN also has a bias. There is usually only one bias value per filter, and this value is added to every element of the result of applying the filter.

[Figure: bias in the convolution operation]
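The multiply-accumulate step and the bias can be sketched in a few lines of NumPy. The input and filter values below are illustrative only (they are not recovered from the missing figure), and conv2d_single is a hypothetical helper written for this example; it handles one 2D input with stride 1 and no padding.

```python
import numpy as np

def conv2d_single(x, w, b=0.0):
    """Naive 2D convolution (stride 1, no padding), for illustration only."""
    H, W = x.shape
    FH, FW = w.shape
    out = np.zeros((H - FH + 1, W - FW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Multiply the window by the filter element-wise, sum, then add the bias
            out[i, j] = np.sum(x[i:i + FH, j:j + FW] * w) + b
    return out

# Illustrative (4, 4) input and (3, 3) filter
x = np.array([[1, 2, 3, 0],
              [0, 1, 2, 3],
              [3, 0, 1, 2],
              [2, 3, 0, 1]])
w = np.array([[2, 0, 1],
              [0, 1, 2],
              [1, 0, 2]])

y = conv2d_single(x, w)            # no bias
y_bias = conv2d_single(x, w, b=3)  # the single bias value is added to every element

print(y.tolist())       # [[15.0, 16.0], [6.0, 15.0]]
print(y_bias.tolist())  # [[18.0, 19.0], [9.0, 18.0]]
```

The explicit loops are only meant to make the sliding window visible; the im2col-based implementation later in these notes avoids them.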

Padding

It can be seen that after filtering, the output is smaller than the input by a border of elements, because the outermost elements of the image do not have a complete neighborhood to multiply with the filter. Padding is used so that the edge elements are not lost.

Before the convolution operation, it is sometimes necessary to fill fixed data (such as 0) around the input data. This is called padding.

[Figure: padding in the convolution operation. Zeros are filled around the input data (the dashed area in the figure indicates the padding; the zeros themselves are not shown)]
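As a small sketch of zero padding, NumPy's np.pad can surround an array with a border of zeros. The (4, 4) input and the padding width of 1 are assumptions chosen for illustration.

```python
import numpy as np

x = np.arange(1, 17).reshape(4, 4)  # illustrative (4, 4) input
pad = 1

# Add `pad` rows/columns of zeros on every side
x_padded = np.pad(x, pad, mode="constant", constant_values=0)

print(x.shape, "->", x_padded.shape)  # (4, 4) -> (6, 6)
print(x_padded)
```

With a (3, 3) filter and stride 1, the padded (6, 6) input gives a (4, 4) output, so the output keeps the size of the original input.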

Stride

The interval between the positions at which the filter is applied is called the stride. If the stride is set to 2, as shown in the following figure, the window to which the filter is applied moves by 2 elements at a time.

[Figure: convolution with a stride of 2]

To sum up, the output size decreases when the stride is increased, and increases when the padding is increased. So how is the output size calculated from these two values and the input size?

Assuming that the input size is (H, W), the filter size is (FH, FW), the output size is (OH, OW), the padding is P and the stride is S, the output size is:

OH = (H + 2P - FH) / S + 1
OW = (W + 2P - FW) / S + 1

Pay attention to the division in the formula. When the division is not exact (the result is not an integer), countermeasures such as reporting an error are needed.
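The output size formula can be wrapped in a small function. This is a minimal sketch: conv_output_size is a hypothetical helper name, and raising ValueError is just one possible countermeasure when the division is not exact.

```python
def conv_output_size(input_size, filter_size, stride=1, pad=0):
    """Output size along one dimension: (input + 2*pad - filter) / stride + 1."""
    numerator = input_size + 2 * pad - filter_size
    if numerator < 0:
        raise ValueError("filter is larger than the padded input")
    if numerator % stride != 0:
        # One possible countermeasure: report an error when the division is not exact
        raise ValueError("output size is not an integer for these settings")
    return numerator // stride + 1

print(conv_output_size(4, 3))                   # 2
print(conv_output_size(4, 3, stride=1, pad=1))  # 4
print(conv_output_size(7, 3, stride=2))         # 3

try:
    conv_output_size(4, 3, stride=2)
except ValueError as err:
    print("error:", err)
```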

Convolution operation of 3D data

Let’s take a look at an example of the convolution operation on 3D data that has a channel direction.

[Figure: convolution on 3D data with a channel dimension]

Note: in the convolution of 3D data, the number of channels of the filter must be the same as the number of channels of the input data.

Here we find another problem: after being filtered by one filter, the 3D data becomes a single 2D feature map.

[Figure: a single filter produces one 2D output feature map]

How to solve it? Just use multiple filters (weights).

[Figure: convolution with multiple filters]

By applying FN filters, FN output feature maps are generated. Stacking these FN feature maps gives a block of shape (FN, OH, OW). Passing this block to the next layer is the processing flow of a CNN.

Each filter (output channel) has one bias, so the shape of the bias is (FN, 1, 1).

[Figure: convolution with multiple filters and bias]
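The shapes involved can be checked with a naive loop. The sizes below (3 channels, 5 filters of size 3 × 3, a 4 × 4 input) are arbitrary values assumed for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
C, H, W = 3, 4, 4        # input: channels, height, width
FN, FH, FW = 5, 3, 3     # FN filters, each of shape (C, FH, FW)

x = rng.standard_normal((C, H, W))
w = rng.standard_normal((FN, C, FH, FW))
b = rng.standard_normal((FN, 1, 1))   # one bias per filter

OH, OW = H - FH + 1, W - FW + 1       # stride 1, no padding
out = np.zeros((FN, OH, OW))
for n in range(FN):
    for i in range(OH):
        for j in range(OW):
            # Each filter sums over all channels and over its window,
            # so one filter produces one 2D feature map
            out[n, i, j] = np.sum(x[:, i:i + FH, j:j + FW] * w[n])

out = out + b   # (FN, 1, 1) broadcasts over (FN, OH, OW)
print(out.shape)  # (5, 2, 2)
```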

Batch processing

In neural network processing, input data is bundled into batches. For the convolution operation to support batch processing as well, the data passed between layers must be stored as 4D data (a batch of 3D data). Specifically, data is stored in the order (batch_num, channel, height, width).

[Figure: data flow of the convolution operation in batch processing]

Pooling layer

Pooling is an operation that reduces the spatial size in the height and width directions. In image recognition, max pooling is mainly used, that is, the maximum value in each target area is selected.

The following example shows the processing order of 2 × 2 max pooling with a stride of 2:

[Figure: processing order of max pooling]

Characteristics of the pooling layer:

  1. There are no parameters to learn. Pooling only takes the maximum value (or average value) from the target area.
  2. The number of channels does not change; each channel is pooled separately.
  3. It is robust to small changes in position.
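A 2 × 2 max pooling with stride 2 on a single channel can be sketched with a reshape. The input values are illustrative, and max_pool_2x2 is a hypothetical helper that assumes an even height and width.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on an (H, W) array; H and W must be even."""
    H, W = x.shape
    # Split rows and columns into pairs: blocks[a, r, b, c] == x[2*a + r, 2*b + c]
    blocks = x.reshape(H // 2, 2, W // 2, 2)
    # Take the maximum inside every 2x2 block
    return blocks.max(axis=(1, 3))

x = np.array([[1, 2, 1, 0],
              [0, 1, 2, 3],
              [3, 0, 1, 2],
              [2, 4, 0, 1]])

pooled = max_pool_2x2(x)
print(pooled.tolist())  # [[2, 3], [4, 2]]
```

Nothing in this function is learned, which matches the first characteristic in the list above.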

Implementation of convolution layer and pooling layer

We will implement these two layers in Python, but we have to solve some small problems before we start.

Expansion based on im2col

A CNN processes 4D data, so implementing the convolution operation looks very complex, but the problem becomes simple with im2col.

If the convolution operation is implemented naively, several nested for statements are needed. Such an implementation is somewhat cumbersome, and for statements are slow in NumPy (when accessing elements in NumPy, it is better not to use for statements).

im2col (image to column) is a function that expands the input data so that it fits the filter (weight).

[Figure: overview of im2col]

In the figure above, the stride is set large for easy observation, so that the application areas of the filter do not overlap. In an actual convolution operation, the application areas of the filter overlap in almost all cases. When the application areas overlap, the number of elements after expansion by im2col is larger than that of the original block. Therefore, an implementation using im2col has the disadvantage of consuming more memory than an ordinary implementation. However, gathering the data into one large matrix is well suited to fast computation on a computer.

import numpy as np


def im2col(input_data, filter_h, filter_w, stride=1, pad=0):
    """Expand a 4D input into a 2D array, one row per filter application area.

    Parameters
    ----------
    input_data : input data as a 4D array of shape (batch size, channel, height, width)
    filter_h : filter height
    filter_w : filter width
    stride : stride
    pad : padding

    Returns
    -------
    col : 2D array of shape (N*out_h*out_w, C*filter_h*filter_w)
    """
    N, C, H, W = input_data.shape
    out_h = (H + 2*pad - filter_h)//stride + 1
    out_w = (W + 2*pad - filter_w)//stride + 1

    img = np.pad(input_data, [(0,0), (0,0), (pad, pad), (pad, pad)], 'constant')
    col = np.zeros((N, C, filter_h, filter_w, out_h, out_w))

    for y in range(filter_h):
        y_max = y + stride*out_h
        for x in range(filter_w):
            x_max = x + stride*out_w
            col[:, :, y, x, :, :] = img[:, :, y:y_max:stride, x:x_max:stride]

    col = col.transpose(0, 4, 5, 1, 2, 3).reshape(N*out_h*out_w, -1)
    return col

After expanding the input data with im2col, you only need to expand the filters (weights) of the convolution layer into columns, one column per filter, and calculate the product of the two matrices.

[Figure: convolution via im2col: expand the input and the filters, then take the matrix product]

It can be seen that after the expansion, the convolution can be calculated with a single matrix multiplication.
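As a sanity check, the sketch below compares the im2col plus matrix-product route with a direct nested-loop convolution. The im2col function is repeated from above only so that the snippet runs on its own; the array sizes and the random data are assumptions for illustration.

```python
import numpy as np

def im2col(input_data, filter_h, filter_w, stride=1, pad=0):
    # Same function as in the text, repeated so this snippet is self-contained
    N, C, H, W = input_data.shape
    out_h = (H + 2 * pad - filter_h) // stride + 1
    out_w = (W + 2 * pad - filter_w) // stride + 1
    img = np.pad(input_data, [(0, 0), (0, 0), (pad, pad), (pad, pad)], "constant")
    col = np.zeros((N, C, filter_h, filter_w, out_h, out_w))
    for y in range(filter_h):
        y_max = y + stride * out_h
        for x in range(filter_w):
            x_max = x + stride * out_w
            col[:, :, y, x, :, :] = img[:, :, y:y_max:stride, x:x_max:stride]
    return col.transpose(0, 4, 5, 1, 2, 3).reshape(N * out_h * out_w, -1)

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3, 7, 7))   # (N, C, H, W)
w = rng.standard_normal((4, 3, 5, 5))   # (FN, C, FH, FW)
N, C, H, W = x.shape
FN, _, FH, FW = w.shape
out_h, out_w = H - FH + 1, W - FW + 1   # stride 1, no padding

# im2col route: one matrix product
col = im2col(x, FH, FW)                 # (N*out_h*out_w, C*FH*FW)
col_w = w.reshape(FN, -1).T             # (C*FH*FW, FN)
out = (col @ col_w).reshape(N, out_h, out_w, FN).transpose(0, 3, 1, 2)

# Reference: direct nested loops
ref = np.zeros((N, FN, out_h, out_w))
for n in range(N):
    for f in range(FN):
        for i in range(out_h):
            for j in range(out_w):
                ref[n, f, i, j] = np.sum(x[n, :, i:i + FH, j:j + FW] * w[f])

print(col.shape)              # (18, 75)
print(x.size, col.size)       # 294 1350
print(np.allclose(out, ref))  # True
```

The element counts show the memory cost mentioned earlier: because the windows overlap, the expanded matrix holds several times more elements than the input.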

Implementation of convolution layer

class Convolution:
    def __init__(self, W, b, stride=1, pad=0):
        self.W = W
        self.b = b
        self.stride = stride
        self.pad = pad

        #Intermediate data (used in backward)
        self.x = None
        self.col = None
        self.col_W = None

        #Gradient of weight and bias parameters
        self.dW = None
        self.db = None
        
    def forward(self, x):
        FN, C, FH, FW = self.W.shape
        N, C, H, W = x.shape
        # Calculate the output size
        out_h = (H + 2*self.pad - FH) // self.stride + 1
        out_w = (W + 2*self.pad - FW) // self.stride + 1
        # Expand the input data
        col = im2col(x, FH, FW, self.stride, self.pad)
        # Expand the filters: one column per filter
        col_W = self.W.reshape(FN, -1).T
        # Compute the convolution as a matrix product
        out = np.dot(col, col_W) + self.b
        # Reshape back to (N, FN, out_h, out_w)
        out = out.reshape(N, out_h, out_w, -1).transpose(0, 3, 1, 2)

        # Keep the intermediate data needed by backward
        self.x = x
        self.col = col
        self.col_W = col_W

        return out

    def backward(self, dout):
        FN, C, FH, FW = self.W.shape
        dout = dout.transpose(0, 2, 3, 1).reshape(-1, FN)

        self.db = np.sum(dout, axis=0)
        self.dW = np.dot(self.col.T, dout)
        self.dW = self.dW.transpose(1, 0).reshape(FN, C, FH, FW)

        dcol = np.dot(dout, self.col_W.T)
        dx = col2im(dcol, self.x.shape, FH, FW, self.stride, self.pad)

        return dx

Note that by specifying -1 in reshape, the reshape function automatically calculates the number of elements along that dimension so that the total number of elements of the multidimensional array stays consistent. transpose changes the order of the axes of a multidimensional array. For example, transpose(0, 3, 1, 2) rearranges the original axes 0, 1, 2, 3 into the order 0, 3, 1, 2.

[Figure: changing the axis order with transpose]
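A short shape check makes the two calls concrete. The array sizes are arbitrary values assumed for this sketch.

```python
import numpy as np

# 10 filters of shape (3, 5, 5): reshape(10, -1) infers 3*5*5 = 75 for the -1 axis
w = np.arange(10 * 3 * 5 * 5).reshape(10, 3, 5, 5)
w_flat = w.reshape(10, -1)
print(w_flat.shape)  # (10, 75)

# Output of the matrix product, viewed as (N, out_h, out_w, FN)
out = np.zeros((2, 4, 4, 10))
# Move the filter axis (axis 3) to position 1: (N, FN, out_h, out_w)
out_t = out.transpose(0, 3, 1, 2)
print(out_t.shape)   # (2, 10, 4, 4)
```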

The class above implements both the forward and the backward pass of the convolution layer. The backward pass uses col2im, which performs the reverse processing of im2col. The code is as follows:

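def col2im(col, input_shape, filter_h, filter_w, stride=1, pad=0):
    """Fold a 2D array with the layout of an im2col output back into a 4D array.

    Parameters
    ----------
    col : 2D array (for example, a gradient with the shape of an im2col output)
    input_shape : shape of the input data (for example: (10, 1, 28, 28))
    filter_h : filter height
    filter_w : filter width
    stride : stride
    pad : padding

    Returns
    -------
    img : 4D array of shape input_shape; overlapping positions are summed
    """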
    N, C, H, W = input_shape
    out_h = (H + 2*pad - filter_h)//stride + 1
    out_w = (W + 2*pad - filter_w)//stride + 1
    col = col.reshape(N, out_h, out_w, C, filter_h, filter_w).transpose(0, 3, 4, 5, 1, 2)

    img = np.zeros((N, C, H + 2*pad + stride - 1, W + 2*pad + stride - 1))
    for y in range(filter_h):
        y_max = y + stride*out_h
        for x in range(filter_w):
            x_max = x + stride*out_w
            img[:, :, y:y_max:stride, x:x_max:stride] += col[:, :, y, x, :, :]

    return img[:, :, pad:H + pad, pad:W + pad]
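The following sketch shows how col2im behaves with and without overlapping windows. Both functions are repeated from the text only so that the snippet is self-contained; the small test arrays are assumptions for illustration.

```python
import numpy as np

def im2col(input_data, filter_h, filter_w, stride=1, pad=0):
    N, C, H, W = input_data.shape
    out_h = (H + 2 * pad - filter_h) // stride + 1
    out_w = (W + 2 * pad - filter_w) // stride + 1
    img = np.pad(input_data, [(0, 0), (0, 0), (pad, pad), (pad, pad)], "constant")
    col = np.zeros((N, C, filter_h, filter_w, out_h, out_w))
    for y in range(filter_h):
        y_max = y + stride * out_h
        for x in range(filter_w):
            x_max = x + stride * out_w
            col[:, :, y, x, :, :] = img[:, :, y:y_max:stride, x:x_max:stride]
    return col.transpose(0, 4, 5, 1, 2, 3).reshape(N * out_h * out_w, -1)

def col2im(col, input_shape, filter_h, filter_w, stride=1, pad=0):
    N, C, H, W = input_shape
    out_h = (H + 2 * pad - filter_h) // stride + 1
    out_w = (W + 2 * pad - filter_w) // stride + 1
    col = col.reshape(N, out_h, out_w, C, filter_h, filter_w).transpose(0, 3, 4, 5, 1, 2)
    img = np.zeros((N, C, H + 2 * pad + stride - 1, W + 2 * pad + stride - 1))
    for y in range(filter_h):
        y_max = y + stride * out_h
        for x in range(filter_w):
            x_max = x + stride * out_w
            img[:, :, y:y_max:stride, x:x_max:stride] += col[:, :, y, x, :, :]
    return img[:, :, pad:H + pad, pad:W + pad]

# Non-overlapping windows (2x2 filter, stride 2): col2im restores the input exactly
x = np.arange(16, dtype=float).reshape(1, 1, 4, 4)
restored = col2im(im2col(x, 2, 2, stride=2), x.shape, 2, 2, stride=2)
print(np.array_equal(restored, x))  # True

# Overlapping windows (3x3 filter, stride 1): contributions are summed, so an
# all-ones input comes back as the number of windows covering each pixel
ones = np.ones((1, 1, 4, 4))
counts = col2im(im2col(ones, 3, 3), ones.shape, 3, 3)
print(counts[0, 0].astype(int).tolist())
# [[1, 2, 2, 1], [2, 4, 4, 2], [2, 4, 4, 2], [1, 2, 2, 1]]
```

Summing the overlapping contributions is what the backward pass needs: a pixel that took part in several windows receives the sum of their gradients.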

Implementation of pooling layer

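Like the convolution layer, the pooling layer is implemented by expanding the input data with im2col. The difference is that for pooling, the application areas are expanded independently for each channel.

[Figure: expanding the pooling regions of the input data]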

class Pooling:
    def __init__(self, pool_h, pool_w, stride=1, pad=0):
        self.pool_h = pool_h
        self.pool_w = pool_w
        self.stride = stride
        self.pad = pad

        self.x = None
        self.arg_max = None

    def forward(self, x):
        N, C, H, W = x.shape
        out_h = int(1 + (H - self.pool_h) / self.stride)
        out_w = int(1 + (W - self.pool_w) / self.stride)

        #Unfold
        col = im2col(x, self.pool_h, self.pool_w, self.stride, self.pad)
        col = col.reshape(-1, self.pool_h * self.pool_w)

        #Maximum
        arg_max = np.argmax(col, axis=1)
        out = np.max(col, axis=1)
        
        #Conversion
        out = out.reshape(N, out_h, out_w, C).transpose(0, 3, 1, 2)

        self.x = x
        self.arg_max = arg_max

        return out

    def backward(self, dout):
        dout = dout.transpose(0, 2, 3, 1)

        pool_size = self.pool_h * self.pool_w
        dmax = np.zeros((dout.size, pool_size))
        dmax[np.arange(self.arg_max.size), self.arg_max.flatten()] = dout.flatten()
        dmax = dmax.reshape(dout.shape + (pool_size,))

        dcol = dmax.reshape(dmax.shape[0] * dmax.shape[1] * dmax.shape[2], -1)
        dx = col2im(dcol, self.x.shape, self.pool_h, self.pool_w, self.stride, self.pad)

        return dx

The maximum value can be calculated with NumPy’s np.max function. np.max accepts an axis parameter and finds the maximum along the axis it specifies. For example, np.max(x, axis=1) finds the maximum of the input x along axis 1 (the second dimension, since axes are counted from 0).
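A tiny example, with made-up values, shows np.max and np.argmax along axis 1. Each row stands for one expanded pooling area, as in the forward method above.

```python
import numpy as np

# Each row is one pooling area flattened to pool_h * pool_w elements
col = np.array([[1, 5, 2, 0],
                [7, 3, 4, 6],
                [0, 0, 9, 1]])

row_max = np.max(col, axis=1)        # maximum of every row
row_argmax = np.argmax(col, axis=1)  # position of that maximum (kept for backward)

print(row_max.tolist())     # [5, 7, 9]
print(row_argmax.tolist())  # [1, 0, 2]
```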

Once the input data has been expanded into a shape that is easy to pool, the rest of the implementation becomes very simple.

Implementation of CNN

The network to be implemented has the structure “Convolution - ReLU - Pooling - Affine - ReLU - Affine - Softmax”. We implement it as a class named SimpleConvNet.

import pickle
from collections import OrderedDict

import numpy as np

# Relu, Affine, SoftmaxWithLoss and numerical_gradient are assumed to be
# the implementations from the earlier parts of these notes.

class SimpleConvNet:
    """Simple CNN.

    conv - relu - pool - affine - relu - affine - softmax

    Parameters
    ----------
    input_dim : shape of the input data (channel, height, width); (1, 28, 28) for MNIST
    conv_param : dict of convolution layer settings
        (filter_num, filter_size, pad, stride)
    hidden_size : number of neurons in the hidden fully connected layer
    output_size : output size (10 in the case of MNIST)
    weight_init_std : standard deviation of the initial weights (e.g. 0.01)
    """
    def __init__(self, input_dim=(1, 28, 28), 
                 conv_param={'filter_num':30, 'filter_size':5, 'pad':0, 'stride':1},
                 hidden_size=100, output_size=10, weight_init_std=0.01):
        filter_num = conv_param['filter_num']
        filter_size = conv_param['filter_size']
        filter_pad = conv_param['pad']
        filter_stride = conv_param['stride']
        input_size = input_dim[1]
        # Output size of the convolution layer
        conv_output_size = (input_size - filter_size + 2*filter_pad) / filter_stride + 1
        # Number of elements output by the pooling layer (2x2 pooling with stride 2)
        pool_output_size = int(filter_num * (conv_output_size/2) * (conv_output_size/2))

        #Initialize weight
        self.params = {}
        self.params['W1'] = weight_init_std * \
                            np.random.randn(filter_num, input_dim[0], filter_size, filter_size)
        self.params['b1'] = np.zeros(filter_num)
        self.params['W2'] = weight_init_std * \
                            np.random.randn(pool_output_size, hidden_size)
        self.params['b2'] = np.zeros(hidden_size)
        self.params['W3'] = weight_init_std * \
                            np.random.randn(hidden_size, output_size)
        self.params['b3'] = np.zeros(output_size)

        #Generation layer
        self.layers = OrderedDict()
        self.layers['Conv1'] = Convolution(self.params['W1'], self.params['b1'],
                                           conv_param['stride'], conv_param['pad'])
        self.layers['Relu1'] = Relu()
        self.layers['Pool1'] = Pooling(pool_h=2, pool_w=2, stride=2)
        self.layers['Affine1'] = Affine(self.params['W2'], self.params['b2'])
        self.layers['Relu2'] = Relu()
        self.layers['Affine2'] = Affine(self.params['W3'], self.params['b3'])
        #The output layer is placed separately
        self.last_layer = SoftmaxWithLoss()

    def predict(self, x):
        for layer in self.layers.values():
            x = layer.forward(x)

        return x

    def loss(self, x, t):
        """Compute the loss function.

        Parameter x is the input data and t is the teacher label.
        """
        y = self.predict(x)
        return self.last_layer.forward(y, t)

    def accuracy(self, x, t, batch_size=100):
        if t.ndim != 1 : t = np.argmax(t, axis=1)
        
        acc = 0.0
        
        for i in range(int(x.shape[0] / batch_size)):
            tx = x[i*batch_size:(i+1)*batch_size]
            tt = t[i*batch_size:(i+1)*batch_size]
            y = self.predict(tx)
            y = np.argmax(y, axis=1)
            acc += np.sum(y == tt) 
        
        return acc / x.shape[0]

    def numerical_gradient(self, x, t):
        """Find the gradients (numerical differentiation).

        Parameters
        ----------
        x : input data
        t : teacher label

        Returns
        -------
        Dictionary holding the gradient of each layer
            grads['W1'], grads['W2'], ... are the weight gradients of each layer
            grads['b1'], grads['b2'], ... are the bias gradients of each layer
        """
        loss_w = lambda w: self.loss(x, t)

        grads = {}
        for idx in (1, 2, 3):
            grads['W' + str(idx)] = numerical_gradient(loss_w, self.params['W' + str(idx)])
            grads['b' + str(idx)] = numerical_gradient(loss_w, self.params['b' + str(idx)])

        return grads

    def gradient(self, x, t):
        """Find the gradients (error backpropagation).

        Parameters
        ----------
        x : input data
        t : teacher label

        Returns
        -------
        Dictionary holding the gradient of each layer
            grads['W1'], grads['W2'], ... are the weight gradients of each layer
            grads['b1'], grads['b2'], ... are the bias gradients of each layer
        """
        # forward
        self.loss(x, t)

        # backward
        dout = 1
        dout = self.last_layer.backward(dout)

        layers = list(self.layers.values())
        layers.reverse()
        for layer in layers:
            dout = layer.backward(dout)

        #Set
        grads = {}
        grads['W1'], grads['b1'] = self.layers['Conv1'].dW, self.layers['Conv1'].db
        grads['W2'], grads['b2'] = self.layers['Affine1'].dW, self.layers['Affine1'].db
        grads['W3'], grads['b3'] = self.layers['Affine2'].dW, self.layers['Affine2'].db

        return grads
        
    def save_params(self, file_name="params.pkl"):
        params = {}
        for key, val in self.params.items():
            params[key] = val
        with open(file_name, 'wb') as f:
            pickle.dump(params, f)

    def load_params(self, file_name="params.pkl"):
        with open(file_name, 'rb') as f:
            params = pickle.load(f)
        for key, val in params.items():
            self.params[key] = val

        for i, key in enumerate(['Conv1', 'Affine1', 'Affine2']):
            self.layers[key].W = self.params['W' + str(i+1)]
            self.layers[key].b = self.params['b' + str(i+1)]

Apart from the use of the convolution layer and the pooling layer in the network structure, the main flow of the CNN implementation is no different from the neural network previously implemented with fully connected layers.

CNN visualization

What is the convolution layer in a CNN “observing”?

The following figure shows the weights of the first convolution layer before and after learning. The elements of the weights are real numbers; when displayed as an image, however, the minimum value is shown as black (0) and the maximum value as white (255):

[Figure: first-layer convolution weights before and after learning]
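The mapping from real-valued weights to the 0 to 255 display range can be sketched as a min-max normalization. The random filters below only stand in for real weights, and to_grayscale is a hypothetical helper; it assumes the filter is not constant (maximum greater than minimum).

```python
import numpy as np

def to_grayscale(w):
    """Map a real-valued filter to 0..255: minimum -> black (0), maximum -> white (255)."""
    w_min, w_max = w.min(), w.max()
    scaled = (w - w_min) / (w_max - w_min) * 255  # assumes w_max > w_min
    return np.round(scaled).astype(np.uint8)

rng = np.random.default_rng(0)
# Stand-in for first-layer weights of shape (filter_num, channel, height, width)
filters = 0.01 * rng.standard_normal((30, 1, 5, 5))

img = to_grayscale(filters[0, 0])
print(img.shape, img.dtype)  # (5, 5) uint8
print(img.min(), img.max())  # 0 255
```

The resulting array could then be shown with any image viewer or plotting library; that step is left out here.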

Before learning, the filters are initialized randomly, so their black-and-white shading shows no pattern, but after learning the filters become regular images. We find that through learning, the filters are updated into regular filters.

If you ask what the regular filters learned on the right side of the figure are “observing”, the answer is that they are observing edges (boundaries where the color changes) and blobs (locally clustered regions), etc.

[Figure: filters responding to edges and blobs]

In traditional digital image processing, the filters for edge detection are generally fixed, such as the gradient operator, the Laplacian of Gaussian operator and so on. A neural network can learn the pattern of edges from the training images and generate appropriate filters.
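For comparison, here is a minimal sketch of a fixed, hand-designed filter. It uses the Sobel kernel as one example of a gradient operator (my choice for illustration, not something named in the text) and applies it to a toy image with a single vertical edge; filter2d is a hypothetical helper.

```python
import numpy as np

def filter2d(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and sum the products."""
    H, W = image.shape
    FH, FW = kernel.shape
    out = np.zeros((H - FH + 1, W - FW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + FH, j:j + FW] * kernel)
    return out

# Fixed horizontal-gradient (Sobel) kernel: its values are designed by hand, not learned
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])

# Toy image: dark left half, bright right half (a vertical edge in the middle)
image = np.zeros((5, 6))
image[:, 3:] = 1.0

edges = filter2d(image, sobel_x)
print(edges.astype(int).tolist())
# [[0, 4, 4, 0], [0, 4, 4, 0], [0, 4, 4, 0]]
```

The response is non-zero only where the window straddles the edge. A learned first-layer filter plays the same role, except that its values come from training.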

Information extraction based on hierarchical structure

According to research on visualizing deep learning, as the layers get deeper, the extracted information (more precisely, the information that neurons respond strongly to) becomes more and more abstract.

[Figure: information extracted by the convolution layers of a CNN. Neurons in layer 1 respond to edges or blobs, layer 3 to textures, layer 5 to object parts, and the final fully connected layer to the category of the object (dog or car).]

If multiple convolution layers are stacked, the extracted information becomes more complex and abstract as the layers deepen, which is a very interesting aspect of deep learning. As the layers deepen, what the neurons respond to changes from simple shapes to “high-level” information. In other words, the object of response gradually changes, just as we come to understand the “meaning” of things.

Representative CNN

LeNet

[Figure: LeNet network structure]

LeNet is one of the earliest CNNs. Compared with “today’s CNNs”, LeNet differs in several ways:

  1. The first difference is the activation function. LeNet uses the sigmoid function, whereas today’s CNNs mainly use the ReLU function.
  2. The original LeNet uses subsampling to reduce the size of intermediate data, whereas max pooling is the mainstream in today’s CNNs.

AlexNet

AlexNet triggered the boom in deep learning, but its network structure is basically no different from LeNet’s.

[Figure: AlexNet network structure]

AlexNet stacks multiple convolution layers and pooling layers and finally outputs the result through fully connected layers. In terms of structure, AlexNet and LeNet do not differ much, but there are the following differences:

  • The activation function is ReLU.
  • An LRN (local response normalization) layer is used for local normalization.
  • Use dropout (randomly delete neurons during learning).
