[Image algorithm] – How convolution kernels in a convolutional neural network (CNN) extract image features (image convolution implemented in Python)

Time:2021-12-9

1. Preface

Convolution kernels (also called filter matrices) play a central role in convolutional neural networks (CNNs). Put simply, a CNN's main job is to extract feature maps from images.

A CNN performs feature extraction mainly through the convolution operation. A feature-extraction filter matrix (the convolution kernel, usually a 3×3 or 5×5 matrix) is defined and then 'slid' across the original image matrix (an image is just a matrix of pixel values). At each position, the kernel and the image patch beneath it are multiplied element by element and summed to give one output value. If convolution is unfamiliar, Andrew Ng's (Wu Enda's) course covers it in detail. This article focuses on implementing the convolution operation in Python so that you can see actual convolution results, get a more intuitive feel for CNN feature extraction, and better understand CNN-based deep learning algorithms such as image recognition and object detection.
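
To make the 'sliding' concrete, here is a minimal sketch on a tiny single-channel matrix. The 4×4 `image` and the 3×3 `kernel` are made-up values chosen only for illustration; they are not from the article's experiment. Like the code later in this article, the loop does not flip the kernel, so strictly speaking it computes a cross-correlation, which is what CNN layers conventionally call convolution.

```python
import numpy as np

# Hypothetical 4x4 single-channel "image" and 3x3 kernel (illustration only)
image = np.array([
    [1, 2, 3, 0],
    [4, 5, 6, 1],
    [7, 8, 9, 2],
    [0, 1, 2, 3],
])
kernel = np.array([
    [0, 1, 0],
    [1, -4, 1],
    [0, 1, 0],
])

kh, kw = kernel.shape
out_h = image.shape[0] - kh + 1  # 4 - 3 + 1 = 2 vertical positions
out_w = image.shape[1] - kw + 1  # 4 - 3 + 1 = 2 horizontal positions
feature_map = np.zeros((out_h, out_w))

for i in range(out_h):
    for j in range(out_w):
        window = image[i:i + kh, j:j + kw]           # patch under the kernel
        feature_map[i, j] = np.sum(window * kernel)  # multiply and sum

print(feature_map)  # values: 0, -6 in the first row; -10, -18 in the second
```

Without padding, a 3×3 kernel fits into a 4×4 image at only 2×2 positions, which is why the output is smaller than the input.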

2. Define custom convolution kernels, perform the image convolution with NumPy, and generate the corresponding feature maps:

"""   
@Project Name: CNN featuremap
@Author: LZP
@Time: 2021-02-02, 16:09
@Python Version: python3.6
@Coding Scheme: utf-8
@Interpreter Name: JupyterNotebook
"""
import numpy as np
import cv2
from matplotlib import pyplot as plt


def conv(image, kernel, mode='same'):
    # Only mode == 'fill' pads the edges. Any other value (including the
    # default 'same') leaves the image unpadded, so the output shrinks by
    # kernel size - 1 in each direction.
    if mode == 'fill':
        h = kernel.shape[0] // 2  # half the number of kernel rows, rounded down
        w = kernel.shape[1] // 2  # half the number of kernel columns, rounded down
        # Zero-pad the border of the original image (constant padding, value 0).
        # For a 600 * 600 image and a 5 * 5 kernel, the padded image is 604 * 604.
        image = np.pad(image, ((h, h), (w, w), (0, 0)), 'constant')
    # Convolve each channel separately (cv2.imread returns channels in B, G, R order)
    conv_b = _convolve(image[:, :, 0], kernel)
    conv_g = _convolve(image[:, :, 1], kernel)
    conv_r = _convolve(image[:, :, 2], kernel)
    res = np.dstack([conv_b, conv_g, conv_r])
    return res


def _convolve(image, kernel):
    h_kernel, w_kernel = kernel.shape  # kernel height and width (rows, columns)
    h_image, w_image = image.shape  # height and width of the image to process
    # Output size: the kernel must lie entirely inside the image, so it can
    # take h_image - h_kernel + 1 vertical and w_image - w_kernel + 1 horizontal positions
    res_h = h_image - h_kernel + 1
    res_w = w_image - w_kernel + 1
    # Zero matrix that will hold the processed image
    res = np.zeros((res_h, res_w), np.uint8)
    for i in range(res_h):
        for j in range(res_w):
            # Take the image patch that has the same size as the kernel and combine
            # it with the kernel; i and j slide the kernel across the image
            res[i, j] = normal(image[i:i + h_kernel, j:j + w_kernel], kernel)

    return res

def normal(image, kernel):
    # np.multiply() multiplies the two arrays element by element; the result has
    # the same shape as the inputs. Summing it gives one output pixel value.
    res = np.multiply(image, kernel).sum()
    # Clip the value to the valid 8-bit pixel range [0, 255]
    if res > 255:
        return 255
    elif res < 0:
        return 0
    else:
        return res


if __name__ == '__main__':
    path = './img/doramon.jpeg'  # path to the original image
    image = cv2.imread(path)  # BGR uint8 array, or None if the file cannot be read

    # kernel1 is a 3x3 edge feature extractor that responds to edges in all directions
    # kernel2 is a 5x5 relief (emboss) feature extractor

    kernel1 = np.array([
        [1, 1, 1],
        [1, -7.5, 1],
        [1, 1, 1]
    ])
    kernel2 = np.array([[-1, -1, -1, -1, 0],
                        [-1, -1, -1, 0, 1],
                        [-1, -1, 0, 1, 1],
                        [-1, 0, 1, 1, 1],
                        [0, 1, 1, 1, 1]])
    res = conv(image, kernel1, 'fill')
    # Note: res is in BGR order while matplotlib expects RGB, so the red and blue
    # channels appear swapped; cv2.cvtColor(res, cv2.COLOR_BGR2RGB) gives true colors
    plt.imshow(res)
    plt.savefig('./out/filtered_picdoramon01.jpg', dpi=600)
    plt.show()
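
To see why these two kernels pick out edges and relief, it helps to look at their response on single patches. The sketch below is an illustration only: the helper `response` and the test patches (`flat3`, `edge3`, `flat5`, `ramp5`) are hypothetical names and values, not part of the code above. The coefficients of kernel1 sum to 0.5 and those of kernel2 sum to 0, so uniform regions give a weak or zero response, while a change in brightness gives a strong one.

```python
import numpy as np

kernel1 = np.array([[1, 1, 1],
                    [1, -7.5, 1],
                    [1, 1, 1]])
kernel2 = np.array([[-1, -1, -1, -1, 0],
                    [-1, -1, -1, 0, 1],
                    [-1, -1, 0, 1, 1],
                    [-1, 0, 1, 1, 1],
                    [0, 1, 1, 1, 1]])


def response(patch, kernel):
    """Hypothetical helper: weighted sum of one patch, clipped to [0, 255]."""
    return float(np.clip(np.sum(patch * kernel), 0, 255))


# Made-up patches for illustration
flat3 = np.full((3, 3), 100.0)                    # uniform gray area
edge3 = np.array([[0.0, 0.0, 200.0]] * 3)         # dark pixel next to a bright edge
flat5 = np.full((5, 5), 100.0)                    # uniform gray area
ramp5 = 5.0 * np.add.outer(np.arange(5), np.arange(5))  # brighter toward bottom-right

print(response(flat3, kernel1))  # 50.0  (0.5 * 100: flat areas are dimmed)
print(response(edge3, kernel1))  # 255.0 (raw sum 600, clipped)
print(response(flat5, kernel2))  # 0.0   (coefficients sum to zero)
print(response(ramp5, kernel2))  # 200.0 (diagonal brightness change)
```

Because kernel2 weights the lower-right half positively and the upper-left half negatively, it responds only to brightness changes along that diagonal, which is what gives the relief look.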

3. Experimental results

Edge feature extraction

Relief feature extractor

4. Code interpretation:

1) image = cv2.imread(path)
        imread reads an image file. For a color image the result is a NumPy array of shape (height, width, 3) with the channels in B, G, R order; if the file cannot be read, it returns None.
2) image = np.pad(image, ((h, h), (w, w), (0, 0)), 'constant')
        image is the array to pad. ((h, h), (w, w), (0, 0)) gives the number of elements to add before and after each dimension: h rows above and below, w columns on the left and right, and nothing along the channel dimension. For example, ((1, 2), (2, 2)) pads the first dimension with 1 element before and 2 after, and the second dimension with 2 before and 2 after. A single integer applies the same width to both sides of every dimension. 'constant' is the padding mode: the new elements are filled with a constant value (0 by default). Two examples:

(1) Filling of one-dimensional array

import numpy as np
array = np.array([1, 1, 1])
 
# (1, 2) pads 1 element before the one-dimensional array and 2 elements after it
# constant_values=(0, 2) fills the front padding with 0 and the back padding with 2
ndarray = np.pad(array, (1, 2), 'constant', constant_values=(0, 2))
 
print("array",array)
print("ndarray=",ndarray)

result:

array [1 1 1]
ndarray= [0 1 1 1 2 2]

(2) Filling of two-dimensional arrays

import numpy as np
array = np.array([[1, 1], [2, 2]])

"""
((1, 1), (2, 2)) pads the first dimension (rows) with 1 row before and 1 row after,
and the second dimension (columns) with 2 columns before and 2 columns after.
constant_values=(0, 3) fills the padding before each dimension (top row, left columns) with 0
and the padding after each dimension (bottom row, right columns) with 3.
In the corners the column value applies, as the result below shows.
"""
ndarray = np.pad(array, ((1, 1), (2, 2)), 'constant', constant_values=(0, 3))
 
print("array",array)
print("ndarray=",ndarray)

result:

array [[1 1]
       [2 2]]
 
ndarray= [[0 0 0 0 3 3]
          [0 0 1 1 3 3]     
          [0 0 2 2 3 3]
          [0 0 3 3 3 3]]
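
(3) Filling of a three-dimensional image array

The same call is what conv() uses in section 2. A minimal sketch, assuming a 600 * 600 color image and a 5 * 5 kernel as in the code comment there; the all-ones `image` and all-zeros `kernel` are hypothetical stand-ins, used only so the shapes can be checked without loading a real file:

```python
import numpy as np

# Hypothetical stand-ins: a 600x600 3-channel image and a 5x5 kernel
image = np.ones((600, 600, 3), dtype=np.uint8)
kernel = np.zeros((5, 5))

h = kernel.shape[0] // 2  # 2 rows added above and below
w = kernel.shape[1] // 2  # 2 columns added left and right
padded = np.pad(image, ((h, h), (w, w), (0, 0)), 'constant')

print(image.shape)   # (600, 600, 3)
print(padded.shape)  # (604, 604, 3)

# Sliding the kernel over the padded image without further padding gives
# padded size - kernel size + 1 positions per direction: 604 - 5 + 1 = 600
out_h = padded.shape[0] - kernel.shape[0] + 1
out_w = padded.shape[1] - kernel.shape[1] + 1
print(out_h, out_w)  # 600 600
```

The (0, 0) entry leaves the channel dimension untouched, and the zero border is what lets the feature map keep the size of the original image.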