[Artificial intelligence project] MNIST handwriting recognition: experiment and analysis

## 1. Brief description of experiment content

### 1.1 experimental environment

The software and hardware experimental environment used in this experiment is shown in the table:

On Windows, the models are trained and tested on MNIST with Keras, a deep learning framework running on top of TensorFlow.

Keras is a Python library designed for assembling neural networks simply. It ships with a large number of prebuilt network types, including two-dimensional and three-dimensional convolutional networks, long short-term memory (LSTM) networks, and more general-purpose networks. Building a network with Keras is straightforward: its API is layer-oriented, so network construction is relatively intuitive. Keras was therefore chosen for this experiment for its focus on user friendliness, modularity, and extensibility.

### 1.2 introduction to MNIST dataset

MNIST (see the official website) is a well-known handwritten digit recognition dataset. It consists of images of handwritten digits and their corresponding labels.

The MNIST dataset is split into 60,000 training images and 10,000 test images. Each image shows one digit from 0 to 9 and is a 28x28 pixel matrix. The dataset is distributed as four files:

- train-images-idx3-ubyte.gz: training set images (9912422 bytes)
- train-labels-idx1-ubyte.gz: training set labels (28881 bytes)
- t10k-images-idx3-ubyte.gz: test set images (1648877 bytes)
- t10k-labels-idx1-ubyte.gz: test set labels (4542 bytes)

### 1.3 data preprocessing

In the data preprocessing stage, the images are normalized: the pixel values, originally integers from 0 to 255, are scaled to the range 0 to 1 before being fed to the neural network. To do this, the image data is converted from integer to floating-point type and then divided by 255, which makes training easier. It is important to preprocess the training set and the test set in the same way.
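
The scaling step can be sketched as follows. This is an illustration on plain Python lists, not the original preprocessing function; the name `normalize_image` is hypothetical. With the NumPy arrays that Keras works with, the equivalent is `x.astype("float32") / 255`.

```python
def normalize_image(image):
    """Scale a 2-D list of integer pixels (0-255) to floats in [0, 1]."""
    return [[pixel / 255.0 for pixel in row] for row in image]


# A tiny 2x2 "image" stands in for a 28x28 MNIST digit.
sample = [[0, 128], [255, 64]]
scaled = normalize_image(sample)
print(scaled)
```

Applying the same function to both the training and the test images keeps the two sets on the same scale.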

After that, the labels are one-hot encoded. One-hot encoding maps the values of a discrete feature into Euclidean space, so that each value corresponds to a point in that space. Many machine learning algorithms compute distances or similarities between features in Euclidean space, and one-hot encoding makes those distances more reasonable for discrete features: every pair of distinct classes is equally far apart.
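
A small sketch of the idea, using hypothetical helper names `one_hot` and `squared_distance`. With raw integer labels, digit 1 looks closer to 2 than to 9; after one-hot encoding, all distinct digits are the same distance apart. In Keras the encoding itself is usually done with `tensorflow.keras.utils.to_categorical`.

```python
def one_hot(label, num_classes=10):
    """Return a one-hot vector for an integer class label."""
    if not 0 <= label < num_classes:
        raise ValueError("label out of range")
    vector = [0.0] * num_classes
    vector[label] = 1.0
    return vector


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Integer labels: 1 is "closer" to 2 than to 9, which is meaningless for classes.
print(abs(1 - 2), abs(1 - 9))
# One-hot labels: both pairs are equally far apart.
print(squared_distance(one_hot(1), one_hot(2)),
      squared_distance(one_hot(1), one_hot(9)))
```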

## 2. Experimental core code

### (1) MLP (multilayer perceptron)

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense

    # Build the MLP: 784 inputs -> 256 -> 128 -> 64 -> 10 softmax outputs
    model = Sequential()
    model.add(Dense(units=256, input_dim=784, kernel_initializer='normal', activation='relu'))
    model.add(Dense(units=128, kernel_initializer='normal', activation='relu'))
    model.add(Dense(units=64, kernel_initializer='normal', activation='relu'))
    model.add(Dense(units=10, kernel_initializer='normal', activation='softmax'))
    model.summary()
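
As a rough check on what `model.summary()` should report for this MLP, the parameter count of each dense layer is inputs x outputs plus one bias per output. The sketch below only does this arithmetic; the helper name `dense_params` is illustrative.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


# Layer widths of the MLP above: 784 -> 256 -> 128 -> 64 -> 10
layer_sizes = [784, 256, 128, 64, 10]
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)
print(per_layer)  # [200960, 32896, 8256, 650]
print(total)      # 242762
```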

### (2) CNN (convolutional neural network)

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

    # Build LeNet-5
    model = Sequential()
    model.add(Conv2D(filters=6, kernel_size=(5, 5), padding='valid',
                     input_shape=(28, 28, 1), activation='relu'))  # C1
    model.add(MaxPooling2D(pool_size=(2, 2)))  # S2
    model.add(Conv2D(filters=16, kernel_size=(5, 5), padding='valid', activation='relu'))  # C3
    model.add(MaxPooling2D(pool_size=(2, 2)))  # S4
    model.add(Flatten())
    model.add(Dense(120, activation='tanh'))  # C5
    model.add(Dense(84, activation='tanh'))  # F6
    model.add(Dense(10, activation='softmax'))  # output
    model.summary()

### Model interpretation

The model trained in this experiment uses the LeNet-5 convolutional neural network structure.

First layer: convolution layer

The input of this layer is the raw image pixels. The original LeNet-5 model accepts an input of size 32x32x1. The filter size of the first convolution layer is 5x5, the depth (number of convolution kernels) is 6, zero padding is not used, and the stride is 1. Because zero padding is not used, the output size of this layer is 32 - 5 + 1 = 28 and the depth is 6. This layer has 5x5x1x6 + 6 = 156 trainable parameters, of which 6 are bias parameters. Because the node matrix of the next layer has 28x28x6 = 4704 nodes (neurons), and each node is connected to 5x5 = 25 nodes of the current layer plus a bias, this convolution layer has a total of 28x28x6x(5x5 + 1) = 122304 connections. Note that the Keras code above feeds 28x28x1 MNIST images with padding='valid', so in that model the output of this layer is 24x24x6; the sizes in this walkthrough follow the 32x32 input of the original LeNet-5.

Second layer: pooling layer

The input of this layer is the output of the first layer, a 28x28x6 node matrix (4704 nodes). The filter used in this layer is 2x2 with a stride of 2 in both height and width, so the output matrix of this layer is 14x14x6. The pooling used in the original LeNet-5 model differs slightly from the max pooling used here; the difference is not covered in this article.

Third layer: convolution layer

The input matrix of this layer is 14x14x6, the filter size is 5x5, and the depth is 16. This layer does not use zero padding and the stride is 1, so the output matrix is 10x10x16. As a standard convolution layer, it has 5x5x6x16 + 16 = 2416 trainable parameters. Each of the 10x10x16 output nodes is connected to 5x5x6 = 150 input nodes plus a bias, giving 10x10x16x(5x5x6 + 1) = 241600 connections.

Fourth layer: pooling layer

The input matrix of this layer is 10x10x16, the filter size is 2x2, the stride is 2, and the output matrix is 5x5x16.
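
The spatial sizes of the first four layers follow from the standard output-size formula for unpadded convolution and pooling. The sketch below traces them for the 32x32 input described in this walkthrough and for the 28x28 input used in the Keras code; the function names are illustrative.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a convolution layer."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, window, stride):
    """Output width/height of a pooling layer without padding."""
    return (size - window) // stride + 1


def trace_lenet(size):
    """Spatial size after C1, S2, C3 and S4 for a square input."""
    c1 = conv_out(size, 5)   # 5x5 convolution, no padding, stride 1
    s2 = pool_out(c1, 2, 2)  # 2x2 pooling, stride 2
    c3 = conv_out(s2, 5)
    s4 = pool_out(c3, 2, 2)
    return [c1, s2, c3, s4]


print(trace_lenet(32))  # [28, 14, 10, 5]
print(trace_lenet(28))  # [24, 12, 8, 4]
```

With 28x28 MNIST images and no padding, the feature map entering the fully connected layers is therefore 4x4x16 rather than 5x5x16.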

Fifth layer: fully connected layer

The input matrix of this layer is 5x5x16. Flattening the nodes of this matrix into a vector of 5x5x16 = 400 values gives the input of a fully connected layer. This layer has 120 output nodes and 5x5x16x120 + 120 = 48120 parameters in total.

Sixth layer: fully connected layer

This layer has 120 input nodes and 84 output nodes, for a total of 120x84 + 84 = 10164 parameters.

Seventh layer: fully connected layer

The output layer of the original LeNet-5 model is structured differently from a fully connected layer, but a fully connected layer is used here as an approximation. This layer has 84 input nodes and 10 output nodes, with a total of 84x10 + 10 = 850 parameters.
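
The parameter counts above can be reproduced with a few lines of arithmetic. The sketch below is illustrative only (the helper names are not from the original code). It computes the counts for the 5x5x16 feature map described in the walkthrough and, for comparison, for the 4x4x16 feature map that results from feeding 28x28 images through unpadded convolutions.

```python
def conv_params(kernel, in_channels, filters):
    """Weights plus biases of a standard convolution layer."""
    return kernel * kernel * in_channels * filters + filters


def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def lenet_params(flat_size):
    """Per-layer parameter counts; flat_size is the width of the S4 output."""
    return {
        "C1": conv_params(5, 1, 6),
        "C3": conv_params(5, 6, 16),
        "C5": dense_params(flat_size * flat_size * 16, 120),
        "F6": dense_params(120, 84),
        "output": dense_params(84, 10),
    }


walkthrough = lenet_params(5)  # 32x32 input, as in the text
keras_code = lenet_params(4)   # 28x28 input, as in the Keras code
print(walkthrough, sum(walkthrough.values()))
print(keras_code, sum(keras_code.values()))
```

Only the first fully connected layer differs between the two cases: 48120 parameters for the 5x5x16 input versus 30840 for the 4x4x16 input.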

### Model process

After the initial parameters are set, training begins. Each run requires fine-tuning the hyperparameters to obtain better results. After many attempts, the final settings are:

- Optimizer: Adam
- Number of training epochs: 10
- Batch size: 500

The LeNet-5 convolutional neural network is trained on the MNIST dataset with the above settings for 10 epochs and reaches 95% accuracy on the training set.
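
A training call with these settings might look like the sketch below. It is not the original training script: the loss function and the validation split are assumptions (the article does not state them), and `compile_and_fit` is a hypothetical helper that expects an already built Keras model such as the one above.

```python
EPOCHS = 10       # number of training epochs listed above
BATCH_SIZE = 500  # samples per gradient update listed above


def compile_and_fit(model, x_train, y_train, validation_split=0.1):
    """Compile a Keras model with Adam and train it with the listed settings."""
    model.compile(
        optimizer="adam",
        # Assumption: categorical cross-entropy, the usual choice for
        # one-hot labels with a softmax output layer.
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model.fit(
        x_train,
        y_train,
        epochs=EPOCHS,
        batch_size=BATCH_SIZE,
        # Assumption: hold out part of the training data for validation;
        # the article does not say how its validation set was built.
        validation_split=validation_split,
        verbose=2,
    )
```

The returned history object records the per-epoch loss and accuracy, which is convenient for checking whether 10 epochs are enough.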

## 3. Result analysis and summary

### 3.1 model test and result analysis

To verify the robustness of the model, the best-performing model saved on the validation set under the above settings is finally evaluated on the test set, where it reaches an accuracy of 95.13%.

To analyze the results in more detail, a confusion matrix is used to evaluate the model's performance. Before the evaluation, a few metrics need to be introduced. In the definitions below, 0 denotes the positive class and 1 the negative class.

- TP (true positive): the number of positive samples predicted as positive; the true value is 0 and the prediction is also 0.
- FN (false negative): the number of positive samples predicted as negative; the true value is 0 and the prediction is 1.
- FP (false positive): the number of negative samples predicted as positive; the true value is 1 and the prediction is 0.
- TN (true negative): the number of negative samples predicted as negative; the true value is 1 and the prediction is also 1.

The definition and meaning of the confusion matrix:

A confusion matrix is a table that summarizes the prediction results of a classification model in machine learning. It tallies the records of a dataset in matrix form according to two criteria: the real category and the category predicted by the model. One axis of the matrix gives the predicted values and the other gives the real values.
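
For a ten-class problem such as MNIST, the same four counts can be read off the confusion matrix one class at a time. The sketch below assumes rows are true labels and columns are predicted labels (the layout used in the original figure is not recoverable), and the labels in the example are made up for illustration.

```python
def confusion_matrix(y_true, y_pred, num_classes=10):
    """matrix[t][p] counts samples with true label t and predicted label p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def per_class_counts(matrix, cls):
    """Return (TP, FN, FP, TN) treating `cls` as the positive class."""
    total = sum(sum(row) for row in matrix)
    tp = matrix[cls][cls]
    fn = sum(matrix[cls]) - tp                  # true cls, predicted something else
    fp = sum(row[cls] for row in matrix) - tp   # predicted cls, true something else
    tn = total - tp - fn - fp
    return tp, fn, fp, tn


# Made-up labels for three classes, purely for illustration.
y_true = [0, 0, 1, 1, 2]
y_pred = [0, 1, 1, 1, 0]
matrix = confusion_matrix(y_true, y_pred, num_classes=3)
print(matrix)                       # [[1, 1, 0], [0, 2, 0], [1, 0, 0]]
print(per_class_counts(matrix, 0))  # (1, 1, 1, 2)
```

The overall accuracy is the sum of the diagonal divided by the total number of samples.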

### 3.2 comparison of results

For comparison, a model with four fully connected layers is also trained. Its structure is as follows:

The results are as follows:

In short, after repeated parameter tuning, a model with a classification accuracy of about 95% is obtained, and the experiments indicate that the model is robust.

### 3.3 model prediction

Predict a single image:
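
A single prediction amounts to running the model on a batch of one image and taking the index of the largest softmax output. The sketch below is an illustration, not the original prediction code: `predict_digit` is a hypothetical helper, `model` is assumed to be a trained Keras model, and `image_batch` is assumed to be a preprocessed array with a leading batch dimension, for example shape (1, 28, 28, 1) for the CNN or (1, 784) for the MLP.

```python
def predict_digit(model, image_batch):
    """Return (digit, probability) for the first image in the batch."""
    # model.predict returns one row of 10 class probabilities per image.
    probabilities = list(model.predict(image_batch)[0])
    # The predicted digit is the index of the largest probability.
    digit = max(range(len(probabilities)), key=probabilities.__getitem__)
    return digit, probabilities[digit]
```

Returning the probability alongside the digit makes it easy to spot low-confidence predictions.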

## 4. Summary

By walking through the workflow of a convolutional neural network, this article presents a complete process for MNIST handwriting recognition with a CNN and reaches a classification accuracy of about 95% on the dataset. Second, the model built here is general and, with minor changes, can be applied to feature extraction and classification on other datasets. Third, the model was designed with computational resources and time cost in mind, and it can be trained on an ordinary personal laptop. In addition, an MLP is included for comparison, and the results show that the convolutional neural network performs better. For these reasons, the work in this article is practical and broadly applicable.
