# A comprehensive design guide for CNN image classification

Time: 2019-12-02

Abstract: This paper is a comprehensive design guide to image classification with CNNs, covering practical experience in model design, model optimization, and data processing. It is intended for researchers working on image classification.

When a CNN is chosen for an image classification task, three main metrics need to be optimized: accuracy, inference speed, and memory consumption. These metrics are closely tied to model design, and different networks, such as VGG, Inception, and ResNets, trade them off against one another. A common approach is to fine-tune these mature architectures, for example by adding or deleting layers, widening layers, or applying different training techniques, to complete the image classification task at hand.
This paper is an optimization design guide for image classification with CNNs, intended to help readers quickly grasp the common problems and practical experience in designing image classification models. It focuses on the three metrics of accuracy, speed, and memory consumption, introduces different CNN classification methods, and discusses how these methods perform on those metrics. It also examines how these mature CNN methods perform after various modifications. Finally, it shows how to design and optimize a CNN model for a specific image classification task.

## Network type

There is a very clear trade-off between network type and these metrics. A natural first step is to choose an Inception- or ResNet-style model, because both are newer than VGG and AlexNet; between the two there is still a trade-off between accuracy and inference speed. If accuracy matters most, a ResNet is a good starting point; if fast inference matters most, an Inception network is the better choice.

## Using intelligent convolution design to reduce running time and memory consumption

Recent developments in CNN design offer some remarkable alternatives that speed up inference and reduce memory consumption without losing much accuracy. All of the following can easily be integrated into the mature CNN models above:

• MobileNets: use depthwise separable convolutions, which greatly reduce computation and memory consumption at the cost of only 1%–5% accuracy; the drop in accuracy scales with the reduction in computation and memory.
• XNOR-Net: uses binary convolutions, i.e. convolution kernels whose values are only -1 or 1. This makes the network highly sparse, so its parameters compress easily and take up little memory.
• ShuffleNet: uses pointwise group convolutions and channel shuffling, and achieves better accuracy than MobileNets.
• Network pruning: removes parts of a trained CNN to reduce inference time and memory consumption, at some cost in accuracy. To preserve accuracy, it is best to remove the parts of the structure that have little effect on the final results.
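The savings from depthwise separable convolution (the MobileNets technique above) can be verified with simple parameter-count arithmetic. A minimal pure-Python sketch; the layer shapes (3×3 kernel, 64 in / 128 out channels) are illustrative choices, not from the text:

```python
def standard_conv_params(k, c_in, c_out):
    # A standard k x k convolution mixes all input channels for every
    # output channel: k * k * c_in weights per output channel.
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    # Depthwise step: one k x k filter per input channel.
    # Pointwise step: a 1x1 convolution that mixes the channels.
    return k * k * c_in + c_in * c_out

std = standard_conv_params(3, 64, 128)        # 73728 weights
sep = depthwise_separable_params(3, 64, 128)  # 8768 weights
print(std, sep, sep / std)  # reduction factor is roughly 1/c_out + 1/k^2
```

The ratio here is about 0.12, i.e. roughly an 8x reduction in weights (and multiply-adds) for this layer, which is where MobileNets' speed and memory savings come from.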

## Network depth

For CNNs, a common way to increase accuracy is to add channels and layers, at the cost of inference speed and memory. Note, however, that the accuracy gain from adding layers diminishes: the more layers are added, the smaller the contribution of each subsequent layer, and eventually overfitting can appear.

## Activation function

An activation function is essential for a neural network model. Traditional saturating activations such as sigmoid and tanh are poorly suited to deep CNN models, so researchers have proposed new activation functions, such as the ReLU activation proposed by Hinton's group. Using ReLU will usually produce good results, without the tedious hyperparameter tuning required by ELU, PReLU, or LeakyReLU. Once ReLU is confirmed to work well, the rest of the network can be optimized and its parameters tuned in pursuit of better accuracy.
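The difference between plain ReLU and its tunable variants is easy to see in code. A minimal sketch (scalar versions; frameworks apply these elementwise over tensors):

```python
def relu(x):
    # ReLU: pass positive inputs through, zero out negatives.
    # No hyperparameters to tune.
    return max(0.0, x)

def leaky_relu(x, alpha=0.01):
    # LeakyReLU keeps a small slope alpha for negative inputs; alpha is
    # exactly the extra knob you avoid by starting with plain ReLU.
    return x if x > 0 else alpha * x

print(relu(-2.0), relu(3.0))        # 0.0 3.0
print(leaky_relu(-2.0))             # small negative value instead of 0
```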

## Convolution kernel size

It is commonly assumed that larger convolution kernels (such as 5×5 and 7×7) always yield the highest accuracy, but this is not the case. Researchers have found that larger kernels make the network harder to train, and that smaller kernels such as 3×3 work best; ResNet and VGGNet demonstrate this well. In addition, 1×1 convolutions can be used to reduce the number of feature maps.
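Part of the reason small kernels win is that stacking them covers the same receptive field with fewer weights, a point made by the VGG authors. A quick arithmetic sketch (the channel count 64 is an illustrative choice):

```python
def conv_params(k, c):
    # Weights in one k x k convolution with c input and c output channels.
    return k * k * c * c

def stacked_receptive_field(k, n):
    # n stacked k x k convolutions (stride 1) together see a region of
    # n * (k - 1) + 1 pixels on a side.
    return n * (k - 1) + 1

c = 64
print(stacked_receptive_field(3, 2))             # 5: two 3x3 layers match one 5x5
print(conv_params(5, c), 2 * conv_params(3, c))  # 102400 vs 73728 weights
```

Two 3×3 layers also insert an extra nonlinearity between them, which a single 5×5 layer does not.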

## Dilated convolution

Dilated convolutions insert spacing between kernel weights so that pixels away from the center can be used. This lets the network increase its receptive field without adding parameters, i.e. without increasing memory consumption. Related papers show that dilated convolutions can increase accuracy, though they may also increase inference time.
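The receptive-field gain can be written down directly: a k×k kernel with dilation rate d covers d·(k−1)+1 pixels per side while keeping the same k×k weights. A one-line sketch:

```python
def effective_kernel(k, d):
    # A k x k kernel with dilation d spaces its taps d pixels apart,
    # covering d * (k - 1) + 1 pixels per side without adding weights.
    return d * (k - 1) + 1

for d in (1, 2, 4):
    print(d, effective_kernel(3, d))  # 3, 5, 9: receptive field grows, weights don't
```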

## Data augmentation

Deep learning relies on large amounts of data, and more data has been shown to further improve model performance. Augmentation yields more data essentially for free. The right augmentations depend on the task: for example, in a self-driving task there are no upside-down trees, cars, or buildings, so vertically flipping images is meaningless; but the weather and the overall scene do change, so lighting changes and horizontal flips make sense. Good data-augmentation libraries are available.
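The horizontal flip mentioned above is the simplest label-preserving augmentation to implement by hand. A minimal sketch on a nested-list "image" (real pipelines operate on tensors and pick the flip randomly per sample):

```python
import random

def horizontal_flip(image):
    # Mirror each row left to right. For most natural scenes this keeps
    # the label unchanged, unlike a vertical flip.
    return [row[::-1] for row in image]

def random_flip(image, p=0.5):
    # Apply the flip with probability p, the usual way augmentation
    # is used during training.
    return horizontal_flip(image) if random.random() < p else image

img = [[1, 2, 3],
       [4, 5, 6]]
print(horizontal_flip(img))  # [[3, 2, 1], [6, 5, 4]]
```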

## Training optimization

The best approach mirrors the advice for activation functions: first use a simple training method to see whether the designed model works at all, then adjust and optimize with more sophisticated methods. A personal recommendation is to start with Adam: it is very easy to use, and simply setting a low learning rate (commonly 0.0001 by default) already gives very good results. After that, SGD can be used for fine-tuning.
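The "Adam first, then fine-tune with SGD" schedule can be sketched on a toy one-dimensional loss. The Adam update below follows the standard rule; the learning rates, step counts, and quadratic loss are illustrative choices for this toy problem, not values from the text:

```python
import math

def grad(w):
    # Gradient of the toy loss f(w) = (w - 3)^2, minimized at w = 3.
    return 2.0 * (w - 3.0)

# Phase 1: Adam quickly gets close to the optimum.
w, m, v = 0.0, 0.0, 0.0
lr, b1, b2, eps = 0.1, 0.9, 0.999, 1e-8
for t in range(1, 301):
    g = grad(w)
    m = b1 * m + (1 - b1) * g       # first-moment (momentum) estimate
    v = b2 * v + (1 - b2) * g * g   # second-moment estimate
    m_hat = m / (1 - b1 ** t)       # bias corrections
    v_hat = v / (1 - b2 ** t)
    w -= lr * m_hat / (math.sqrt(v_hat) + eps)

# Phase 2: plain SGD with a small step for fine-tuning.
for _ in range(100):
    w -= 0.05 * grad(w)

print(w)  # close to 3.0
```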

## Class balance

In many cases the data may be imbalanced. What does that mean? A simple example: suppose you are training a model to predict whether someone in a video is carrying a lethal weapon, but the training data contains only 50 videos with weapons and 1000 videos without. A model trained on this data set will naturally tend to predict that no video contains a weapon.
To address this, several things can be done:

• Weight the loss function: give the under-represented class a higher weight in the loss, so that any misclassification of that class produces a very large loss.
• Oversampling: repeat training examples from the under-represented classes to help improve model accuracy.
• Undersampling: sample only a subset of the over-represented classes to reduce the imbalance.
• Data augmentation: augment the under-represented classes to generate more examples for them.
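The loss-weighting option above is usually implemented with inverse-frequency class weights. A minimal sketch using the article's own 50-vs-1000 example (the normalization total / (n_classes * count) is one common convention among several):

```python
def inverse_frequency_weights(counts):
    # Weight each class by total / (n_classes * count): rare classes get
    # proportionally larger weights in the loss function.
    total = sum(counts.values())
    n = len(counts)
    return {cls: total / (n * cnt) for cls, cnt in counts.items()}

# The article's example: 50 weapon videos vs 1000 without.
weights = inverse_frequency_weights({"weapon": 50, "no_weapon": 1000})
print(weights)  # weapon: 10.5, no_weapon: 0.525
```

Misclassifying a weapon video now costs 20x more than misclassifying a weapon-free one, counteracting the model's bias toward the majority class.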

## Optimizing transfer learning

For most applications, the usual approach is transfer learning rather than training a network from scratch. Transfer learning starts from a mature pre-trained model, keeps part of its structure and parameters, and trains only the new components. Which model to start from, which layers to keep, and which components to retrain all depend on what your dataset looks like: the more similar your data is to the data the network was pre-trained on (typically ImageNet), the fewer components need retraining, and vice versa.

For example, suppose the task is to classify whether an image contains grapes, and the dataset consists of images with and without grapes. Such images are very similar to those in ImageNet, so only the last few layers of the chosen model, perhaps only the final fully connected layer, need retraining: ImageNet has 1000 categories while this task distinguishes only two (grapes or no grapes), so it is enough to replace and retrain the final fully connected layer. If instead the task were to classify whether an outer-space image contains a planet, the data would be very different from ImageNet, and the later convolutional layers would need retraining as well.
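The "freeze everything except the new head" recipe can be sketched in a framework-agnostic way. Below, a "model" is just an ordered list of (layer name, parameter count, trainable flag) entries; the names and parameter counts are hypothetical, and real frameworks express the same idea by setting per-layer requires_grad / trainable flags:

```python
# Hypothetical layer list: early convolutions, the convolutional trunk,
# and a classifier head sized for ImageNet's 1000 classes.
model = [
    ["conv1", 9_408, True],
    ["conv_blocks", 11_000_000, True],
    ["fc", 2_049_000, True],
]

def freeze_all_but(model, keep):
    # Keep only the named layers trainable; freeze everything else,
    # so gradient updates touch only the retrained components.
    for layer in model:
        layer[2] = layer[0] in keep

# Grapes-vs-no-grapes case: data is ImageNet-like, retrain only the head.
freeze_all_but(model, {"fc"})
trainable = sum(n for _, n, t in model if t)
print(trainable)  # only the classifier head's parameters remain trainable
```

For the outer-space case, the `keep` set would simply grow to include the later convolutional blocks as well.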

## Conclusion

This paper is an optimization design guide for image classification with CNNs. It presents common optimization methods so that beginners can adjust and optimize their network models following these guidelines.

George Seif, machine learning engineer.