Deep Analysis of Convolutional Neural Networks

Time: 2019-11-21

Abstract: Deeply understand each component of a convolutional neural network, and build a neural network of your own.

State-of-the-art image recognition architectures use many additional components to supplement the convolution operation. In this article, you will learn about some important components that can improve the speed and accuracy of modern convolutional neural networks.

Pooling

The first secret that makes CNNs very efficient is pooling. Pooling is a vector-to-scalar transformation applied, like convolution, to each local region of an image. Unlike convolution, pooling layers have no filter and do not compute dot products over the local region: they either average the pixels of the region (average pooling) or simply select the pixel with the highest intensity and discard the rest (max pooling).

Max pooling has worked best in recent years. It is based on the idea that the largest pixel in a region represents that region's most important feature. The images we want to classify usually contain many other objects; for example, a cat appearing somewhere in a car image could mislead the classifier. Pooling helps mitigate this effect.
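To make this concrete, here is a toy sketch of my own (not the article's code) of what 2×2 max pooling does to a small feature map:

import numpy as np

# A 4x4 feature map; 2x2 max pooling keeps the largest value in each window
feature_map = np.array([[1, 3, 2, 1],
                        [4, 2, 1, 5],
                        [3, 1, 2, 2],
                        [0, 1, 4, 3]])
# Group the map into non-overlapping 2x2 blocks and take each block's maximum
pooled = feature_map.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)
# [[4 5]
#  [3 4]]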

At the same time, pooling greatly reduces computational cost. The image size at each layer of the network is directly proportional to that layer's computational cost (FLOPs). Strided convolution is sometimes used as a replacement for pooling; it likewise reduces the image size as the layers get deeper, so it helps prevent the number of FLOPs the network needs from surging.
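In Keras, the two options look like this (a minimal sketch using the same layer classes as the code later in this article; the filter count is an arbitrary example):

from keras.layers import Conv2D, MaxPooling2D

# Parameter-free downsampling: keep the strongest activation per 2x2 window
downsample_pool = MaxPooling2D(pool_size=(2, 2))
# Learned alternative: a stride-2 convolution halves the image size itself
downsample_conv = Conv2D(filters=64, kernel_size=[3, 3], strides=[2, 2], padding="same")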

Dropout

Overfitting is the phenomenon where a network performs well on the training set but poorly on the test set because it depends too heavily on specific features of the training set. Dropout is a technique for fighting overfitting. It randomly sets some activations to 0, forcing the network to explore more ways to classify images rather than over-relying on a few features. It was also one of the key elements in AlexNet.
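A toy simulation of my own (not the article's code) of what "inverted" dropout does to a layer's activations during training:

import numpy as np

rate = 0.5                                   # fraction of activations to drop
activations = np.ones(10)
mask = (np.random.rand(10) >= rate).astype("float32")
# Survivors are scaled by 1/(1 - rate) so the expected value is unchanged;
# at test time nothing is dropped and no scaling is applied
train_output = activations * mask / (1.0 - rate)
print(train_output)                          # roughly half zeros, survivors at 2.0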

Batch Normalization

One of the main problems of neural networks is vanishing gradients. Ioffe and Szegedy from Google Brain found that this is largely due to internal covariate shift: the distribution of the data changes as information propagates through the network. Their technique, called batch normalization, works by standardizing each batch of images to have zero mean and unit variance.
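A minimal numpy sketch of the training-time computation (gamma and beta stand in for the learned scale and shift parameters that the real layer maintains):

import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    mean = x.mean(axis=0)                    # per-feature mean over the batch
    var = x.var(axis=0)                      # per-feature variance over the batch
    x_hat = (x - mean) / np.sqrt(var + eps)  # standardize to zero mean, unit variance
    return gamma * x_hat + beta              # learned rescaling

batch = np.random.randn(128, 64) * 5 + 3     # a batch with a shifted distribution
print(batch_norm(batch).mean(), batch_norm(batch).var())  # approximately 0 and 1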

It is usually placed before the nonlinearity (ReLU) in CNNs. It greatly improves accuracy and accelerates training.

Data Augmentation

The human visual system is very good at adapting to image translation, rotation, and other forms of distortion: take an image and flip it, and most people can still recognize it. ConvNets, however, are not good at handling such distortions and may fail because of a small shift. But trained on randomly distorted images, using horizontal flips, vertical flips, rotations, shifts, and other transformations, ConvNets learn how to handle these distortions.

Another common method is to subtract the average image from each image and divide it by the standard deviation.

Next, I’ll explain how to implement all of this with Keras.

In this article, all experiments are conducted on the CIFAR-10 dataset, which contains 60,000 32×32 RGB images, split into 50,000 training images and 10,000 test images. To make things more modular, we create a simple function for each layer:

def Unit(x,filters):
    # Pre-activation ordering: batch norm and ReLU come before the convolution
    out = BatchNormalization()(x)
    out = Activation("relu")(out)
    out = Conv2D(filters=filters, kernel_size=[3, 3], strides=[1, 1], padding="same")(out)
    return out

Here is the most important aspect of our code. The Unit function defines a simple block of three layers. The first is batch normalization, which I explained earlier; next we add ReLU activation, and finally the convolution. Notice how the ReLU comes before the conv: this is the "pre-activation" ordering.
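For contrast, here is a sketch of the classic "post-activation" ordering (convolution first, normalization and activation after), which the Unit function above deliberately reverses; PostActUnit is a hypothetical name for illustration:

from keras.layers import Conv2D, BatchNormalization, Activation

def PostActUnit(x, filters):
    # Conventional ordering: conv -> batch norm -> ReLU
    out = Conv2D(filters=filters, kernel_size=[3, 3], strides=[1, 1], padding="same")(x)
    out = BatchNormalization()(out)
    out = Activation("relu")(out)
    return out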

Now we will combine these unit layers into a model:

def MiniModel(input_shape):
    images = Input(input_shape)
    net = Unit(images,64)
    net = Unit(net,64)
    net = Unit(net,64)
    net = MaxPooling2D(pool_size=(2,2))(net)
    net = Unit(net,128)
    net = Unit(net,128)
    net = Unit(net,128)
    net = MaxPooling2D(pool_size=(2, 2))(net)
    net = Unit(net,256)
    net = Unit(net,256)
    net = Unit(net,256)
    net = Dropout(0.5)(net)
    net = AveragePooling2D(pool_size=(8,8))(net)
    net = Flatten()(net)
    net = Dense(units=10,activation="softmax")(net)
    model = Model(inputs=images,outputs=net)
    return model

Here, we use the functional API to define our model. We start with three units of 64 filters each, followed by a max pooling layer that reduces the 32×32 image to 16×16. Next come three units of 128 filters, then another pooling layer, after which our image becomes 8×8. Finally, we have another three units with 256 channels. Note that every time we halve the image size, we double the number of channels.

Let’s add a dropout of 0.5, which randomly zeroes 50% of the activations and, as I explained earlier, helps avoid overfitting.

Next, we need to load the CIFAR-10 dataset and perform some data augmentation:

#load the cifar10 dataset
(train_x, train_y) , (test_x, test_y) = cifar10.load_data()
#normalize the data
train_x = train_x.astype('float32') / 255
test_x = test_x.astype('float32') / 255
#Subtract the mean image from both train and test set
train_x = train_x - train_x.mean()
test_x = test_x - test_x.mean()
#Divide by the standard deviation
train_x = train_x / train_x.std(axis=0)
test_x = test_x / test_x.std(axis=0)

In the above code, after loading the training and test data, we subtract the mean image from each image and divide by the standard deviation, which is a basic data augmentation technique. For more advanced augmentation, our image loading process changes slightly: Keras has a very useful data augmentation utility that simplifies the whole process.

The following code can do this:

datagen = ImageDataGenerator(rotation_range=10,
                             width_shift_range=5. / 32,
                             height_shift_range=5. / 32,
                             horizontal_flip=True)
# Compute quantities required for featurewise normalization
# (std, mean, and principal components if ZCA whitening is applied).
datagen.fit(train_x)

In the above, we first specify a rotation range of 10 degrees, a shift of 5/32 of the image size in height and width, and finally horizontal flipping, all of which are applied randomly to the images in the training set. There are many more transformations; you can look up all the parameters this class accepts. Remember that overusing data augmentation can be harmful.
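For example, two more standard ImageDataGenerator parameters could be added like this (zoom_range and shear_range are part of the Keras API, though they are not used in this article's experiments):

from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rotation_range=10,
                             width_shift_range=5. / 32,
                             height_shift_range=5. / 32,
                             horizontal_flip=True,
                             zoom_range=0.1,     # random zoom of up to 10%
                             shear_range=0.1)    # random shearing transformations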

Next, we have to convert the labels to one-hot encoding:

#Encode the labels to vectors
train_y = keras.utils.to_categorical(train_y,10)
test_y = keras.utils.to_categorical(test_y,10)
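To see what to_categorical does: a class label such as 3 becomes a 10-dimensional vector with a 1 in position 3 and 0 everywhere else.

import keras

print(keras.utils.to_categorical([3], 10))
# [[0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]]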

Almost everything else that makes up the training process is the same as in my previous tutorial. Here is the complete code:

#import needed classes
import keras
from keras.datasets import cifar10
from keras.layers import Dense,Conv2D,MaxPooling2D,Flatten,AveragePooling2D,Dropout,BatchNormalization,Activation
from keras.models import Model,Input
from keras.optimizers import Adam
from keras.callbacks import LearningRateScheduler
from keras.callbacks import ModelCheckpoint
from math import ceil
import os
from keras.preprocessing.image import ImageDataGenerator
def Unit(x,filters):
    out = BatchNormalization()(x)
    out = Activation("relu")(out)
    out = Conv2D(filters=filters, kernel_size=[3, 3], strides=[1, 1], padding="same")(out)
    return out
#Define the model
def MiniModel(input_shape):
    images = Input(input_shape)
    net = Unit(images,64)
    net = Unit(net,64)
    net = Unit(net,64)
    net = MaxPooling2D(pool_size=(2,2))(net)
    net = Unit(net,128)
    net = Unit(net,128)
    net = Unit(net,128)
    net = MaxPooling2D(pool_size=(2, 2))(net)
    net = Unit(net,256)
    net = Unit(net,256)
    net = Unit(net,256)
    net = Dropout(0.25)(net)
    net = AveragePooling2D(pool_size=(8,8))(net)
    net = Flatten()(net)
    net = Dense(units=10,activation="softmax")(net)
    model = Model(inputs=images,outputs=net)
    return model
#load the cifar10 dataset
(train_x, train_y) , (test_x, test_y) = cifar10.load_data()
#normalize the data
train_x = train_x.astype('float32') / 255
test_x = test_x.astype('float32') / 255
#Subtract the mean image from both train and test set
train_x = train_x - train_x.mean()
test_x = test_x - test_x.mean()
#Divide by the standard deviation
train_x = train_x / train_x.std(axis=0)
test_x = test_x / test_x.std(axis=0)
datagen = ImageDataGenerator(rotation_range=10,
                             width_shift_range=5. / 32,
                             height_shift_range=5. / 32,
                             horizontal_flip=True)
# Compute quantities required for featurewise normalization
# (std, mean, and principal components if ZCA whitening is applied).
datagen.fit(train_x)
#Encode the labels to vectors
train_y = keras.utils.to_categorical(train_y,10)
test_y = keras.utils.to_categorical(test_y,10)
#Specify the input shape and build the model
input_shape = (32,32,3)
model = MiniModel(input_shape)
#Print a Summary of the model
model.summary()
#Specify the training components
model.compile(optimizer=Adam(0.001),loss="categorical_crossentropy",metrics=["accuracy"])
epochs = 20
steps_per_epoch = ceil(50000/128)
# Fit the model on the batches generated by datagen.flow().
model.fit_generator(datagen.flow(train_x, train_y, batch_size=128),
                    validation_data=[test_x, test_y],
                    epochs=epochs, steps_per_epoch=steps_per_epoch,
                    verbose=1, workers=4)
#Evaluate the accuracy of the test dataset
accuracy = model.evaluate(x=test_x,y=test_y,batch_size=128)
model.save("cifar10model.h5")

First of all, there are a few differences from before:

input_shape = (32,32,3)
model = MiniModel(input_shape)
#Print a Summary of the model
model.summary()

As mentioned earlier, the CIFAR-10 dataset consists of 32×32 RGB images, so the input shape has three channels.

The next line creates an instance of the model we defined and passes in the input shape. The last line prints out a complete summary of our network, including the number of parameters.

Finally, we need to explain the training portion:

epochs = 20
steps_per_epoch = ceil(50000/128)
# Fit the model on the batches generated by datagen.flow().
model.fit_generator(datagen.flow(train_x, train_y, batch_size=128),
                    validation_data=[test_x, test_y],
                    epochs=epochs, steps_per_epoch=steps_per_epoch,
                    verbose=1, workers=4)
#Evaluate the accuracy of the test dataset
accuracy = model.evaluate(x=test_x,y=test_y,batch_size=128)
model.save("cifar10model.h5")

First we define the number of epochs to run and the number of steps per epoch.

steps_per_epoch = ceil(50000/128)

50,000 is the total number of training images, and we use a batch size of 128, so each epoch takes ceil(50000/128) = ceil(390.625) = 391 steps.

Next is the fit function, which differs significantly from the fit function I explained in the previous tutorial.

Taking a second look at it will help:

# Fit the model on the batches generated by datagen.flow().
model.fit_generator(datagen.flow(train_x, train_y, batch_size=128),
                    validation_data=[test_x,test_y],
                    epochs=epochs,steps_per_epoch=steps_per_epoch, verbose=1, workers=4)
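Once training finishes, the saved model can be reloaded for inference; a minimal sketch, assuming the "cifar10model.h5" file written by model.save above and the test_x array from earlier:

from keras.models import load_model

model = load_model("cifar10model.h5")        # restore architecture and weights
predictions = model.predict(test_x[:5])      # class probabilities for 5 images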

This tutorial mainly introduces the basic components; you can also try adjusting the parameters and the network to see how much you can improve the accuracy.

This article was translated by the Alibaba Cloud Yunqi community.

The original title of the article is "Components of convolutional neural networks".

By John Olafenwa

Translator: ulaula. Reviewer: Yuan Hu.

Original address

This is original content of the Yunqi community and may not be reproduced without permission.