Deep learning in C# (4): using Keras.NET to recognize coins

Time: 2021-04-30

In this article, we will design a convolutional neural network for the coin recognition problem and implement it with Keras.NET.

Here, we will introduce convolutional neural networks (CNNs) and propose a CNN architecture that we will train to recognize coins.

What is a CNN? As we mentioned in the previous article in this series, a CNN is a kind of neural network (NN) often used in image classification tasks, such as object and face recognition. In a CNN, not every node is connected to all nodes in the next layer. This partial connectivity helps prevent the overfitting that fully connected networks are prone to, and it also speeds up the convergence of the network.

The core concept behind CNNs is a mathematical operation called convolution, which is very common in digital signal processing. Convolution is an operation on two functions that produces a third function expressing how much the two overlap as one is shifted over the other.

In object recognition, convolution allows us to detect different features in an image, such as vertical and horizontal edges, textures, and curves. That is why the first layer of a CNN is typically a convolutional layer.
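To make the operation concrete, here is a minimal, framework-free sketch of the sliding-window computation a convolutional layer performs (strictly speaking, cross-correlation, which is what deep learning libraries actually implement). The 3×3 kernel and the image values below are invented for illustration; this particular kernel responds strongly to vertical edges.

// Minimal sketch of what a Conv2D layer computes (no padding, stride 1).
// The 3x3 kernel responds to vertical edges; the image values are invented.
double[,] image =
{
    { 0, 0, 0, 9, 9 },
    { 0, 0, 0, 9, 9 },
    { 0, 0, 0, 9, 9 },
    { 0, 0, 0, 9, 9 },
    { 0, 0, 0, 9, 9 }
};
double[,] kernel =
{
    { -1, 0, 1 },
    { -1, 0, 1 },
    { -1, 0, 1 }
};

int outSize = image.GetLength(0) - kernel.GetLength(0) + 1;   // 5 - 3 + 1 = 3
var featureMap = new double[outSize, outSize];

for (int row = 0; row < outSize; row++)
    for (int col = 0; col < outSize; col++)
        for (int i = 0; i < 3; i++)
            for (int j = 0; j < 3; j++)
                featureMap[row, col] += image[row + i, col + j] * kernel[i, j];

// featureMap now holds large values where the image changes from dark to bright,
// i.e. along the vertical edge between the columns of 0s and 9s.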

Another common layer in CNNs is the pooling layer. Pooling is used to reduce the size of the image representation, which means fewer parameters and, ultimately, less computation. The most common type is max pooling, which takes the maximum value within each window of cells at every location; a new, smaller image is then built from those maxima.
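As a quick illustration (with invented values), a 2×2 max-pooling window with stride 2 shrinks a 4×4 feature map to 2×2 while keeping the strongest activation in each window:

// 2x2 max pooling with stride 2: each 2x2 block is replaced by its maximum.
double[,] input =
{
    { 1, 3, 2, 1 },
    { 4, 6, 5, 2 },
    { 7, 2, 9, 1 },
    { 3, 1, 4, 8 }
};

var pooled = new double[2, 2];
for (int row = 0; row < 2; row++)
    for (int col = 0; col < 2; col++)
        pooled[row, col] = Math.Max(
            Math.Max(input[2 * row, 2 * col],     input[2 * row, 2 * col + 1]),
            Math.Max(input[2 * row + 1, 2 * col], input[2 * row + 1, 2 * col + 1]));

// pooled == { { 6, 5 }, { 7, 9 } } – a quarter of the values, same dominant features.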

Another concept related to convolution is padding. Padding ensures that the convolution is applied uniformly across the whole image, including the border pixels. This is achieved by adding a border of zero-valued pixels around the input so that every pixel of the image can be visited the same number of times.
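The effect of padding is easiest to see in the output-size arithmetic. Below is a small sketch using the standard convolution output-size formula (nothing Keras.NET-specific), with the 64×64 input size and 3×3 filters used later in this article:

// Output size of a convolution along one dimension:
//   out = (in + 2 * pad - kernel) / stride + 1
static int ConvOutputSize(int inputSize, int kernelSize, int stride, int pad) =>
    (inputSize + 2 * pad - kernelSize) / stride + 1;

Console.WriteLine(ConvOutputSize(64, 3, 1, 0)); // 62 – no padding: the image shrinks, border pixels are seen by fewer windows
Console.WriteLine(ConvOutputSize(64, 3, 1, 1)); // 64 – "same": a 1-pixel zero border keeps the size and covers every pixel equally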

The most common CNN architecture starts with a convolutional layer, followed by an activation layer, then a pooling layer, and ends with a traditional fully connected network such as a multi-layer NN. This type of model, where layers are stacked one after another, is called a sequential model. Why end with a fully connected network? To learn nonlinear combinations of the features in the transformed image (after convolution and pooling).

Here is the CNN architecture we will implement:

  • Conv2D layer – 32 filters, filter size 3
  • Activation layer using the ReLU function
  • Conv2D layer – 32 filters, filter size 3
  • Activation layer using the ReLU function
  • MaxPooling2D layer – applies a (2, 2) pooling window
  • Dropout layer, 25% – prevents overfitting by randomly dropping some of the previous layer's outputs (setting them to 0); this technique is also known as dilution
  • Conv2D layer – 64 filters, filter size 3
  • Activation layer using the ReLU function
  • Conv2D layer – 64 filters, filter size 3
  • Activation layer using the ReLU function
  • MaxPooling2D layer – applies a (2, 2) pooling window
  • Dropout layer, 25%
  • Flatten layer – reshapes the data for the next layer
  • Dense layer – the fully connected layer of a traditional neural network, with 512 nodes
  • Activation layer using the ReLU function
  • Dropout layer, 50%
  • Dense layer, with the number of nodes matching the number of classes
  • Softmax layer

The architecture follows a common CNN pattern for object recognition; the layer parameters were fine-tuned through experimentation.

The results of our parameter tuning process are stored in the Settings class:

public class Settings
{
    // Input image dimensions and pixel value range
    public const int ImgWidth = 64;
    public const int ImgHeight = 64;
    public const int MaxValue = 255;
    public const int MinValue = 0;
    public const int Channels = 3;

    // Training hyperparameters
    public const int BatchSize = 12;
    public const int Epochs = 10;
    public const int FullyConnectedNodes = 512;
    public const string LossFunction = "categorical_crossentropy";
    public const string Accuracy = "accuracy";
    public const string ActivationFunction = "relu";
    public const string PaddingMode = "same";

    // RMSprop optimizer with a small learning rate and decay
    public static StringOrInstance Optimizer = new RMSprop(lr: Lr, decay: Decay);
    private const float Lr = 0.0001f;
    private const float Decay = 1e-6f;
}

We now have our CNN architecture. Next, we will build the coin recognition CNN using Keras.NET.

First, let's install the Keras.NET package, which we can find in the NuGet Package Manager under Tools > NuGet Package Manager. Keras.NET depends on the Numpy.NET and pythonnet_netstandard packages; if they are not already installed, let's install them as well.
It should be noted that Keras.NET requires Python 2.7–3.7 installed on your operating system, along with the Python libraries numpy and tensorflow. In this case, we are using 64-bit Python 3.7.

If you encounter any problems executing the code in this article, try running the following method once at the beginning of Main in the console application. It sets the environment variables required to locate all the DLLs:

private static void SetupPyEnv()
{
     string envPythonHome = @"C:\Users\arnal\AppData\Local\Programs\Python\Python37\";
     string envPythonLib = envPythonHome + @"Lib\;" + envPythonHome + @"Lib\site-packages\";
     Environment.SetEnvironmentVariable("PYTHONHOME", envPythonHome, EnvironmentVariableTarget.Process);
     Environment.SetEnvironmentVariable("PATH", envPythonHome + ";" + envPythonLib + ";" + Environment.GetEnvironmentVariable("PATH", EnvironmentVariableTarget.Machine), EnvironmentVariableTarget.Process);
     Environment.SetEnvironmentVariable("PYTHONPATH", envPythonLib, EnvironmentVariableTarget.User);
     PythonEngine.PythonHome = envPythonHome;
     PythonEngine.PythonPath = Environment.GetEnvironmentVariable("PYTHONPATH");
}
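
For example, a console application could call it at the very start of Main, before any Keras.NET type is used. This is only a minimal sketch; the dataset loading and training steps are the ones shown later in this article:

static void Main(string[] args)
{
    // Point this process at the local Python installation before Keras.NET loads.
    SetupPyEnv();

    // ... load and preprocess the dataset, then build and train the CNN (shown below).
}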

Now we'll see how simple and transparent it is to create our coin recognition CNN using Keras.NET. The following Cnn class contains all the logic of the model.

public class Cnn
    {
        private DataSet _dataset;
        private Sequential _model;

        public Cnn(DataSet dataset)
        {
            _dataset = dataset;
            _model = new Sequential();
        }

        public void Train()
        {
            // Build CNN model
            _model.Add(new Conv2D(32, kernel_size: (3, 3).ToTuple(),
                                 padding: Settings.PaddingMode,
                                 input_shape: new Shape(Settings.ImgWidth, Settings.ImgHeight, Settings.Channels)));
            _model.Add(new Activation(Settings.ActivationFunction));
            _model.Add(new Conv2D(32, (3, 3).ToTuple()));
            _model.Add(new Activation(Settings.ActivationFunction));
            _model.Add(new MaxPooling2D(pool_size: (2, 2).ToTuple()));
            _model.Add(new Dropout(0.25));

            _model.Add(new Conv2D(64, kernel_size: (3, 3).ToTuple(),
                                padding: Settings.PaddingMode));
            _model.Add(new Activation(Settings.ActivationFunction));
            _model.Add(new Conv2D(64, (3, 3).ToTuple()));
            _model.Add(new Activation(Settings.ActivationFunction));
            _model.Add(new MaxPooling2D(pool_size: (2, 2).ToTuple()));
            _model.Add(new Dropout(0.25));

            _model.Add(new Flatten());
            _model.Add(new Dense(Settings.FullyConnectedNodes));
            _model.Add(new Activation(Settings.ActivationFunction));
            _model.Add(new Dropout(0.5));
            _model.Add(new Dense(_dataset.NumberClasses));
            _model.Add(new Softmax());
            
            // Configure the learning process: loss function, optimizer and metric
            _model.Compile(loss: Settings.LossFunction,
              optimizer: Settings.Optimizer, 
              metrics: new string[] { Settings.Accuracy });
            
            // Train on the training set, validating against the validation set after each epoch
            _model.Fit(_dataset.TrainX, _dataset.TrainY,
                          batch_size: Settings.BatchSize,
                          epochs: Settings.Epochs,
                          validation_data: new NDarray[] { _dataset.ValidationX, _dataset.ValidationY });

            var score = _model.Evaluate(_dataset.ValidationX, _dataset.ValidationY, verbose: 0);
            Console.WriteLine("Test loss:" + score[0]);
            Console.WriteLine("Test accuracy:" + score[1]);
        }

        public NDarray Predict(string imgPath)
        {
            // Normalize the image and reshape it to (1, height, width, channels) – a batch of one
            NDarray x = Utils.Normalize(imgPath);
            x = x.reshape(1, x.shape[0], x.shape[1], x.shape[2]);
            return _model.Predict(x);
        }
}

As we can see, we first have a constructor that receives the dataset (imported and processed in the second article of this series) and creates a new instance of the Sequential class, which is stored in the private field _model. What is Sequential? It is an empty model that lets us stack layers, which is exactly what we need.

Then, in the Train method, we first build our stack of layers following the architecture described above, then compile the model and call the Fit method to start training. The loss function used is categorical_crossentropy. What is a loss function? It is the function the learning process optimizes, that is, the quantity we try to minimize (or maximize). The optimizer is the algorithm responsible for minimizing the loss; it does so by adjusting the network's weights, with the step size controlled by the learning rate.
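
To make this concrete, here is a minimal, framework-free sketch of the categorical cross-entropy for a single sample (the probability values are invented): the loss is small when the predicted probability of the true class is high, and it grows as that probability shrinks.

// Categorical cross-entropy for one sample: -sum(y_true[i] * log(y_pred[i])).
// y_true is one-hot encoded, so only the true class contributes to the sum.
double[] yTrue = { 0, 1, 0 };             // the sample belongs to class 1
double[] yPred = { 0.10, 0.80, 0.10 };    // the model is fairly confident

double loss = 0;
for (int i = 0; i < yTrue.Length; i++)
    loss += -yTrue[i] * Math.Log(yPred[i]);

Console.WriteLine(loss);                  // -ln(0.80) ≈ 0.22; a prediction of 0.99 would give ≈ 0.01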

Finally, the model is evaluated on the validation dataset. The other method is Predict, which, as its name suggests, predicts the label of new input data; it should be called only after training. Starting the training phase is as easy as running the following code:

var cnn = new Cnn(dataSet);
cnn.Train();

Let's take a look at the results of the coin recognition training we have been working through in this series.
We can see that during training we achieve 100% accuracy. The Predict method returns an NDarray containing, for each class the CNN was trained on, the probability that the input image belongs to that class.
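
For example, a prediction can be turned into a class index like this. It is only a minimal sketch: the image path is hypothetical, the class order is assumed to match the training labels, and Numpy.NET's np.argmax helper is used to pick the index of the largest probability.

// Requires: using Numpy;
// "coin_test.png" is a hypothetical path used only for illustration.
NDarray probabilities = cnn.Predict("coin_test.png");
Console.WriteLine(probabilities);             // per-class probabilities
Console.WriteLine(np.argmax(probabilities));  // index of the most likely class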

So, what kind of architecture needs a GPU rather than a CPU? The AlexNet architecture, for example, consists of five convolutional layers and three fully connected layers, plus pooling and activation layers. Because of its complexity, this type of deep CNN performs much better on a GPU. The general rule is that the more layers you add, the heavier the weight computations become.

Now that we have learned how to write our own CNN, we will move into the territory of pre-trained models. The next article will cover this in detail!

Welcome to follow my official account. If you come across foreign technical articles you like, feel free to recommend them to me in the comments.