Neural networks are cute


Neural networks are very cute!

0. Classification
The most important use of neural networks is classification. To give you an intuitive understanding of classification, let's look at a few examples:
Spam identification: take an e-mail, extract all the words that appear in it, and feed them to a machine. The machine must judge whether the e-mail is spam.

Disease diagnosis: a patient goes to the hospital for a battery of liver and urine tests, and the results are fed into a machine. The machine must judge whether the patient is ill and, if so, with what disease.

Cat and dog classification: there are many photos of cats and dogs. Each photo is fed into a machine, which must judge whether the photo shows a cat or a dog.

This kind of machine, which automatically classifies its input, is called a classifier.

The input of the classifier is a numerical vector called a feature (vector). In the first example, the input of the classifier is a pile of 0 and 1 values indicating whether each word in the dictionary appears in the e-mail. For example, the vector (1,1,0,0,…) indicates that only two words, Abdon and abnormal, appear in the e-mail. In the second example, the input of the classifier is a bunch of test readings. In the third example, the input of the classifier is a photo; if each photo is a 320×240-pixel color image with red, green and blue channels, then the input of the classifier is a vector of length 320 × 240 × 3 = 230,400.
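The spam example's feature vector can be sketched in a few lines. This is a toy illustration: the four-word "dictionary" and the helper name are made up for the example, not taken from any real system.

```python
# Toy sketch of the spam feature vector: a hypothetical tiny dictionary,
# and a 0/1 vector marking which dictionary words appear in the e-mail.
def to_feature_vector(email_words, dictionary):
    """Return a 0/1 vector: 1 if the dictionary word appears in the e-mail."""
    present = set(email_words)
    return [1 if word in present else 0 for word in dictionary]

dictionary = ["abdon", "abnormal", "cat", "dog"]  # assumed toy dictionary
vec = to_feature_vector(["abdon", "abnormal"], dictionary)
print(vec)  # → [1, 1, 0, 0], i.e. only the first two words appear
```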

The output of the classifier is also numerical. In the first example, output 1 indicates that the e-mail is spam and output 0 that it is normal. In the second example, output 0 means healthy, output 1 means hepatitis A, output 2 means hepatitis B, output 3 means hepatitis C, and so on. In the third example, output 0 means the picture shows a dog and output 1 means it shows a cat.

The goal of a classifier is to classify correctly as often as possible. Generally, we first collect some samples, label them by hand with the correct classes, and then train the classifier on this labeled data; the trained classifier can then work on new feature vectors.

1. Neurons
Let's assume that the input of the classifier is two values obtained in some way, and the output is 0 or 1, representing, say, dog and cat. Now we have some samples:

[Figure: the sample feature vectors plotted on the plane, forming two clusters]

Let's think about it: what is the easiest way to separate these two sets of feature vectors? Of course, draw a vertical line between the two clusters; the left side of the line is dog, the right side is cat, and the classifier is done. When a new vector arrives, if it falls on the left of the line it is a dog; if it falls on the right, it is a cat.

A straight line divides the plane in two; a plane divides three-dimensional space in two; an (n-1)-dimensional hyperplane divides n-dimensional space in two, and the two sides belong to two different classes. This kind of classifier is called a neuron.

As we all know, the equation of a line in the plane is ax + by + c = 0; whether the left-hand side is greater than or less than zero tells us which side of the line a point lies on. Extending this formula to n-dimensional space, the high-dimensional analogue of the line is called a hyperplane, and its equation is:

h = w1·x1 + w2·x2 + … + wn·xn + b = 0

A neuron is a model that outputs 1 when h is greater than 0 and 0 when h is less than 0. Its essence is to split the feature space in two and treat the two halves as two different classes. You will be hard pressed to think of a simpler classifier. It was invented by McCulloch and Pitts in 1943.
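The whole model fits in a few lines. This sketch hand-picks weights that encode a vertical line (the weight values and sample points are illustrative, not learned):

```python
# Minimal sketch of the McCulloch-Pitts neuron described above:
# compute h = w·x + b, then output 1 if h > 0, else 0.
def neuron(x, w, b):
    h = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if h > 0 else 0

# A vertical line x1 = 2 as the separating boundary (w and b chosen by hand):
w, b = [1.0, 0.0], -2.0
print(neuron([3.0, 1.0], w, b))  # point to the right of the line → 1 (cat)
print(neuron([1.0, 1.0], w, b))  # point to the left of the line  → 0 (dog)
```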

This model is a bit like a neuron in the human brain: it receives electrical signals from multiple receptors, processes them (a weighted sum plus a small offset, i.e. judging which side of a line the input lies on), and emits an electrical signal (it fires 1 when the input is on the right side; otherwise it emits nothing, which can be regarded as 0). This is why it is called a neuron.

Of course, in the picture above we only know from a god's-eye view that "a vertical line can separate the two classes". When we actually train a neuron, we don't know in advance how the features cluster. One learning method for the neuron model is called the Hebb algorithm.

First, randomly pick a line / plane / hyperplane, then go through the samples one by one. If the current line misclassifies a sample, move the line a little toward that sample, striving to cross over it so that the sample ends up on the correct side; if a sample is classified correctly, leave the line alone for a while. The process of training a neuron is thus a line that keeps dancing about, finally settling into the vertical position between the two classes.
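The nudge-the-line loop above can be sketched as follows. This is the classic perceptron-style update; the sample coordinates, learning rate and epoch count are all illustrative choices, not from the article:

```python
# Sketch of the training procedure described above: visit samples one by
# one and nudge the line toward any misclassified sample.
def train(samples, labels, lr=0.1, epochs=100):
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in zip(samples, labels):
            pred = 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0
            if pred != y:                        # the line cut this point wrongly
                errors += 1
                w[0] += lr * (y - pred) * x[0]   # move the line toward the sample
                w[1] += lr * (y - pred) * x[1]
                b += lr * (y - pred)
        if errors == 0:                          # the line has stopped dancing
            break
    return w, b

# Dogs (label 0) on the left, cats (label 1) on the right:
samples = [(1.0, 1.0), (1.5, 2.0), (4.0, 1.0), (4.5, 2.0)]
labels = [0, 0, 1, 1]
w, b = train(samples, labels)
```

After training, the line separates the two toy clusters correctly.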

2. Neural network
MP neurons have several significant drawbacks. First, the output jumps from 0 on one side of the line to 1 on the other, which is not differentiable and is awkward for mathematical analysis. The sigmoid function, which resembles the 0-1 step function but is smooth, is used to replace it (the sigmoid has a scale parameter that controls how sharply a neuron responds to points at different distances from the hyperplane, but we ignore it here). From then on, gradient descent can be used to train neural networks, which leads to the famous back-propagation algorithm.
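The contrast between the hard step and the smooth sigmoid is easy to see numerically (scale parameter fixed to 1, as the text suggests):

```python
import math

# The 0-1 step jumps abruptly at h = 0; the sigmoid changes smoothly,
# so it has a gradient everywhere.
def step(h):
    return 1 if h > 0 else 0

def sigmoid(h):
    return 1.0 / (1.0 + math.exp(-h))

print(step(0.01), step(-0.01))        # hard jump across h = 0: 1 then 0
print(sigmoid(0.0))                   # exactly on the hyperplane → 0.5
print(sigmoid(5.0), sigmoid(-5.0))    # far from it → values near 1 and near 0
```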

Another disadvantage of the neuron is that it can only cut once! Try telling me how a single cut can separate the following two classes.

[Figure: two classes arranged in an XOR-like pattern that no single straight line can separate]

The solution is a multi-layer neural network, in which the output of lower-level neurons becomes the input of higher-level neurons. We can cut once horizontally and once vertically through the middle, then merge the upper-left and lower-right parts and separate them from the upper-right and lower-left parts; or we can make ten cuts around the edge of the upper-left region, carve that part out first, and then merge it with the lower-right corner.

Each cut actually uses one neuron, and the intersections, unions and other operations on the resulting half-planes are done by taking the outputs of those neurons as inputs to yet another neuron. A feature distribution shaped like this is called XOR: one neuron cannot decide it, but two layers of neurons classify it correctly.
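The two-layer idea can be written out directly: two first-layer neurons each make one cut, and a second-layer neuron combines the resulting half-planes. The weights here are chosen by hand for illustration, not learned:

```python
# Two-layer network solving XOR: each first-layer neuron makes one cut,
# the second-layer neuron combines the two half-planes.
def neuron(x, w, b):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def xor_net(x1, x2):
    h1 = neuron([x1, x2], [1, 1], -0.5)      # cut 1: x1 + x2 > 0.5 (an OR)
    h2 = neuron([x1, x2], [1, 1], -1.5)      # cut 2: x1 + x2 > 1.5 (an AND)
    return neuron([h1, h2], [1, -1], -0.5)   # combine: OR but not AND

for a in (0, 1):
    for c in (0, 1):
        print(a, c, "->", xor_net(a, c))  # prints 0, 1, 1, 0
```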

As long as you can make enough cuts and combine the results, a neural network can represent a boundary of any strange shape, so neural networks can in theory represent very complex functions and spatial distributions. Whether a real neural network can actually swing into the right position, however, depends on the initial values, the sample size and the sample distribution.

The magic of neural networks is that each component is very simple (a cut through space plus some activation function: 0-1 step, sigmoid, max pooling), yet they can be cascaded layer after layer. The input vector connects to many neurons, the outputs of those neurons connect to another pile of neurons, and this process can repeat many times. This is very similar to the neurons in the human brain: each neuron takes some neurons as its input and is itself the input of other neurons. The numerical vector is like an electrical signal transmitted between neurons, and each neuron sends a signal to the next layer only when certain conditions are met. Of course, the human brain is far more complex than the neural network model: artificial neural networks generally have no cyclic structure, and the electrical signals of brain neurons are not merely strong or weak but also carry timing, like Morse code; artificial neural networks have no such complex signal patterns.

The training of neural networks relies on the back-propagation algorithm: at the start, the input layer receives the feature vector and the network computes its output layer by layer. When the output layer finds that its output differs from the correct class label, it makes the last layer of neurons adjust their parameters; those neurons, besides adjusting themselves, also make the second-to-last layer of neurons connected to them adjust, and so on, layer by layer, backwards. After an adjustment, the network is tested on the samples again; if the output is still wrong, it adjusts again and again until the output is satisfactory. This is rather like China's literary and art system: the legendary Wu Meiniang drama crew is a neuron in the network whose parameters were recently adjusted.
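The layer-by-layer forward pass and backwards adjustment can be sketched on a tiny 2-2-1 sigmoid network. The network size, learning rate, epoch count and XOR data are all illustrative choices; this is a bare-hands sketch of the idea, not a production implementation:

```python
import math
import random

# Tiny 2-2-1 sigmoid network trained by back-propagation on XOR:
# forward pass layer by layer, then errors flow backwards adjusting weights.
random.seed(0)

def sigmoid(h):
    return 1.0 / (1.0 + math.exp(-h))

W1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]  # hidden layer
b1 = [0.0, 0.0]
W2 = [random.uniform(-1, 1) for _ in range(2)]                      # output neuron
b2 = 0.0

data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]  # XOR samples

def forward(x):
    h = [sigmoid(W1[j][0] * x[0] + W1[j][1] * x[1] + b1[j]) for j in range(2)]
    o = sigmoid(W2[0] * h[0] + W2[1] * h[1] + b2)
    return h, o

def loss():
    return sum((forward(x)[1] - y) ** 2 for x, y in data)

lr = 0.5
before = loss()
for _ in range(2000):
    for x, y in data:
        h, o = forward(x)
        # the output layer adjusts first...
        do = (o - y) * o * (1 - o)
        # ...then pushes the adjustment back to the hidden layer
        dh = [do * W2[j] * h[j] * (1 - h[j]) for j in range(2)]
        for j in range(2):
            W2[j] -= lr * do * h[j]
            W1[j][0] -= lr * dh[j] * x[0]
            W1[j][1] -= lr * dh[j] * x[1]
            b1[j] -= lr * dh[j]
        b2 -= lr * do
after = loss()
print(before, "->", after)  # the total error shrinks as the network adjusts
```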

3. Large-scale neural networks

We can't help wondering: if our network has 10 layers of neurons, what does, say, neuron number 2015 in layer 8 mean? We know that it takes the outputs of a large number of layer-7 neurons as its input, and those layer-7 neurons in turn take a large number of layer-6 neurons as input. Does this particular layer-8 neuron represent some abstract concept?

For example, your brain has many neurons responsible for processing sound, visual and tactile signals, each firing differently for different information. Could there be a neuron (or a small group of neurons) that collects these signals, checks whether they match some abstract concept, and interacts with other neurons responsible for more concrete or more abstract concepts?

In 2012, Krizhevsky of the University of Toronto built a very large convolutional neural network [1] with 9 layers, 650,000 neurons and 60 million parameters. The input of the network is a picture and the output is one of 1000 classes, such as bug, leopard, lifeboat and so on. Training this model requires a huge number of images, and its classification accuracy left all previous classifiers far behind. Zeiler and Fergus of New York University [2] picked out individual neurons in the network, collected the input images that triggered extremely large responses, and looked at what those images have in common. They found that middle-layer neurons respond to some quite abstract features.

The first-layer neurons are mainly responsible for recognizing colors and simple textures.

[Figure: image patches that most strongly activate first-layer neurons]

Some second-layer neurons can recognize finer textures, such as cloth, scales and leaves.

[Figure: image patches that most strongly activate second-layer neurons]

Some third-layer neurons respond to yellow candlelight, egg yolks and highlights in the dark.

[Figure: image patches that most strongly activate third-layer neurons]

Some fourth-layer neurons recognize cute dog faces, ladybugs and piles of round objects.

[Figure: image patches that most strongly activate fourth-layer neurons]

Some fifth-layer neurons can recognize flowers, domes, keyboards, birds and animals with dark eye circles.

[Figure: image patches that most strongly activate fifth-layer neurons]

The concepts here are not the output of the whole network but the preferences of its middle-layer neurons, which serve the neurons behind them. Although each individual neuron is stupid (it can only make one cut), what 650,000 neurons together can learn is genuinely profound.

[1] Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (pp. 1097–1105).
[2] Zeiler, M. D., & Fergus, R. (2013). Visualizing and understanding convolutional networks. arXiv preprint arXiv:1311.2901.

Author: Wang Xiaolong
Source: Zhihu
The copyright belongs to the author. For commercial reprint, please contact the author for authorization. For non-commercial reprint, please indicate the source.