Machine learning notes (1) practical part of perceptron algorithm

Time: 2020-02-26

In the previous note we covered the theory of the perceptron: its origin, how it works, the solution strategy, and its convergence. In this note we use the perceptron algorithm to solve some practical problems.

First, we start with the simplest problem and use the perceptron algorithm to classify the OR logic function.

import numpy as np
import matplotlib.pyplot as plt

# The four input points: (0,0), (0,1), (1,0), (1,1)
x = [0,0,1,1]
y = [0,1,0,1]

# (0,0) is the only negative sample of OR; the other three points are positive
plt.scatter(x[0],y[0], color="red",label="negative")
plt.scatter(x[1:],y[1:], color="green",label="positive")

plt.legend(loc="best")
plt.show()

[Figure: scatter plot of the OR samples, the negative point in red and the positive points in green]

Let’s define a function that determines whether a sample point is correctly classified and updates the parameters when it is not. Because the sample points in this example are two-dimensional, the weight vector is also two-dimensional; it can be defined as \(w = (w_1, w_2)\) and represented in Python as a list, e.g. w = [0, 0]. The score the function computes for a sample is w[0] * x[0] + w[1] * x[1] - b, which is proportional to the signed distance from the sample to the hyperplane. The complete function is shown below.
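Written out, the rule implemented by the function below (learning rate 1, labels \(y \in \{0, 1\}\)) is: compute the score \(f(x) = w_1 x_1 + w_2 x_2 - b\), and whenever \(\operatorname{sign}(f(x)) \cdot y \le 0\), update

\[
w \leftarrow w + \bigl(y - f(x)\bigr)\,x, \qquad b \leftarrow b - \bigl(y - f(x)\bigr).
\]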

def decide(data,label,w,b):
    # Score of the sample: w . x - b
    result = w[0] * data[0] + w[1] * data[1] - b
    print("result = ",result)
    # Misclassified (or on the boundary): update the weights and the bias
    if np.sign(result) * label <= 0:
        w[0] += 1 * (label - result) * data[0]
        w[1] += 1 * (label - result) * data[1]
        b += 1 * (label - result)*(-1)
    return w,b

After writing the core function, we need a driver function that loops over the sample points for several epochs and feeds each one to decide.

def run(data, label):
    w,b = [0,0],0
    for epoch in range(10):
        for item in zip(data, label):
            dataset,labelset = item[0],item[1]
            w,b = decide(dataset, labelset, w, b)
            print("dataset = ",dataset, ",", "w = ",w,",","b = ",b)
    print(w,b)
data = [(0,0),(0,1),(1,0),(1,1)]
label = [0,1,1,1]
run(data,label)
result =  0
dataset =  (0, 0) , w =  [0, 0] , b =  0
result =  0
dataset =  (0, 1) , w =  [0, 1] , b =  -1
result =  1
dataset =  (1, 0) , w =  [0, 1] , b =  -1
result =  2
dataset =  (1, 1) , w =  [0, 1] , b =  -1
result =  1
dataset =  (0, 0) , w =  [0, 1] , b =  0
result =  1
dataset =  (0, 1) , w =  [0, 1] , b =  0
result =  0
dataset =  (1, 0) , w =  [1, 1] , b =  -1
result =  3
dataset =  (1, 1) , w =  [1, 1] , b =  -1
result =  1
dataset =  (0, 0) , w =  [1, 1] , b =  0
result =  1
dataset =  (0, 1) , w =  [1, 1] , b =  0
result =  1
In the later iterations the parameters no longer change, so the algorithm has converged.
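As a quick sanity check, here is a minimal sketch that plugs in the final parameters printed above (w = [1, 1], b = 0) and verifies that the score w·x - b is positive exactly for the three positive OR samples:

w_final, b_final = [1, 1], 0
for x, t in zip(data, label):
    score = w_final[0] * x[0] + w_final[1] * x[1] - b_final
    print(x, t, "positive" if score > 0 else "negative")
# (0, 0) comes out negative and the other three positive, matching OR logic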

Next, here is a data set from UCI: the Pima Indians Diabetes data set. The example follows Chapter 3 of Machine Learning: An Algorithmic Perspective.

import os
import pylab as pl
import numpy as np
import pandas as pd
os.chdir(r"DataSets\pima-indians-diabetes-database")
pima = np.loadtxt("pima.txt", delimiter=",", skiprows=1)
pima.shape
(768, 9)
indices0 = np.where(pima[:,8]==0)
indices1 = np.where(pima[:,8]==1)
pl.ion()
pl.plot(pima[indices0,0],pima[indices0,1],"go")
pl.plot(pima[indices1,0],pima[indices1,1],"rx")
pl.show()

[Figure: scatter of the first two features, class 0 as green circles and class 1 as red crosses]

Data preprocessing

1. Age discretization

pima[np.where(pima[:,7]<=30),7] = 1
pima[np.where((pima[:,7]>30) & (pima[:,7]<=40)),7] = 2
pima[np.where((pima[:,7]>40) & (pima[:,7]<=50)),7] = 3
pima[np.where((pima[:,7]>50) & (pima[:,7]<=60)),7] = 4
pima[np.where(pima[:,7]>60),7] = 5

2. Cap the number of pregnancies at 8 (values greater than 8 are replaced with 8)

pima[np.where(pima[:,0]>8),0] = 8

3. Standardize the data

pima[:,:8] = pima[:,:8]-pima[:,:8].mean(axis=0)
pima[:,:8] = pima[:,:8]/pima[:,:8].var(axis=0)
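The two lines above scale each column by its variance. A more common choice is the z-score, which divides by the standard deviation instead; a sketch of that alternative (use it instead of, not in addition to, the lines above):

pima[:,:8] = (pima[:,:8] - pima[:,:8].mean(axis=0)) / pima[:,:8].std(axis=0)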

4. Split the data into a training set and a test set (every other row)

trainin = pima[::2,:8]
testin = pima[1::2,:8]
traintgt = pima[::2,8:9]
testtgt = pima[1::2,8:9]
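A quick shape check on the split (every other row of the 768 samples goes to each half, so both should have 384 rows):

print(trainin.shape, traintgt.shape)   # (384, 8) (384, 1)
print(testin.shape, testtgt.shape)     # (384, 8) (384, 1)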

Define the model

class Perceptron:
    def __init__(self, inputs, targets):
        #Set network size
        #Record the dimension of input vector, and the dimension of neuron should be equal to it
        if np.ndim(inputs) > 1:
            self.nIn = np.shape(inputs)[1]
        else:
            self.nIn = 1
        
        #Record the dimension of the target vector. The number of neurons should be equal to it
        if np.ndim(targets) > 1:
            self.nOut = np.shape(targets)[1]
        else:
            self.nOut = 1
        
        #Record the number of samples of input vector
        self.nData = np.shape(inputs)[0]
        
        #Initialize the network. Add 1 here to include the offset term
        self.weights = np.random.rand(self.nIn + 1, self.nOut) * 0.1 - 0.05
        
    def train(self, inputs, targets, eta, epoch):
        "" "training session" ""
        #Synchronously with the offset term of the previous processing, add - 1 to the input sample to match with W0
        inputs = np.concatenate((inputs, -np.ones((self.nData,1))),axis=1)
        
        for n in range(epoch):
            self.activations = self.forward(inputs)
            self.weights -= eta * np.dot(np.transpose(inputs), self.activations - targets)
        return self.weights
    
    def forward(self, inputs):
        "" "neural network forward propagation link" ""
        Calculation of occlusion
        activations = np.dot(inputs, self.weights)
        #Judge whether it is activated
        return np.where(activations>0, 1, 0)
    
    def confusion_matrix(self, inputs, targets):
        """Compute and print the confusion matrix and the accuracy."""
        inputs = np.concatenate((inputs, -np.ones((self.nData,1))),axis=1)
        outputs = np.dot(inputs, self.weights)
        nClasses = np.shape(targets)[1]
        
        if nClasses == 1:
            nClasses = 2
            outputs = np.where(outputs<0, 1, 0)
        else:
            outputs = np.argmax(outputs, 1)
            targets = np.argmax(targets, 1)
            
        cm = np.zeros((nClasses, nClasses))
        for i in range(nClasses):
            for j in range(nClasses):
                cm[i,j] = np.sum(np.where(outputs==i, 1,0) * np.where(targets==j, 1, 0))
        print(cm)
        print(np.trace(cm)/np.sum(cm))
print("Output after preprocessing of data")
p = Perceptron(trainin,traintgt)
p.train(trainin,traintgt,0.15,10000)
p.confusion_matrix(testin,testtgt)
Output after preprocessing of data
[[ 69.  86.]
 [182.  47.]]
0.3020833333333333

In this case the trained perceptron performs rather poorly; the point here is only to illustrate the algorithm.


Finally, here is an example of using the perceptron algorithm to recognize MNIST handwritten digits. The code is adapted from a Kaggle kernel.

Step 1: import the required packages and set the paths of the data

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
train = pd.read_csv(r"DataSets\Digit_Recognizer\train.csv", engine="python")
test = pd.read_csv(r"DataSets\Digit_Recognizer\test.csv", engine="python")
print("Training set has {0[0]} rows and {0[1]} columns".format(train.shape))
print("Test set has {0[0]} rows and {0[1]} columns".format(test.shape))
Training set has 42000 rows and 785 columns
Test set has 28000 rows and 784 columns

Step 2: Data Preprocessing

  1. Build the label vector trainlabels; its shape is (42000,)

  2. Build the training matrix traindata; its shape is (42000, 784)

  3. Build the weight matrix weights, with shape (10, 784), which may be a little harder to understand. The weight vector describes a neuron: an input sample has 784 dimensions, so the neuron connected to it also has 784 weights. At the same time, remember that a neuron produces only one output. In the digit recognition problem we want to feed in one sample and get back 10 scores, one per digit, and then judge which digit the sample most likely is. So we need 10 neurons, which is where the shape (10, 784) comes from.

trainlabels = train.label
trainlabels.shape
(42000,)
traindata = np.asmatrix(train.loc[:,"pixel0":])
traindata.shape
(42000, 784)
weights = np.zeros((10,784))
weights.shape
(10, 784)
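To preview how this (10, 784) weight matrix will be used (a minimal sketch; the training loop further below does the same thing): each of the 10 rows scores one digit class, and the prediction is the class with the largest score.

# Score one image against all 10 digit classes and pick the best one
sample = np.asarray(traindata[0]).ravel()   # one image as a flat 784-vector
scores = weights.dot(sample)                # 10 scores, one per digit class
prediction = np.argmax(scores)              # predicted digit (all scores are 0 before training)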

Before training, let's look at one sample to get a feel for the data. Note that each image is stored as a flat 784-dimensional array, so we need to reshape it back into a 28 * 28 image.

#Take an arbitrary row from the matrix
samplerow = traindata[123:124]
#Reshape the flat 784-vector back into a 28 * 28 image
samplerow = np.reshape(samplerow, (28,28))
plt.imshow(samplerow, cmap="hot")

[Figure: the digit image rendered from the sampled row]
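To confirm which digit the image shows, we can also print the corresponding label (index 123 matches the row sampled above):

print(trainlabels[123])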

Step 3: Training

Here we loop over the training set several times and then look at the error-rate curve.

#Create a list to record the error rate of each round of training
errors = []
epochs = 20

for epoch in range(epochs):
    err = 0
    #For each sample (also for each row in the matrix)
    for i, data in enumerate(traindata):
        #Create a list to record the output value of each neuron
        output = []
        #Do point multiplication for each neuron and record the output value
        for w in weights:
            output.append(np.dot(data, w))
        #Here, simply take the maximum output value as the most likely one
        guess = np.argmax(output)
        #The actual value is the corresponding item in the label list
        actual = trainlabels[i]
        
        #If the estimated value is different from the actual value, the classification is wrong and the weight vector needs to be updated
        if guess != actual:
            weights[guess] = weights[guess] - data
            weights[actual] = weights[actual] + data
            err += 1
    #After one pass over all 42000 samples: error rate = number of errors / number of samples
    errors.append(err/42000)
x = list(range(epochs))
plt.plot(x, errors)

[Figure: training error rate per epoch]

It can be seen from the figure that by around the 15th iteration the error rate has started to rise again, which suggests the model has begun to overfit.
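To read off the best epoch rather than eyeballing the curve, a small sketch:

best = int(np.argmin(errors))
print("lowest error rate", errors[best], "at epoch", best)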


The perceptron is a very simple algorithm, so it is rarely useful in real-world scenarios. The three examples here are all meant to build a feel for the algorithm by implementing it in code. Some experienced readers may wonder why we did not use the scikit-learn package. That is part of another plan of mine: I intend to write notes interpreting the scikit-learn source code alongside the algorithms. Limited by my own level, the analysis may not get to the essence, but I will work at it diligently. The next note will cover the principle of the Multi-Layer Perceptron, where we will see that even a simple perceptron, once a hidden layer is added, gains greatly improved classification ability. In addition, I will find time to write an article interpreting the sklearn source code. If you have any questions, please leave a comment for discussion.
