In the previous note, we introduced the theory of the perceptron: its origin, how it works, its solution strategy, and its convergence. In this note, we apply the perceptron algorithm to practical problems.
We start with the simplest possible problem: using the perceptron algorithm to classify the OR logic function.
import numpy as np
import matplotlib.pyplot as plt

# The four inputs of OR: (0, 0) is the only negative sample
x = [0, 0, 1, 1]
y = [0, 1, 0, 1]
plt.scatter(x[0], y[0], color="red", label="negative")
plt.scatter(x[1:], y[1:], color="green", label="positive")
plt.legend(loc="best")
plt.show()
Let’s define a function that decides whether a sample point is correctly classified, and updates the parameters if it is not. Because the sample points in this example are two-dimensional, the weight vector is also two-dimensional, which can be defined as \(w = (w_1, w_2)\) and expressed in Python as a list, e.g. w = [0, 0].
The (unnormalized) signed distance from a sample to the hyperplane is w[0] * x[0] + w[1] * x[1] - b. The complete function is shown below.
def decide(data, label, w, b):
    # Raw output of the perceptron for this sample
    result = w[0] * data[0] + w[1] * data[1] - b
    print("result = ", result)
    # Treat sign(result) * label <= 0 as a misclassification and update
    if np.sign(result) * label <= 0:
        w[0] += 1 * (label - result) * data[0]
        w[1] += 1 * (label - result) * data[1]
        b += 1 * (label - result) * (-1)
    return w, b
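The update performed inside decide can be written compactly. With learning rate \(\eta = 1\), label \(y\), and raw output \(\hat{y} = w_1 x_1 + w_2 x_2 - b\), the rule is

\[ w_j \leftarrow w_j + \eta\,(y - \hat{y})\,x_j, \qquad b \leftarrow b - \eta\,(y - \hat{y}). \]

Note that it uses the raw output \(\hat{y}\) rather than its thresholded sign, so this variant is closer to the delta rule than to the textbook perceptron update; for this small, linearly separable example it converges all the same.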
After writing the core function, we need a driver function that traverses every sample point.
def run(data, label):
    w, b = [0, 0], 0
    for epoch in range(10):
        for dataset, labelset in zip(data, label):
            w, b = decide(dataset, labelset, w, b)
            print("dataset = ", dataset, ",", "w = ", w, ",", "b = ", b)
    print(w, b)

data = [(0, 0), (0, 1), (1, 0), (1, 1)]
label = [0, 1, 1, 1]
run(data, label)
result = 0
dataset = (0, 0) , w = [0, 0] , b = 0
result = 0
dataset = (0, 1) , w = [0, 1] , b = -1
result = 1
dataset = (1, 0) , w = [0, 1] , b = -1
result = 2
dataset = (1, 1) , w = [0, 1] , b = -1
result = 1
dataset = (0, 0) , w = [0, 1] , b = 0
result = 1
dataset = (0, 1) , w = [0, 1] , b = 0
result = 0
dataset = (1, 0) , w = [1, 1] , b = -1
result = 3
dataset = (1, 1) , w = [1, 1] , b = -1
result = 1
dataset = (0, 0) , w = [1, 1] , b = 0
result = 1
dataset = (0, 1) , w = [1, 1] , b = 0
result = 1
In the later iterations the parameters stay stable: the algorithm has converged.
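As a quick sanity check, we can read the converged parameters off the trace above (the final printed values are w = [1, 1], b = 0) and verify that they classify all four OR inputs correctly:

```python
# Verify the converged parameters on all four OR inputs.
# w = [1, 1], b = 0 are read from the final lines of the trace above.
w, b = [1, 1], 0
data = [(0, 0), (0, 1), (1, 0), (1, 1)]
# Predict 1 when w . x - b > 0, else 0
preds = [1 if w[0] * x1 + w[1] * x2 - b > 0 else 0 for (x1, x2) in data]
print(preds)  # [0, 1, 1, 1] -- exactly the OR labels
```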
Next, here is a data set from UCI: the Pima Indians diabetes data set. The example comes from Chapter 3 of Machine Learning: An Algorithmic Perspective.
import os
import pylab as pl
import numpy as np
import pandas as pd
os.chdir(r"DataSets\pima-indians-diabetes-database")
pima = np.loadtxt("pima.txt", delimiter=",", skiprows=1)
pima.shape
(768, 9)
indices0 = np.where(pima[:,8]==0)
indices1 = np.where(pima[:,8]==1)
pl.ion()
pl.plot(pima[indices0,0],pima[indices0,1],"go")
pl.plot(pima[indices1,0],pima[indices1,1],"rx")
pl.show()
Data preprocessing
1. Age discretization
pima[np.where(pima[:,7]<=30),7] = 1
pima[np.where((pima[:,7]>30) & (pima[:,7]<=40)),7] = 2
pima[np.where((pima[:,7]>40) & (pima[:,7]<=50)),7] = 3
pima[np.where((pima[:,7]>50) & (pima[:,7]<=60)),7] = 4
pima[np.where(pima[:,7]>60),7] = 5
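The five chained np.where assignments above can also be expressed in one call with np.digitize; this is an equivalent sketch on a few illustrative ages (the ages array here is made up for the demonstration):

```python
import numpy as np

# Illustrative ages covering every bin boundary case
ages = np.array([22.0, 30.0, 31.0, 45.0, 60.0, 75.0])

# Bins: (-inf, 30] -> 1, (30, 40] -> 2, (40, 50] -> 3, (50, 60] -> 4, (60, inf) -> 5
# right=True makes each upper edge inclusive, matching the <= comparisons above
codes = np.digitize(ages, bins=[30, 40, 50, 60], right=True) + 1
print(codes)  # [1 1 2 3 4 5]
```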
2. Cap the number of pregnancies: replace any value greater than 8 with 8
pima[np.where(pima[:,0]>8),0] = 8
3. Standardize the data
pima[:,:8] = pima[:,:8] - pima[:,:8].mean(axis=0)
pima[:,:8] = pima[:,:8]/pima[:,:8].var(axis=0)
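A detail worth noticing: the snippet above divides by the variance, which rescales the columns but does not give them unit variance; dividing by the standard deviation would. A toy sketch (the matrix X is made up for illustration):

```python
import numpy as np

X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])
Xc = X - X.mean(axis=0)      # zero-mean columns
Xv = Xc / Xc.var(axis=0)     # what the text does: rescaled, but variance != 1
Xs = Xc / Xc.std(axis=0)     # classic standardization: unit variance
```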
4. Split into training and test sets
trainin = pima[::2,:8]
testin = pima[1::2,:8]
traintgt = pima[::2,8:9]
testtgt = pima[1::2,8:9]
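The slicing above interleaves the split: even-indexed rows become the training set and odd-indexed rows the test set. A tiny stand-in array (made up for illustration) shows the pattern:

```python
import numpy as np

# 10 rows x 2 columns, standing in for the pima array
pima_like = np.arange(20).reshape(10, 2)
train_rows = pima_like[::2]   # rows 0, 2, 4, ... -> training set
test_rows = pima_like[1::2]   # rows 1, 3, 5, ... -> test set
print(train_rows.shape, test_rows.shape)  # (5, 2) (5, 2)
```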
Define the model
class Perceptron:
    def __init__(self, inputs, targets):
        # Set the network size.
        # The number of weights per neuron equals the input dimension
        if np.ndim(inputs) > 1:
            self.nIn = np.shape(inputs)[1]
        else:
            self.nIn = 1
        # The number of neurons equals the target dimension
        if np.ndim(targets) > 1:
            self.nOut = np.shape(targets)[1]
        else:
            self.nOut = 1
        # Number of input samples
        self.nData = np.shape(inputs)[0]
        # Initialize the network; the +1 accounts for the bias term
        self.weights = np.random.rand(self.nIn + 1, self.nOut) * 0.1 - 0.05

    def train(self, inputs, targets, eta, epoch):
        """Training loop"""
        # Matching the bias handling above: append a -1 to each sample to pair with w0
        inputs = np.concatenate((inputs, -np.ones((self.nData, 1))), axis=1)
        for n in range(epoch):
            self.activations = self.forward(inputs)
            self.weights -= eta * np.dot(np.transpose(inputs), self.activations - targets)
        return self.weights

    def forward(self, inputs):
        """Forward propagation through the network"""
        # Compute the activations
        activations = np.dot(inputs, self.weights)
        # Threshold: is the neuron activated?
        return np.where(activations > 0, 1, 0)

    def confusion_matrix(self, inputs, targets):
        """Compute the confusion matrix"""
        inputs = np.concatenate((inputs, -np.ones((np.shape(inputs)[0], 1))), axis=1)
        outputs = np.dot(inputs, self.weights)
        nClasses = np.shape(targets)[1]
        if nClasses == 1:
            nClasses = 2
            outputs = np.where(outputs > 0, 1, 0)
        else:
            outputs = np.argmax(outputs, 1)
            targets = np.argmax(targets, 1)
        cm = np.zeros((nClasses, nClasses))
        for i in range(nClasses):
            for j in range(nClasses):
                cm[i, j] = np.sum(np.where(outputs == i, 1, 0) * np.where(targets == j, 1, 0))
        print(cm)
        print(np.trace(cm) / np.sum(cm))
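Before running on the Pima data, it helps to see the core of train() in isolation: the whole-epoch batch update weights -= eta * Xᵀ(ŷ - y), with a -1 bias input appended. A self-contained sketch on the OR data from the first example (zero initialization and eta = 0.25 are chosen here for a reproducible run):

```python
import numpy as np

# OR inputs and labels
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [1]], dtype=float)
# Append the -1 bias input to each sample
Xb = np.concatenate((X, -np.ones((4, 1))), axis=1)

weights = np.zeros((3, 1))  # zero init for determinism
eta = 0.25
for _ in range(10):
    y_pred = np.where(Xb @ weights > 0, 1, 0)   # forward pass with threshold
    weights -= eta * Xb.T @ (y_pred - y)        # batch perceptron update

final_preds = np.where(Xb @ weights > 0, 1, 0).ravel()
print(final_preds)  # [0 1 1 1]
```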
print("Output after preprocessing of data")
p = Perceptron(trainin,traintgt)
p.train(trainin,traintgt,0.15,10000)
p.confusion_matrix(testin,testtgt)
Output after preprocessing of data
[[ 69. 86.]
[182. 47.]]
0.3020833333333333
In this case the perceptron trains rather poorly; the point here is only to demonstrate the algorithm.
Finally, here is an example of using the perceptron algorithm to recognize MNIST handwritten digits. The code is adapted from a kernel on Kaggle.
Step 1: import the required packages and set the data paths
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
train = pd.read_csv(r"DataSets\Digit_Recognizer\train.csv", engine="python")
test = pd.read_csv(r"DataSets\Digit_Recognizer\test.csv", engine="python")
print("Training set has {0[0]} rows and {0[1]} columns".format(train.shape))
print("Test set has {0[0]} rows and {0[1]} columns".format(test.shape))
Training set has 42000 rows and 785 columns
Test set has 28000 rows and 784 columns
Step 2: Data preprocessing

1. Build the label vector label, of size (42000,)
2. Build the training set traindata, of size (42000, 784)
3. Build the weight matrix weights, of size (10, 784)

The weight shape may be a little hard to understand at first. The weight vector describes a neuron: 784 is the dimension of an input sample, so the neuron connected to it also has 784 weights. At the same time, remember that a neuron produces a single output. In the digit recognition problem, we want to feed in one sample and get back 10 scores, then pick the digit with the largest score as the most likely one. So we need 10 neurons, which is where the shape (10, 784) comes from.
trainlabels = train.label
trainlabels.shape
(42000,)
traindata = np.asmatrix(train.loc[:,"pixel0":])
traindata.shape
(42000, 784)
weights = np.zeros((10,784))
weights.shape
(10, 784)
First, let’s look at one sample to get a feel for the data. Note that each image is stored as a flat 784-dimensional row; we need to reshape it back to a 28 * 28 image.
#Take any row from the matrix
samplerow = traindata[123:124]
#Reshape back to 28 * 28
samplerow = np.reshape(samplerow, (28, 28))
plt.imshow(samplerow, cmap="hot")
plt.show()
Step 3: Training
Here we pass over the training set several times and then look at the error-rate curve.
#Create a list to record the error rate of each epoch
errors = []
epochs = 20
for epoch in range(epochs):
    err = 0
    #For each sample (each row of the matrix)
    for i, data in enumerate(traindata):
        #Record the output value of each neuron
        output = []
        #Dot each neuron's weight vector with the sample
        for w in weights:
            output.append(np.dot(data, w))
        #Simply take the neuron with the largest output as the guess
        guess = np.argmax(output)
        #The actual value is the corresponding entry of the label list
        actual = trainlabels[i]
        #On a wrong guess, penalize the guessed neuron and reinforce the correct one
        if guess != actual:
            weights[guess] = weights[guess] - data
            weights[actual] = weights[actual] + data
            err += 1
    #Error rate for this epoch = number of errors / number of samples
    errors.append(err / 42000)

x = list(range(epochs))
plt.plot(x, errors)
plt.show()
As the figure shows, from around 15 iterations the error rate starts to rise again: the training error oscillates rather than converging, because the data is not linearly separable.
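As an aside, the inner Python loop over the 10 neurons can be replaced by a single matrix product. A toy-sized sketch with random stand-ins for traindata and weights (the shapes are the only thing carried over from the example above):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal((5, 784))      # stand-in for a few samples
weights = rng.standard_normal((10, 784))  # stand-in for the 10 neurons

scores = data @ weights.T                 # (5, 10): one score per neuron per sample
guesses = np.argmax(scores, axis=1)       # predicted digit for each sample
```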
The perceptron is a very simple algorithm, so it is rarely usable in real scenarios. The three examples here are all meant to build intuition by implementing the algorithm by hand. Some experienced readers must be curious: why didn’t we use scikit-learn? That is actually part of another plan of mine: I intend to write source-code-reading notes for scikit-learn alongside these algorithm notes. Of course, limited by my personal level, the analysis may not reach the essence, but I will do it diligently. The next note will cover the principle of the Multi-Layer Perceptron algorithm, where we will easily see that even a simple perceptron, once a hidden layer is added, gains greatly improved classification ability. In addition, I will make time to write the sklearn source-code notes. If you have any questions, please leave a comment for discussion.