Code Implementation of the Neural Network BP Algorithm

Time:2020-10-30

The back propagation algorithm in practice

This implementation is based on the back propagation (BP) formulas for neural networks derived in the previous article. If you are not yet comfortable with the back propagation formulas, we strongly recommend reading that article first.

We will implement a 4-layer fully connected network to complete a binary classification task. The network has 2 input nodes; the three hidden layers have 25, 50, and 25 nodes respectively; and the output layer has 2 nodes, representing the probability of belonging to class 1 and to class 2, as shown in the figure below. A Softmax function is not used to constrain the sum of the output probabilities; instead, the mean squared error between the network output and the one-hot encoded labels is computed directly. All activation functions in the network are Sigmoid functions. These design choices let us apply our gradient propagation formulas directly.

[Figure: 4-layer fully connected network architecture]
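To make this loss design concrete, here is a minimal sketch (with made-up output values, independent of the network code below) of how the mean squared error between the sigmoid outputs and a one-hot encoded label is computed:

import numpy as np

z = np.array([0.3, -1.2])            # example raw outputs of the 2 output nodes (made-up values)
output = 1 / (1 + np.exp(-z))        # sigmoid applied per node; not normalized by Softmax
y_onehot = np.array([1.0, 0.0])      # one-hot encoding of the true label (class 1)
mse = np.mean(np.square(y_onehot - output))
print(mse)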

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split

1. Prepare data

X, y = datasets.make_moons(n_samples=1000, noise=0.2, random_state=100)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
print(X.shape, y.shape)  # (1000, 2) (1000,)
(1000, 2) (1000,)


def make_plot(X, y, plot_name):
    plt.figure(figsize=(12, 8))    
    plt.title(plot_name, fontsize=30)     
    plt.scatter(X[y==0, 0], X[y==0, 1])
    plt.scatter(X[y==1, 0], X[y==1, 1])
make_plot(X, y, "Classification Dataset Visualization ") 

[Figure: scatter plot of the two-class moons dataset]

2. Network layer

  • A network layer is implemented by a new class Layer, which requires the number of input nodes, the number of output nodes, the activation function type, and other parameters
  • The weight tensor weights and the bias tensor bias are generated and initialized automatically from the number of input and output nodes
class Layer:
    # Fully connected network layer
    def __init__(self, n_input, n_output, activation=None, weights=None, bias=None):
        """
        :param int n_input: number of input nodes
        :param int n_output: number of output nodes
        :param str activation: activation function type
        :param weights: weight tensor, generated internally by the class by default
        :param bias: bias vector, generated internally by the class by default
        """
        self.weights = weights if weights is not None else np.random.randn(n_input, n_output) * np.sqrt(1 / n_output)
        self.bias = bias if bias is not None else np.random.rand(n_output) * 0.1
        self.activation = activation  # activation function type, e.g. 'sigmoid'
        self.activation_output = None  # output value o of the activation function
        self.error = None  # intermediate variable used to compute the delta of the current layer
        self.delta = None  # records the delta of the current layer, used to compute gradients
    
    def activate(self, X):
        # Forward pass
        r = np.dot(X, self.weights) + self.bias  # X @ W + b
        # Pass r through the activation function to get the output o (activation_output) of the fully connected layer
        self.activation_output = self._apply_activation(r)
        return self.activation_output
    
    def _apply_activation(self, r):
        # Compute the output of the activation function
        if self.activation is None:
            return r  # no activation function, return the input directly
        elif self.activation == 'relu':
            return np.maximum(r, 0)
        elif self.activation == 'tanh':
            return np.tanh(r)
        elif self.activation == 'sigmoid':
            return 1 / (1 + np.exp(-r))
        
        return r
    
    def apply_activation_derivative(self, r):
        #Calculate the derivative of the activation function
        #There is no activation function and the derivative is 1
        if self.activation is None:
            return np.ones_like(r)
        #Derivative of relu function
        elif self.activation == 'relu':             
            grad = np.array(r, copy=True)             
            grad[r > 0] = 1.             
            grad[r <= 0] = 0.             
            return grad
        # Derivative of the tanh function; r here is the tanh output o, so the derivative is 1 - o ** 2
        elif self.activation == 'tanh':
            return 1 - r ** 2
        # Derivative of the sigmoid function; r here is the sigmoid output o, so the derivative is o * (1 - o)
        elif self.activation == 'sigmoid':
            return r * (1 - r)
        return r
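As a quick sanity check of the Layer class, here is a minimal sketch with made-up input values (layer_demo and x_demo are hypothetical names, not part of the training pipeline below):

layer_demo = Layer(2, 3, 'sigmoid')   # hypothetical layer: 2 inputs, 3 outputs
x_demo = np.array([0.5, -0.2])
o_demo = layer_demo.activate(x_demo)  # 3 sigmoid outputs, each in (0, 1)
print(o_demo.shape)                   # (3,)
print(layer_demo.apply_activation_derivative(o_demo))  # o * (1 - o), elementwise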

3. Network model

  • After creating the single network layer class, we implement the NeuralNetwork class
  • It internally maintains the Layer objects of each network layer; layers can be appended through the add_layer function to build different network models
y_test.flatten().shape # (300,)
(300,)



class NeuralNetwork:
    def __init__(self):
        self._layers = []  # list of network layer objects
    
    def add_layer(self, layer):
        self._layers.append(layer)
    
    def feed_forward(self, X):
        # Forward propagation
        for layer in self._layers:
            X = layer.activate(X)
        return X
    
    def backpropagation(self, X, y, learning_rate):
        #Implementation of back propagation algorithm
        #Calculate forward to get the final output value
        output = self.feed_forward(X)
        for i in reversed(range(len(self._layers))):  # iterate over the layers in reverse order
            layer = self._layers[i]
            if layer == self._layers[-1]:  # if it is the output layer
                layer.error = y - output
                # Compute the delta of the last layer, see the gradient formula for the output layer
                layer.delta = layer.error * layer.apply_activation_derivative(output)
            else:  # if it is a hidden layer
                next_layer = self._layers[i + 1]
                layer.error = np.dot(next_layer.weights, next_layer.delta)
                layer.delta = layer.error * layer.apply_activation_derivative(layer.activation_output)
        
        #Loop update weights
        for i in range(len(self._layers)):
            layer = self._layers[i]
            # o_i is the output of the previous network layer
            o_i = np.atleast_2d(X if i == 0 else self._layers[i - 1].activation_output)
            # Because error is defined as y - output, delta already carries the negative sign of the gradient, so the update uses a plus sign here
            layer.weights += layer.delta * o_i.T * learning_rate
            layer.bias += layer.delta * learning_rate  # update the bias with the same delta
    
    def train(self, X_train, X_test, y_train, y_test, learning_rate, max_epochs):
        #Network training function
        #One hot coding
        y_onehot = np.zeros((y_train.shape[0], 2)) 
        y_onehot[np.arange(y_train.shape[0]), y_train] = 1
        mses = [] 
        for i in range(max_epochs):  # train for max_epochs epochs
            for j in range(len(X_train)):  # train on one sample at a time
                self.backpropagation(X_train[j], y_onehot[j], learning_rate)
            if i % 10 == 0:
                # Print the MSE loss every 10 epochs
                mse = np.mean(np.square(y_onehot - self.feed_forward(X_train)))
                mses.append(mse)
                print('Epoch: #%s, MSE: %f, Accuracy: %.2f%%' %
                      (i, float(mse), self.accuracy(self.predict(X_test), y_test.flatten()) * 100))

        return mses
    
    def accuracy(self, y_predict, y_test):
        # Compute the classification accuracy
        return np.sum(y_predict == y_test) / len(y_test)
    
    def predict(self, X_predict):
        y_predict = self.feed_forward(X_predict)  # y_predict has shape [N, 2]; the second dimension holds the probabilities of the two outputs
        y_predict = np.argmax(y_predict, axis=1)
        return y_predict

4. Network training

nn = NeuralNetwork()  # instantiate the network class
nn.add_layer(Layer(2, 25, 'sigmoid'))   # hidden layer 1, 2 => 25
nn.add_layer(Layer(25, 50, 'sigmoid'))  # hidden layer 2, 25 => 50
nn.add_layer(Layer(50, 25, 'sigmoid'))  # hidden layer 3, 50 => 25
nn.add_layer(Layer(25, 2, 'sigmoid'))   # output layer, 25 => 2
# nn.train(X_train, X_test, y_train, y_test, learning_rate=0.01, max_epochs=50)
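The training call above is left commented out; to reproduce the decision boundary and accuracy below, it needs to be run first. A minimal sketch, reusing the hyperparameters from the commented call; plotting the MSE curve is an extra illustration (the loss is recorded once every 10 epochs):

mses = nn.train(X_train, X_test, y_train, y_test, learning_rate=0.01, max_epochs=50)
plt.figure(figsize=(12, 8))
plt.plot(np.arange(len(mses)) * 10, mses)  # one MSE value per 10 epochs
plt.xlabel('Epoch')
plt.ylabel('MSE')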
def plot_decision_boundary(model, axis):
    
    x0, x1 = np.meshgrid(
        np.linspace(axis[0], axis[1], int((axis[1] - axis[0])*100)).reshape(1, -1),
        np.linspace(axis[2], axis[3], int((axis[3] - axis[2])*100)).reshape(-1, 1)
    )
    X_new = np.c_[x0.ravel(), x1.ravel()]
    
    y_predict = model.predict(X_new)
    zz = y_predict.reshape(x0.shape)
    
    from matplotlib.colors import ListedColormap
    custom_cmap = ListedColormap(['#EF9A9A', '#FFF590', '#90CAF9'])
    
    plt.contourf(x0, x1, zz, cmap=custom_cmap)
plt.figure(figsize=(12, 8))    
plot_decision_boundary(nn, [-2, 2.5, -1, 2])
plt.scatter(X[y==0, 0], X[y==0, 1])
plt.scatter(X[y==1, 0], X[y==1, 1])

[Figure: decision boundary of the trained network over the dataset]

y_predict = nn.predict(X_test)
y_predict[:10] # array([1, 1, 0, 1, 0, 0, 0, 1, 1, 1], dtype=int64)
array([1, 1, 0, 1, 0, 0, 0, 1, 1, 1], dtype=int64)



y_test[:10] # array([1, 1, 0, 1, 0, 0, 0, 1, 1, 1], dtype=int64)
array([1, 1, 0, 1, 0, 0, 0, 1, 1, 1], dtype=int64)



nn.accuracy(y_predict, y_test.flatten()) # 0.86
0.86