Building a deep neural network with NumPy to identify whether there is a cat in a picture

Time: 2019-12-2

Catalog

  • 1 Build the data
  • 2 Randomly initialize the parameters
  • 3 Forward propagation
  • 4 Compute the loss
  • 5 Back propagation
  • 6 Update the parameters
  • 7 Build the model
  • 8 Prediction
  • 9 Start training
  • 10 Prediction on the test set
  • 11 Show the predicted results as pictures

Building a simple neural network to identify whether there is a cat in a picture
Code reference: a neural network implemented using only numpy

This post builds a simple, easy-to-understand network to help you understand deep neural networks,
using a simple cat-recognition example to deepen that understanding.
The code is implemented with numpy only, without regularization, mini-batches, or other extras.

Let's first lay out the steps of building a neural network:

(1) Build the data. We want data with shape (n, m), where n is the number of features and m is the number of samples.

(2) Initialize the parameters. Randomly initialize the parameters W and b.

(3) Forward propagation.

(4) Compute the loss.

(5) Back propagation.

(6) Update the parameters.

(7) Build the model.

(8) Prediction. Prediction is simply another forward propagation pass.

With these steps in mind, building a neural network is no longer out of reach.
Next, let's build a deep neural network step by step, following the steps above.


1 Build the data

Let's first look at what the dataset looks like.

We read the data we need from H5 files.
Here I have prepared two files: train_catvnoncat.h5 and test_catvnoncat.h5. The first contains the training set and the second contains the test set.
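If you are curious about what these H5 files contain, a minimal sketch like the one below (assuming the files sit in a local datasets/ directory, as in the loading code that follows) lists the datasets stored in the training file along with their shapes:

import h5py

#Peek into the training file: list the datasets it contains, with their shapes and dtypes
with h5py.File('datasets/train_catvnoncat.h5', 'r') as f:
    for key in f.keys():
        print(key, f[key].shape, f[key].dtype)
#Expected keys include train_set_x and train_set_y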

import h5py
import numpy as np

#Function to load the data from the files
def load_data():
    #Read the files into memory
    train_dataset = h5py.File('datasets/train_catvnoncat.h5', "r")
    train_x_orig = np.array(train_dataset["train_set_x"][:])
    train_y_orig = np.array(train_dataset["train_set_y"][:])

    test_dataset = h5py.File('datasets/test_catvnoncat.h5', "r")
    test_x_orig = np.array(test_dataset["test_set_x"][:])
    test_y_orig = np.array(test_dataset["test_set_y"][:])

    classes = np.array(test_dataset["list_classes"][:])  #the categories, i.e. 1 and 0
    
    #The label arrays have shape (m,); we reshape them to (1, m), where m is the number of samples
    train_y_orig = train_y_orig.reshape((1, train_y_orig.shape[0]))
    test_y_orig = test_y_orig.reshape((1, test_y_orig.shape[0]))
    
    return train_x_orig, train_y_orig, test_x_orig, test_y_orig, classes

We can display one of these pictures:

from random import randint
import matplotlib.pyplot as plt

#Load data
train_x_orig, train_y, test_x_orig, test_y, classes = load_data()

#Randomly select a picture from the training set
index = randint(0, train_x_orig.shape[0] - 1)
img = train_x_orig[index]

#Show this picture
plt.imshow(img)

print('its label is: {}'.format(train_y[0][index]))

The demonstration results are as follows:
[Figure: the randomly selected training image and its printed label]

Converting the data

Because our data is standard image data, we need to convert it into the format the network expects,
that is, the (n, m) format, where n is the number of features and m is the number of samples.

train_x_orig, train_y, test_x_orig, test_y, classes = load_data()

m_train = train_x_orig.shape[0]  #number of training samples
m_test = test_x_orig.shape[0]    #number of test samples
num_px = test_x_orig.shape[1]    #width/height of each picture

#In order to facilitate the later matrix operation, we need to flatten and transpose the sample data
#The meaning of each dimension of the processed array is (picture data, sample number)
train_x_flatten = train_x_orig.reshape(train_x_orig.shape[0], -1).T
test_x_flatten = test_x_orig.reshape(test_x_orig.shape[0], -1).T 

#Next, we have a simple standardized processing of feature data (divide by 255, so that all values are in the range of [0, 1])
train_x = train_x_flatten/255.
test_x = test_x_flatten/255.

The resulting data has shape (12288, m); 12288 is the number of features per picture, i.e. 64 * 64 * 3 = 12288.
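As a quick sanity check (illustrative, not part of the original code), you can print the shapes after preprocessing; the sample counts shown assume the standard cat dataset used here, with 209 training and 50 test images:

#Check the preprocessed shapes
print('train_x:', train_x.shape)   #(12288, 209)
print('train_y:', train_y.shape)   #(1, 209)
print('test_x:', test_x.shape)     #(12288, 50)
print('test_y:', test_y.shape)     #(1, 50)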

2 Randomly initialize the parameters

Define the structure of the neural network
Before initializing, we need to decide on the structure of the network we are going to build. Here we define it as follows:

[Figure: the structure of the network - a 12288-feature input followed by layers of 20, 7, 5, and 1 units]

#Define the structure of neural network
'''
Four layers: the input provides 12288 features; layer 1 has 20 units, layer 2 has 7, layer 3 has 5, and layer 4 (the output) has a single sigmoid unit
'''
nn_architecture = [
    {'input_dim': 12288, 'output_dim': 20, 'activation': 'relu'},
    {'input_dim': 20, 'output_dim': 7, 'activation': 'relu'},
    {'input_dim': 7, 'output_dim': 5, 'activation': 'relu'},
    {'input_dim': 5, 'output_dim': 1, 'activation': 'sigmoid'}
]

Initialization

#Randomly initialize the parameters W, b according to the structure
def init_params(nn_architecture):
    
    np.random.seed(1)
    
    #Used to store generated parameters
    params = {}

    for id, layer in enumerate(nn_architecture):
        # layer_id -> [1, 2, 3, 4]
        layer_id = id + 1
        params['W' + str(layer_id)] = np.random.randn(layer['output_dim'], layer['input_dim']) / np.sqrt(layer['input_dim'])
        params['b' + str(layer_id)] = np.zeros((layer['output_dim'], 1))
    
    return params
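As a small illustration (again, not part of the original code), the following prints the shapes that init_params produces; W of layer l has shape (output_dim, input_dim) and b has shape (output_dim, 1):

params = init_params(nn_architecture)
for layer_id in range(1, len(nn_architecture) + 1):
    print('W{}: {}   b{}: {}'.format(layer_id, params['W' + str(layer_id)].shape,
                                     layer_id, params['b' + str(layer_id)].shape))
#W1: (20, 12288)   b1: (20, 1)
#W2: (7, 20)       b2: (7, 1)
#W3: (5, 7)        b3: (5, 1)
#W4: (1, 5)        b4: (1, 1)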

3 Forward propagation

Let's first see what forward propagation does:

[Figure: the forward propagation computation, layer by layer]

Activation functions

def sigmoid(Z):
    '''
    Parameters
    Z: shape = (output_dim, m), where output_dim is the number of units in the current layer
    
    Return value
    1 / (1 + np.exp(-Z)): the sigmoid of Z, shape = (output_dim, m)
    '''
    return 1/(1+np.exp(-Z))

def relu(Z):
    '''
    Parameters
    Z: shape = (output_dim, m), where output_dim is the number of units in the current layer
    
    Return value
    np.maximum(0, Z): the relu of Z, shape = (output_dim, m)
    '''

    return np.maximum(0,Z)

Building single-layer forward propagation
This function implements the computation performed in a single layer:

$$Z_{curr} = W_{curr} \cdot A_{prev} + b_{curr}$$
$$A_{curr} = g(Z_{curr})$$

#Single layer forward propagation
def layer_forward(W_curr, b_curr, A_prev, activation):
    '''
    Computes
    Z_curr = W_curr·A_prev + b_curr
    A_curr = g(Z_curr)
    
    Parameters
    W_curr: W parameter of the current layer
    b_curr: b parameter of the current layer
    A_prev: A matrix of the previous layer
    activation: activation function used by the current layer
    
    Return values
    Z_curr: Z of the current layer
    A_curr: A of the current layer
    '''

    Z_curr = np.dot(W_curr, A_prev) + b_curr
    
    #Judge activation function and find a
    if activation == 'relu':
        A_curr = relu(Z_curr)
    elif activation == 'sigmoid':
        A_curr = sigmoid(Z_curr)
    else:
        raise Exception('Unsupported activation function type!')
    
    return Z_curr, A_curr
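A tiny usage sketch (with made-up random numbers, not part of the original code) shows how the shapes behave for a single layer with 3 input features and 2 units:

np.random.seed(0)
W = np.random.randn(2, 3)        #(output_dim, input_dim)
b = np.zeros((2, 1))             #(output_dim, 1)
A_prev = np.random.randn(3, 5)   #3 features, 5 samples
Z, A = layer_forward(W, b, A_prev, 'relu')
print(Z.shape, A.shape)          #both (2, 5)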

Building the complete forward propagation
In the complete forward propagation, I pack each layer's A_prev and Z_curr into a dictionary and append it to a list called caches. These cached values are needed later during gradient descent, so the function returns two things: A of the last layer and the caches list.

#Full forward propagation
def full_forward(X, params, nn_architecture):
    '''
    Parameters
    X: input
    params: variable holding the W and b parameters
    nn_architecture: network structure
    
    Storage format of caches
    Because back propagation also needs the A of the previous layer,
    we store the previous layer's A and the current layer's Z in caches so that they are easy to look up later:
    caches = [
        {'A_prev': A_prev, 'Z_curr': Z_curr},  #data stored for layer 1
        {'A_prev': A_prev, 'Z_curr': Z_curr},  #data stored for layer 2
        ...
    ]
    
    Return values
    A_curr: A of the last layer, i.e. AL (Y_hat)
    caches: list of each layer's A_prev and Z_curr
    '''
    caches = []
    
    #X serves as the A of layer 0
    A_curr = X
    
    for id, layer in enumerate(nn_architecture):
        # layer_id -> [1, 2, 3, 4]
        layer_id = id + 1
        
        #Get A of the previous layer
        A_prev = A_curr
        
        #Get the current layer's W and b from params
        W_curr = params['W' + str(layer_id)]
        b_curr = params['b' + str(layer_id)]
        #Get the activation function from the layer definition
        activation = layer['activation']
        
        #Compute Z and A of the current layer
        Z_curr, A_curr = layer_forward(W_curr, b_curr, A_prev, activation)
        
        #Store the previous layer's A and the current layer's Z in the cache
        caches.append({
            'A_prev': A_prev,
            'Z_curr': Z_curr
        })
    return A_curr, caches
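Putting it together, a quick illustrative check (assuming train_x and the functions above are already defined) shows that full_forward returns a (1, m) prediction and one cache entry per layer:

params = init_params(nn_architecture)
Y_hat, caches = full_forward(train_x, params, nn_architecture)
print(Y_hat.shape)   #(1, 209) for the training set
print(len(caches))   #4, one entry per layer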

4 Compute the loss

The formula for the loss:
$$J = -\frac{1}{m}\left[Y \cdot \log(\hat{Y})^{T} + (1-Y) \cdot \log(1-\hat{Y})^{T}\right]$$

#Gain loss value
def get_cost(Y_hat, Y):
    #Number of samples obtained
    m = Y_hat.shape[1]
    
    cost = -1 / m * (np.dot(Y, np.log(Y_hat).T) + np.dot(1 - Y, np.log(1 - Y_hat).T))
    
    #cost is a 1x1 array like [[0.256654]]; np.squeeze turns it into a scalar
    cost = np.squeeze(cost)
    
    return cost
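A tiny worked example (the values are chosen arbitrarily): for two samples with labels Y = [[1, 0]] and predictions Y_hat = [[0.9, 0.2]], the cost is -1/2 * (log 0.9 + log 0.8) ≈ 0.164:

Y = np.array([[1, 0]])
Y_hat = np.array([[0.9, 0.2]])
print(get_cost(Y_hat, Y))   #approximately 0.1643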

Here we also define a function to compute the accuracy.

#The predicted values are probabilities; for our binary classification problem we convert them into the two classes 0 and 1
def convert_into_class(Y_hat):
    #Copy matrix
    prob = np.copy(Y_hat)
    #Values > 0.5 in the matrix are classified as 1
    #Values <= 0.5 are classified as 0
    prob[prob > 0.5] = 1
    prob[prob <= 0.5] = 0
    
    return prob

#Get accuracy
def get_accuracy(Y_hat, Y):
    #Classification first, then precision
    prob = convert_into_class(Y_hat)
#     accu = float(np.dot(Y, prob.T) + np.dot(1 - Y, 1 - prob.T)) / float(Y_hat.shape[1])
    #The commented-out line above is an alternative way to compute the accuracy
    '''
    The idea: compare the predictions with the true labels,
    count how many match, and divide by the total number of samples;
    Y_hat.shape[1] is the total number of samples.
    '''
    accu = np.sum((prob == Y) / Y_hat.shape[1])
    accu = np.squeeze(accu)
    
    return accu
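Continuing the toy example (arbitrary values again): with three samples of which two are classified correctly, the accuracy is 2/3:

Y = np.array([[1, 0, 0]])
Y_hat = np.array([[0.9, 0.2, 0.6]])
print(convert_into_class(Y_hat))   #[[1. 0. 1.]]
print(get_accuracy(Y_hat, Y))      #approximately 0.6667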

5 Back propagation

As before, let's first look at the structure of back propagation:

[Figure: the back propagation computation, from the cost back through the layers]

The first step is to compute dA4, the derivative of the cost J with respect to A4 (the output of the last layer), and then apply single-layer back propagation layer by layer. Here L denotes the last layer and l the current layer.
$$dA^{[L]} = -\left(\frac{Y}{\hat{Y}} - \frac{1-Y}{1-\hat{Y}}\right)$$
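Where does this expression come from? A short sketch of the derivation, using the per-sample form of the cost defined above:

$$J = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log \hat{y}^{(i)} + (1-y^{(i)})\log(1-\hat{y}^{(i)})\right]$$
$$\frac{\partial J}{\partial \hat{y}^{(i)}} = -\frac{1}{m}\left(\frac{y^{(i)}}{\hat{y}^{(i)}} - \frac{1-y^{(i)}}{1-\hat{y}^{(i)}}\right)$$

The code drops the 1/m factor at this point and re-introduces it when dW and db are computed in layer_backward (dW = dZ·A_prev.T / m, db = sum(dZ) / m), so the resulting gradients come out the same.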

Derivatives of the activation functions

'''
Compute dZ = dA * g'(Z)
For sigmoid: g'(Z) = g(Z) * (1 - g(Z))
For relu:    g'(Z) = 1 if Z > 0, else 0
'''
def relu_backward(dA, cache):  
    '''
    dA: shape = (output_dim, m), where output_dim is the number of units in the current layer
    cache: shape = (output_dim, m)
    '''
    Z = cache
    dZ = np.array(dA, copy=True)  #copy the matrix
    
    # When z <= 0, dZ = 0
    dZ[Z <= 0] = 0
        
    return dZ

def sigmoid_backward(dA, cache):
    '''
    dA: shape = (output_dim, m), where output_dim is the number of units in the current layer
    cache: shape = (output_dim, m)
    '''
    Z = cache
    
    s = 1/(1+np.exp(-Z))
    dZ = dA * s * (1-s)
    
    assert (dZ.shape == Z.shape)
    
    return dZ
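As an optional sanity check (not in the original code), we can compare sigmoid_backward against a numerical derivative of sigmoid; the two should agree very closely:

np.random.seed(2)
Z = np.random.randn(3, 4)
dA = np.random.randn(3, 4)
eps = 1e-6
numeric = dA * (sigmoid(Z + eps) - sigmoid(Z - eps)) / (2 * eps)
analytic = sigmoid_backward(dA, Z)
print(np.max(np.abs(numeric - analytic)))   #should be very small, on the order of 1e-10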

Building single-layer back propagation
In a single layer we mainly compute dZ, dW and db (the 1/m factor matches the code below):

$$dZ^{[l]} = dA^{[l]} * g^{[l]\prime}(Z^{[l]})$$
$$dW^{[l]} = \frac{1}{m}\, dZ^{[l]} \cdot A^{[l-1]T}$$
$$db^{[l]} = \frac{1}{m} \sum dZ^{[l]}$$
$$dA^{[l-1]} = W^{[l]T} \cdot dZ^{[l]}$$

#Single layer back propagation
def layer_backward(dA_curr, W_curr, Z_curr, A_prev, activation):
    '''
    Computes
    dZ = dA * g'(Z)
    dW = dZ·A_prev.T / m
    db = np.sum(dZ, axis=1, keepdims=True) / m
    dA_prev = W.T·dZ
    
    Parameters
    dA_curr: dA of the current layer
    W_curr: W parameter of the current layer
    Z_curr: Z parameter of the current layer
    A_prev: A of the previous layer
    activation: activation function of the current layer
    
    Return values
    dW_curr: dW of the current layer
    db_curr: db of the current layer
    dA_prev: dA of the previous layer
    '''
    m = A_prev.shape[1]  #number of samples
    #Compute dZ_curr
    if activation == 'relu':
        dZ_curr = relu_backward(dA_curr, Z_curr)
    elif activation == 'sigmoid':
        dZ_curr = sigmoid_backward(dA_curr, Z_curr)
    else:
        raise Exception('Unsupported activation function type!')
        
    #Compute dW, db and dA_prev
    dW_curr = np.dot(dZ_curr, A_prev.T) / m
    db_curr = np.sum(dZ_curr, axis=1, keepdims=True) / m
    dA_prev = np.dot(W_curr.T, dZ_curr)
    
    return dW_curr, db_curr, dA_prev

Building the complete back propagation
When building the complete back propagation, we must carefully check the dimensions of each matrix.
Finally the dictionary grads is returned.

#Full back propagation
def full_backward(Y_hat, Y, params, caches, nn_architecture):
    '''
    Parameters
    Y_hat: predicted values (A of the last layer)
    Y: matrix of true labels
    params: the W and b parameters of every layer
    caches: the A and Z values stored during forward propagation
    nn_architecture: network structure
    
    Return value
    grads: the gradients
    '''
    #Stores the dW, db gradients used for gradient descent, in the same format as params
    grads = {}
    
    #Compute dA of the last layer
    dA_prev = - (np.divide(Y, Y_hat) - np.divide(1 - Y, 1 - Y_hat))
    
    for id, layer in reversed(list(enumerate(nn_architecture))):
        # layer_id -> [4, 3, 2, 1]
        layer_id = id + 1
        
        #dA of the current layer is the dA_prev computed in the previous iteration
        dA_curr = dA_prev
        #Get the current layer's W parameter from params
        W_curr = params['W' + str(layer_id)]
        #Retrieve the values stored during forward propagation from caches
        A_prev = caches[id]['A_prev']
        Z_curr = caches[id]['Z_curr']
        #Extract the activation function from the current layer's structure
        activation = layer['activation']
        
        #Compute the current layer's dW, db and the previous layer's dA
        dW_curr, db_curr, dA_prev = layer_backward(dA_curr,
                                                   W_curr, 
                                                   Z_curr,
                                                   A_prev,
                                                   activation)
        #Put the gradient in grads
        grads['dW' + str(layer_id)] = dW_curr
        grads['db' + str(layer_id)] = db_curr
    
    return grads

6 Update the parameters

The update formulas for the parameters:

$$W = W - \alpha \cdot dW$$
$$b = b - \alpha \cdot db$$

#Update parameters
def update_params(params, grads, learning_rate):
    '''
    Parameters
    params: the W, b parameters
    grads: the gradients
    learning_rate: learning rate for gradient descent
    
    Return value
    params: the updated parameters
    '''
    for id in range(len(params) // 2):
        # layer_id -> [1, 2, 3, 4]
        layer_id = id + 1
        params['W' + str(layer_id)] -= learning_rate * grads['dW' + str(layer_id)]
        params['b' + str(layer_id)] -= learning_rate * grads['db' + str(layer_id)]
    
    return params

7 Build the model

Now that we have written a function for each step, we just need to combine them.

#Define model
def dnn_model(X, Y, nn_architecture, epochs=3000, learning_rate=0.0075):
    '''
    Parameters
    X: (n, m)
    Y: (1, m)
    nn_architecture: network structure
    epochs: number of iterations
    learning_rate: learning rate
    
    Return value
    params: the trained parameters
    '''
    np.random.seed(1)
    params = init_params(nn_architecture)
    costs = []
    
    for i in range(1, epochs + 1):
        #Forward propagation
        Y_hat, caches = full_forward(X, params, nn_architecture)
        
        #Calculate loss
        cost = get_cost(Y_hat, Y)
        
        #Calculation accuracy
        accu = get_accuracy(Y_hat, Y)
        
        #Back propagation
        grads = full_backward(Y_hat, Y, params, caches, nn_architecture)
        
        #Update the parameters
        params = update_params(params, grads, learning_rate)
        
        if i % 100 == 0:
            print ('Iter: {:05}, cost: {:.5f}, accu: {:.5f}'.format(i, cost, accu))
            costs.append(cost)
            
    #Draw the cost curve
    plt.plot(np.squeeze(costs))
    plt.ylabel('cost')
    plt.xlabel('iterations (per hundreds)')
    plt.title("DNN")
    plt.show()
    
    return params

8 Prediction

#Predictive function
def predict(X, Y, params, nn_architecture):
    Y_hat, _ = full_forward(X, params, nn_architecture)
    accu = get_accuracy(Y_hat, Y)
    print('prediction accuracy: {:.2f}'.format(accu))
    return Y_hat

9 Start training

#Start training
params = dnn_model(
    train_x, train_y,
    nn_architecture,
)

Here is the training output:

Iter: 00100, cost: 0.67239, accu: 0.67943
Iter: 00200, cost: 0.64575, accu: 0.74641
Iter: 00300, cost: 0.62782, accu: 0.72727
Iter: 00400, cost: 0.59732, accu: 0.75598
Iter: 00500, cost: 0.52155, accu: 0.85646
Iter: 00600, cost: 0.48313, accu: 0.87560
Iter: 00700, cost: 0.43010, accu: 0.91866
Iter: 00800, cost: 0.36453, accu: 0.95694
Iter: 00900, cost: 0.34318, accu: 0.93780
Iter: 01000, cost: 0.29341, accu: 0.95215
Iter: 01100, cost: 0.25503, accu: 0.96172
Iter: 01200, cost: 0.22804, accu: 0.97608
Iter: 01300, cost: 0.19706, accu: 0.97608
Iter: 01400, cost: 0.18372, accu: 0.98086
Iter: 01500, cost: 0.16100, accu: 0.98086
Iter: 01600, cost: 0.14842, accu: 0.98086
Iter: 01700, cost: 0.13803, accu: 0.98086
Iter: 01800, cost: 0.12873, accu: 0.98086
Iter: 01900, cost: 0.12087, accu: 0.98086
Iter: 02000, cost: 0.11427, accu: 0.98086
Iter: 02100, cost: 0.10850, accu: 0.98086
Iter: 02200, cost: 0.10243, accu: 0.98086
Iter: 02300, cost: 0.09774, accu: 0.98086
Iter: 02400, cost: 0.09251, accu: 0.98086
Iter: 02500, cost: 0.08844, accu: 0.98565
Iter: 02600, cost: 0.08474, accu: 0.98565
Iter: 02700, cost: 0.08193, accu: 0.98565
Iter: 02800, cost: 0.07815, accu: 0.98565
Iter: 02900, cost: 0.07563, accu: 0.98565
Iter: 03000, cost: 0.07298, accu: 0.99043

[Figure: the cost curve during training]

10 Prediction on the test set

#Forecast test set accuracy
Y_hat = predict(test_x, test_y, params, nn_architecture)

Prediction accuracy: 0.80

11 Show the predicted results as pictures

#Show pictures
#The test set has 50 images, so we pick a random index from 0 to 49
index = randint(0, test_x.shape[1] - 1)
#Earlier we flattened the data into a (12288, 50) matrix; now we reshape one column back into an image
img = test_x[:, index].reshape((64, 64, 3))
#Show pictures
plt.imshow(img)
#Classify the predictions
Y_hat_ = convert_into_class(Y_hat)

#Convert 1 / 0 into words for the output
pred_str = 'is' if int(Y_hat_[0, index]) else 'is not'
true_str = 'is' if int(test_y[0, index]) else 'is not'
print('This picture ' + true_str + ' a cat')
print('The prediction: this picture ' + pred_str + ' a cat')

#Check whether the prediction is correct
if int(Y_hat_[0, index]) == int(test_y[0, index]):
    print('Correct prediction!')
else:
    print('Wrong prediction!')

This picture is not a cat
The prediction: this picture is not a cat
Correct prediction!
[Figure: the randomly selected test image]

That completes the whole example. If you want to train on other datasets, just change the network structure accordingly; different datasets call for different structures. For example, if X has only two features, the input_dim of the first layer can be changed to 2.
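For instance, here is a hedged sketch of what the structure might look like for a dataset whose X has two features (the hidden-layer sizes are arbitrary choices for illustration, not values prescribed above):

nn_architecture_2d = [
    {'input_dim': 2, 'output_dim': 10, 'activation': 'relu'},
    {'input_dim': 10, 'output_dim': 5, 'activation': 'relu'},
    {'input_dim': 5, 'output_dim': 1, 'activation': 'sigmoid'}
]
#params = dnn_model(X, Y, nn_architecture_2d)   #where X has shape (2, m) and Y has shape (1, m)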

* The code above was written down while learning. If you find any mistakes, please contact me so I can correct them promptly.