# Using NumPy to build a deep neural network that identifies whether there is a cat in a picture

Time: 2019-12-2

Catalog

• 1 Build the data
• 2 Randomly initialize the parameters
• 3 Forward propagation
• 4 Calculate the loss
• 5 Back propagation
• 6 Update the parameters
• 7 Build the model
• 8 Prediction
• 9 Start training
• 10 Prediction
• 11 Show the predicted results as pictures

We will build a simple neural network that identifies whether there is a cat in a picture.
Code reference: a neural network implemented using only NumPy.

This simple, easy-to-follow network should help you understand deep neural networks.
The code is implemented with NumPy and does not include regularization, mini-batches, or other such techniques.

Let's first lay out the steps of a neural network:

(1) Build the data. We want data of shape (n, m), where n is the number of features and m is the number of samples.

(2) Initialize the parameters. Randomly initialize the parameters W and b.

(3) Forward propagation.

(4) Calculate the loss.

(5) Back propagation.

(6) Update the parameters.

(7) Build the model.

(8) Prediction. Prediction is really just another forward propagation.

With these steps in mind, building a neural network is no longer out of reach.
Next, let's build a deep neural network step by step, following the list above.


## 1 Build the data

Let's first look at what the dataset looks like.

We read the data we need from H5 files.
I have prepared two files: train_catvnoncat.h5, which contains the training set, and test_catvnoncat.h5, which contains the test set.

```python
import numpy as np
import h5py

# Function to load data from the files
def load_data():
    train_dataset = h5py.File('datasets/train_catvnoncat.h5', "r")
    train_x_orig = np.array(train_dataset["train_set_x"][:])
    train_y_orig = np.array(train_dataset["train_set_y"][:])

    test_dataset = h5py.File('datasets/test_catvnoncat.h5', "r")
    test_x_orig = np.array(test_dataset["test_set_x"][:])
    test_y_orig = np.array(test_dataset["test_set_y"][:])

    classes = np.array(test_dataset["list_classes"][:])  # the categories, i.e. 1 and 0

    # The label arrays have shape (m,); we need to change them into (1, m),
    # where m is the number of samples
    train_y_orig = train_y_orig.reshape((1, train_y_orig.shape[0]))
    test_y_orig = test_y_orig.reshape((1, test_y_orig.shape[0]))

    return train_x_orig, train_y_orig, test_x_orig, test_y_orig, classes
```

We can display one of these pictures:

```python
from random import randint
import matplotlib.pyplot as plt

train_x_orig, train_y, test_x_orig, test_y, classes = load_data()

# Randomly select a picture from the training set (it contains 209 images)
index = randint(0, 208)
img = train_x_orig[index]

# Show the picture
plt.imshow(img)
plt.show()

print('Its label is: {}'.format(train_y[0][index]))
```

The demonstration results are as follows:

Converting the data

Because our data is standard picture data, we need to convert it into the input format,
i.e. the (n, m) format, where n is the number of features and m is the number of samples.

```python
train_x_orig, train_y, test_x_orig, test_y, classes = load_data()

m_train = train_x_orig.shape[0]  # number of training samples
m_test = test_x_orig.shape[0]    # number of test samples
num_px = test_x_orig.shape[1]    # width/height of each picture

# To simplify the later matrix operations, flatten and transpose the sample data
# Each dimension of the processed array means (picture data, sample index)
train_x_flatten = train_x_orig.reshape(train_x_orig.shape[0], -1).T
test_x_flatten = test_x_orig.reshape(test_x_orig.shape[0], -1).T

# Apply a simple standardization to the feature data
# (divide by 255 so all values fall in the range [0, 1])
train_x = train_x_flatten/255.
test_x = test_x_flatten/255.
```

The final output data has shape (12288, m); 12288 is the number of features, i.e. 64 * 64 * 3 = 12288.
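To see that the flatten-and-transpose step does what we want, here is a quick sanity check on a dummy array with the same image dimensions (the real dataset is not needed):

```python
import numpy as np

# A dummy "dataset" of 5 images, each 64x64 pixels with 3 color channels
dummy = np.zeros((5, 64, 64, 3))

# Flatten each image, then transpose to (features, samples)
flat = dummy.reshape(dummy.shape[0], -1).T

print(flat.shape)  # (12288, 5): 64 * 64 * 3 = 12288 features, 5 samples
```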

## 2 Randomly initialize the parameters

Define the structure of the neural network.
Before initializing, we need to know the structure of the network we are going to build. Here we define it as follows:

```python
# Define the structure of the neural network
# There are four layers: the first takes the 12288 input features
# and has 20 units, and so on
nn_architecture = [
    {'input_dim': 12288, 'output_dim': 20, 'activation': 'relu'},
    {'input_dim': 20, 'output_dim': 7, 'activation': 'relu'},
    {'input_dim': 7, 'output_dim': 5, 'activation': 'relu'},
    {'input_dim': 5, 'output_dim': 1, 'activation': 'sigmoid'}
]
```

Initialization

```python
# Randomly initialize the parameters W, b according to the structure
def init_params(nn_architecture):

    np.random.seed(1)

    # Used to store the generated parameters
    params = {}

    for id, layer in enumerate(nn_architecture):
        # layer_id -> [1, 2, 3, 4]
        layer_id = id + 1
        params['W' + str(layer_id)] = np.random.randn(layer['output_dim'], layer['input_dim']) / np.sqrt(layer['input_dim'])
        params['b' + str(layer_id)] = np.zeros((layer['output_dim'], 1))

    return params
```
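As a quick check, we can confirm that init_params produces matrices of the expected shapes: each W has shape (output_dim, input_dim) and each b has shape (output_dim, 1). The architecture and function are repeated here so the snippet runs on its own:

```python
import numpy as np

nn_architecture = [
    {'input_dim': 12288, 'output_dim': 20, 'activation': 'relu'},
    {'input_dim': 20, 'output_dim': 7, 'activation': 'relu'},
    {'input_dim': 7, 'output_dim': 5, 'activation': 'relu'},
    {'input_dim': 5, 'output_dim': 1, 'activation': 'sigmoid'}
]

def init_params(nn_architecture):
    np.random.seed(1)
    params = {}
    for id, layer in enumerate(nn_architecture):
        layer_id = id + 1
        # W: (units in this layer, units in the previous layer), scaled by 1/sqrt(input_dim)
        params['W' + str(layer_id)] = np.random.randn(layer['output_dim'], layer['input_dim']) / np.sqrt(layer['input_dim'])
        params['b' + str(layer_id)] = np.zeros((layer['output_dim'], 1))
    return params

params = init_params(nn_architecture)
print(params['W1'].shape, params['b1'].shape)  # (20, 12288) (20, 1)
print(params['W4'].shape, params['b4'].shape)  # (1, 5) (1, 1)
```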

## 3 Forward propagation

Let's see what forward propagation does.

Activation functions

```python
def sigmoid(Z):
    '''
    Parameters
    Z: shape = (output_dim, m); output_dim is the number of units in the current layer

    Returns
    1/(1+np.exp(-Z)): the sigmoid result, shape = (output_dim, m)
    '''
    return 1/(1+np.exp(-Z))

def relu(Z):
    '''
    Parameters
    Z: shape = (output_dim, m); output_dim is the number of units in the current layer

    Returns
    np.maximum(0, Z): the relu result, shape = (output_dim, m)
    '''
    return np.maximum(0, Z)
```
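A tiny numeric check of the two activation functions (the input values are chosen arbitrarily):

```python
import numpy as np

def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))

def relu(Z):
    return np.maximum(0, Z)

Z = np.array([[-2.0, 0.0, 2.0]])

print(relu(Z))                  # [[0. 0. 2.]]  negatives are clipped to 0
print(np.round(sigmoid(Z), 3))  # [[0.119 0.5   0.881]]  squashed into (0, 1)
```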

Building single-layer forward propagation.
That is, the work done in one layer, which we implement with a single function:

$$Z_{curr} = W_{curr} \cdot A_{prev} + b_{curr}$$
$$A_{curr} = g(Z_{curr})$$

```python
# Single-layer forward propagation
def layer_forward(W_curr, b_curr, A_prev, activation):
    '''
    Computes
    Z_curr = W_curr·A_prev + b_curr
    A_curr = g(Z_curr)

    Parameters
    W_curr: the W parameters of the current layer
    b_curr: the b parameters of the current layer
    A_prev: the A matrix of the previous layer
    activation: the activation function used by the current layer

    Returns
    Z_curr: Z of the current layer
    A_curr: A of the current layer
    '''
    Z_curr = np.dot(W_curr, A_prev) + b_curr

    # Pick the activation function and compute A
    if activation == 'relu':
        A_curr = relu(Z_curr)
    elif activation == 'sigmoid':
        A_curr = sigmoid(Z_curr)
    else:
        raise Exception('Unsupported activation function type!')

    return Z_curr, A_curr
```

Building the complete forward propagation.
In the complete forward pass, I pack Z_curr and A_prev into a dictionary and store it as the current layer's cache, so they can be reused later during gradient descent. The per-layer caches form a list, which we will need later, so the function returns two values: A and caches.

```python
# Full forward propagation
def full_forward(X, params, nn_architecture):
    '''
    Parameters
    X: the input
    params: the variable holding the W, b parameters
    nn_architecture: the network structure

    Cache storage format
    Because back propagation also uses the A of the previous layer,
    we store the previous layer's A and the current layer's Z in caches
    so they are easy to retrieve:
    caches = [
        {'A_prev': A_prev, 'Z_curr': Z_curr},  # data stored for the first layer
        {'A_prev': A_prev, 'Z_curr': Z_curr},
        ...
    ]

    Returns
    A_curr: A of the last layer, i.e. AL (Y_hat)
    caches: list of each layer's previous A and current Z
    '''
    caches = []

    # X acts as layer 0's A
    A_curr = X

    for id, layer in enumerate(nn_architecture):
        # layer_id -> [1, 2, 3, 4]
        layer_id = id + 1

        # Get the A of the previous layer
        A_prev = A_curr

        # Get the W and b of the current layer from params
        W_curr = params['W' + str(layer_id)]
        b_curr = params['b' + str(layer_id)]
        # Get the activation function from the layer definition
        activation = layer['activation']

        # Compute the Z and A of the current layer
        Z_curr, A_curr = layer_forward(W_curr, b_curr, A_prev, activation)

        # Store the previous layer's A and the current layer's Z
        caches.append({
            'A_prev': A_prev,
            'Z_curr': Z_curr
        })
    return A_curr, caches
```
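Putting the pieces together, here is a small end-to-end check of the forward pass on a toy two-layer network (the layer sizes, parameters, and random inputs are made up for illustration; the helpers are condensed versions of the functions above so the snippet runs on its own):

```python
import numpy as np

def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))

def relu(Z):
    return np.maximum(0, Z)

def layer_forward(W_curr, b_curr, A_prev, activation):
    Z_curr = np.dot(W_curr, A_prev) + b_curr
    A_curr = relu(Z_curr) if activation == 'relu' else sigmoid(Z_curr)
    return Z_curr, A_curr

def full_forward(X, params, nn_architecture):
    caches = []
    A_curr = X
    for id, layer in enumerate(nn_architecture):
        layer_id = id + 1
        A_prev = A_curr
        Z_curr, A_curr = layer_forward(params['W' + str(layer_id)],
                                       params['b' + str(layer_id)],
                                       A_prev, layer['activation'])
        caches.append({'A_prev': A_prev, 'Z_curr': Z_curr})
    return A_curr, caches

# Tiny net: 3 input features -> 4 relu units -> 1 sigmoid unit
nn_architecture = [
    {'input_dim': 3, 'output_dim': 4, 'activation': 'relu'},
    {'input_dim': 4, 'output_dim': 1, 'activation': 'sigmoid'},
]
np.random.seed(1)
params = {
    'W1': np.random.randn(4, 3), 'b1': np.zeros((4, 1)),
    'W2': np.random.randn(1, 4), 'b2': np.zeros((1, 1)),
}

X = np.random.randn(3, 5)  # 5 samples
Y_hat, caches = full_forward(X, params, nn_architecture)

print(Y_hat.shape)  # (1, 5): one prediction per sample
print(len(caches))  # 2: one cache entry per layer
```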

## 4 Calculate the loss

The formula for the loss:
$$J = -\frac{1}{m}\left[Y \cdot \log(\hat{Y})^T + (1-Y) \cdot \log(1-\hat{Y})^T\right]$$

```python
# Compute the loss value
def get_cost(Y_hat, Y):
    # Get the number of samples
    m = Y_hat.shape[1]

    cost = -1 / m * (np.dot(Y, np.log(Y_hat).T) + np.dot(1 - Y, np.log(1 - Y_hat).T))

    # cost is a 1x1 array like [[0.256654]]; np.squeeze turns it into a scalar
    cost = np.squeeze(cost)

    return cost
```
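A small worked example of the cost (the probabilities here are made up): a confident correct prediction yields a small cost, a confident wrong one a large cost.

```python
import numpy as np

def get_cost(Y_hat, Y):
    m = Y_hat.shape[1]
    cost = -1 / m * (np.dot(Y, np.log(Y_hat).T) + np.dot(1 - Y, np.log(1 - Y_hat).T))
    return np.squeeze(cost)

Y = np.array([[1, 0]])          # true labels: cat, not cat
good = np.array([[0.9, 0.1]])   # confident and correct
bad = np.array([[0.1, 0.9]])    # confident and wrong

print(round(float(get_cost(good, Y)), 4))  # 0.1054
print(round(float(get_cost(bad, Y)), 4))   # 2.3026
```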

We can also define a function for the accuracy:

```python
# Convert predictions into classes. The predictions are decimals;
# for a binary classification problem we split them into two classes
def convert_into_class(Y_hat):
    # Copy the matrix
    prob = np.copy(Y_hat)
    # Values > 0.5 are classified as 1,
    # values <= 0.5 as 0
    prob[prob > 0.5] = 1
    prob[prob <= 0.5] = 0

    return prob

# Compute the accuracy
def get_accuracy(Y_hat, Y):
    # Classify first, then measure accuracy
    prob = convert_into_class(Y_hat)
    # accu = float(np.dot(Y, prob.T) + np.dot(1 - Y, 1 - prob.T)) / float(Y_hat.shape[1])
    # The line above is another way to compute the accuracy
    '''
    The principle here is to compare the predicted values with the true values,
    count how many predictions are correct, and divide by the total number of
    samples; Y_hat.shape[1] is the total sample count
    '''
    accu = np.sum((prob == Y) / Y_hat.shape[1])
    accu = np.squeeze(accu)

    return accu
```
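A quick demonstration with made-up predictions: after thresholding at 0.5, three of the four predictions match the labels.

```python
import numpy as np

def convert_into_class(Y_hat):
    prob = np.copy(Y_hat)
    prob[prob > 0.5] = 1
    prob[prob <= 0.5] = 0
    return prob

def get_accuracy(Y_hat, Y):
    prob = convert_into_class(Y_hat)
    return np.squeeze(np.sum((prob == Y) / Y_hat.shape[1]))

Y_hat = np.array([[0.9, 0.4, 0.7, 0.2]])  # thresholds to [1, 0, 1, 0]
Y = np.array([[1, 0, 0, 0]])

print(get_accuracy(Y_hat, Y))  # 0.75
```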

## 5 Back propagation

As before, let's first look at the structure of back propagation.

The main step is to derive dA of the last layer (here dA4) from J (the cost), and then run single-layer back propagation layer by layer. L denotes the last layer and l the current layer:
$$dA^{[L]} = -\left(\frac{Y}{\hat{Y}} - \frac{1-Y}{1-\hat{Y}}\right)$$

Derivatives of the activation functions

```python
def relu_backward(dA, cache):
    '''
    dA: shape = (output_dim, m); output_dim is the number of units in the current layer
    cache: shape = (output_dim, m)
    '''
    Z = cache
    dZ = np.array(dA, copy=True)  # copy the matrix

    # When Z <= 0, dZ = 0
    dZ[Z <= 0] = 0

    return dZ

def sigmoid_backward(dA, cache):
    '''
    For sigmoid, dZ = dA * g(Z) * (1 - g(Z))

    dA: shape = (output_dim, m); output_dim is the number of units in the current layer
    cache: shape = (output_dim, m)
    '''
    Z = cache

    s = 1/(1+np.exp(-Z))
    dZ = dA * s * (1-s)

    assert (dZ.shape == Z.shape)

    return dZ
```
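One way to gain confidence in sigmoid_backward is a finite-difference check: the analytic derivative should match a numerical estimate of sigmoid's slope.

```python
import numpy as np

def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))

def sigmoid_backward(dA, cache):
    Z = cache
    s = sigmoid(Z)
    return dA * s * (1 - s)

Z = np.array([[0.5]])
eps = 1e-6

# Numerical derivative of sigmoid at Z (central difference)
numeric = (sigmoid(Z + eps) - sigmoid(Z - eps)) / (2 * eps)
# Analytic derivative from sigmoid_backward with dA = 1
analytic = sigmoid_backward(np.ones_like(Z), Z)

print(np.allclose(numeric, analytic, atol=1e-6))  # True
```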

Building single-layer back propagation.
Within a single layer we mainly compute dZ, dW, db (the 1/m factors match the averaging over samples in the cost):

$$dZ^{[l]} = dA^{[l]} * g^{[l]\prime}(Z^{[l]})$$
$$dW^{[l]} = \frac{1}{m}\, dZ^{[l]} \cdot A^{[l-1]T}$$
$$db^{[l]} = \frac{1}{m} \sum dZ^{[l]}$$

```python
# Single-layer back propagation
def layer_backward(dA_curr, W_curr, Z_curr, A_prev, activation):
    '''
    Computes
    dZ = dA * g'(Z)
    dW = dZ·A.T / m
    db = np.sum(dZ, axis=1, keepdims=True) / m

    Parameters
    dA_curr: dA of the current layer
    W_curr: the W parameters of the current layer
    Z_curr: the Z values of the current layer
    A_prev: the A values of the previous layer
    activation: the activation function of the current layer

    Returns
    dW_curr: dW of the current layer
    db_curr: db of the current layer
    dA_prev: dA of the previous layer
    '''
    m = A_prev.shape[1]  # number of samples

    # Compute dZ_curr
    if activation == 'relu':
        dZ_curr = relu_backward(dA_curr, Z_curr)
    elif activation == 'sigmoid':
        dZ_curr = sigmoid_backward(dA_curr, Z_curr)
    else:
        raise Exception('Unsupported activation function type!')

    # Compute dW, db and dA of the previous layer
    dW_curr = np.dot(dZ_curr, A_prev.T) / m
    db_curr = np.sum(dZ_curr, axis=1, keepdims=True) / m
    dA_prev = np.dot(W_curr.T, dZ_curr)

    return dW_curr, db_curr, dA_prev
```

Building the complete back propagation.
When building the complete back propagation, we must carefully check the dimensions of each matrix.
Finally, the dictionary grads is returned.

```python
# Full back propagation
def full_backward(Y_hat, Y, params, caches, nn_architecture):
    '''
    Parameters
    Y_hat: the predicted values (the A of the last layer)
    Y: the matrix of true labels
    params: the W, b parameters of each layer
    caches: the A, Z values stored during forward propagation
    nn_architecture: the network structure

    Returns
    grads: the gradients dW, db of each layer
    '''
    # Stores the dW, db gradients for gradient descent, in the same form as params
    grads = {}

    # Compute the dA of the last layer
    dA_prev = - (np.divide(Y, Y_hat) - np.divide(1 - Y, 1 - Y_hat))

    for id, layer in reversed(list(enumerate(nn_architecture))):
        # layer_id -> [4, 3, 2, 1]
        layer_id = id + 1

        # The dA of the current layer is the dA_prev computed in the previous step
        dA_curr = dA_prev
        # Get the W parameters of the current layer from params
        W_curr = params['W' + str(layer_id)]
        # Get the values stored during forward propagation from the caches
        A_prev = caches[id]['A_prev']
        Z_curr = caches[id]['Z_curr']
        # Get the activation function from the current layer's structure
        activation = layer['activation']

        # Compute the gradients dW, db of the current layer and dA of the previous layer
        dW_curr, db_curr, dA_prev = layer_backward(dA_curr,
                                                   W_curr,
                                                   Z_curr,
                                                   A_prev,
                                                   activation)
        # Store the gradients of the current layer
        grads['dW' + str(layer_id)] = dW_curr
        grads['db' + str(layer_id)] = db_curr

    return grads
```

## 6 Update the parameters

The update formula for the parameters:

$$W = W - \alpha \cdot dW$$
$$b = b - \alpha \cdot db$$

```python
# Update the parameters
def update_params(params, grads, learning_rate):
    '''
    Parameters
    params: the W, b parameters
    grads: the gradients dW, db
    learning_rate: the learning rate for gradient descent

    Returns
    params: the updated parameters
    '''
    for id in range(len(params) // 2):
        # layer_id -> [1, 2, 3, 4]
        layer_id = id + 1
        params['W' + str(layer_id)] -= learning_rate * grads['dW' + str(layer_id)]
        params['b' + str(layer_id)] -= learning_rate * grads['db' + str(layer_id)]

    return params
```
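A one-layer toy example of the update rule (the parameter and gradient values are made up):

```python
import numpy as np

def update_params(params, grads, learning_rate):
    for id in range(len(params) // 2):
        layer_id = id + 1
        params['W' + str(layer_id)] -= learning_rate * grads['dW' + str(layer_id)]
        params['b' + str(layer_id)] -= learning_rate * grads['db' + str(layer_id)]
    return params

# W starts at 1.0 with gradient 0.5; b starts at 0.0 with gradient 0.2
params = {'W1': np.array([[1.0]]), 'b1': np.array([[0.0]])}
grads = {'dW1': np.array([[0.5]]), 'db1': np.array([[0.2]])}

params = update_params(params, grads, learning_rate=0.1)
print(params['W1'])  # [[0.95]]  -> 1.0 - 0.1 * 0.5
print(params['b1'])  # [[-0.02]] -> 0.0 - 0.1 * 0.2
```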

## 7 Build the model

Now that we have written a function for each step, we just need to combine them.

```python
# Define the model
def dnn_model(X, Y, nn_architecture, epochs=3000, learning_rate=0.0075):
    '''
    Parameters
    X: (n, m)
    Y: (1, m)
    nn_architecture: the network structure
    epochs: number of iterations
    learning_rate: the learning rate

    Returns
    params: the trained parameters
    '''
    np.random.seed(1)
    params = init_params(nn_architecture)
    costs = []

    for i in range(1, epochs + 1):
        # Forward propagation
        Y_hat, caches = full_forward(X, params, nn_architecture)

        # Compute the loss
        cost = get_cost(Y_hat, Y)

        # Compute the accuracy
        accu = get_accuracy(Y_hat, Y)

        # Back propagation
        grads = full_backward(Y_hat, Y, params, caches, nn_architecture)

        # Update the parameters
        params = update_params(params, grads, learning_rate)

        if i % 100 == 0:
            print('Iter: {:05}, cost: {:.5f}, accu: {:.5f}'.format(i, cost, accu))
            costs.append(cost)

    # Draw the cost curve
    plt.plot(np.squeeze(costs))
    plt.ylabel('cost')
    plt.xlabel('iterations (per hundreds)')
    plt.title("DNN")
    plt.show()

    return params
```

## 8 Prediction

```python
# Prediction function
def predict(X, Y, params, nn_architecture):
    Y_hat, _ = full_forward(X, params, nn_architecture)
    accu = get_accuracy(Y_hat, Y)
    print('Prediction accuracy: {:.2f}'.format(accu))
    return Y_hat
```

## 9 Start training

```python
# Start training
params = dnn_model(
    train_x, train_y,
    nn_architecture,
)
```

This is the result of training

Iter: 00100, cost: 0.67239, accu: 0.67943
Iter: 00200, cost: 0.64575, accu: 0.74641
Iter: 00300, cost: 0.62782, accu: 0.72727
Iter: 00400, cost: 0.59732, accu: 0.75598
Iter: 00500, cost: 0.52155, accu: 0.85646
Iter: 00600, cost: 0.48313, accu: 0.87560
Iter: 00700, cost: 0.43010, accu: 0.91866
Iter: 00800, cost: 0.36453, accu: 0.95694
Iter: 00900, cost: 0.34318, accu: 0.93780
Iter: 01000, cost: 0.29341, accu: 0.95215
Iter: 01100, cost: 0.25503, accu: 0.96172
Iter: 01200, cost: 0.22804, accu: 0.97608
Iter: 01300, cost: 0.19706, accu: 0.97608
Iter: 01400, cost: 0.18372, accu: 0.98086
Iter: 01500, cost: 0.16100, accu: 0.98086
Iter: 01600, cost: 0.14842, accu: 0.98086
Iter: 01700, cost: 0.13803, accu: 0.98086
Iter: 01800, cost: 0.12873, accu: 0.98086
Iter: 01900, cost: 0.12087, accu: 0.98086
Iter: 02000, cost: 0.11427, accu: 0.98086
Iter: 02100, cost: 0.10850, accu: 0.98086
Iter: 02200, cost: 0.10243, accu: 0.98086
Iter: 02300, cost: 0.09774, accu: 0.98086
Iter: 02400, cost: 0.09251, accu: 0.98086
Iter: 02500, cost: 0.08844, accu: 0.98565
Iter: 02600, cost: 0.08474, accu: 0.98565
Iter: 02700, cost: 0.08193, accu: 0.98565
Iter: 02800, cost: 0.07815, accu: 0.98565
Iter: 02900, cost: 0.07563, accu: 0.98565
Iter: 03000, cost: 0.07298, accu: 0.99043

## 10 Prediction

```python
# Predict the accuracy on the test set
Y_hat = predict(test_x, test_y, params, nn_architecture)
```

Prediction accuracy: 0.80

## 11 Show the predicted results as pictures

```python
# Show a picture
# The test set has 50 images, so pick a random integer index in [0, 49]
index = randint(0, 49)
# Earlier we flattened the data into a (12288, 50) matrix;
# reshape this column back into an image
img = test_x[:, index].reshape((64, 64, 3))
# Show the picture
plt.imshow(img)
# Classify the predictions
Y_hat_ = convert_into_class(Y_hat)

# Convert 1/0 into words and print the result
pred = 'is' if int(Y_hat_[0, index]) else 'is not'
true = 'is' if int(test_y[0, index]) else 'is not'
print('This picture ' + true + ' a cat')
print('The predicted picture ' + pred + ' a cat')

# Check whether the prediction is correct
if int(Y_hat_[0, index]) == int(test_y[0, index]):
    print('Correct prediction!')
else:
    print('Wrong prediction!')
```

This picture is not a cat
The predicted picture is not a cat
Correct prediction!

We have now built the complete example. If you want to train on other datasets, change the network structure accordingly; different datasets call for different structures. For example, if our X had two features, the input_dim of the first layer would be changed to 2.

*The above code was written down while learning. If there is any mistake in the text, please contact me so I can correct it.
