Abstract: How does a recurrent neural network work, and how do you build an Elman recurrent neural network? Here we show how to create an Elman recurrent neural network for simple sequence prediction.

This article uses the simplest RNN model, the Elman recurrent neural network, as an example to explain how recurrent neural networks work. Even without much background in recurrent neural networks (RNNs), you should be able to follow it easily. To help you understand RNNs better, we build an Elman recurrent neural network from scratch using the PyTorch tensor and autograd packages. The complete code for this article is available on GitHub.

**Elman recurrent neural network**

Jeff Elman first proposed the Elman recurrent neural network, published in the paper Finding Structure in Time. It is just a three-layer feedforward neural network whose input layer consists of one input neuron x1 and a group of context neurons {c1 … cn}. The hidden-layer neurons of the previous time step serve as the inputs to the context neurons, and each hidden neuron has its own context neuron. Because the state of the previous time step is part of the input, we can say the Elman recurrent neural network has a kind of memory: the context neurons represent that memory.

**Predicting a sine wave**

Now let's train the RNN to learn the sine function. During training, the model is fed one data point at a time, which is why we need only one input neuron x1, and we want it to predict the value at the next time step. The input sequence x consists of 20 data points, and the target sequence is the input sequence shifted one step ahead.

**Model implementation**

First, import the required packages.
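A minimal sketch of the imports, assuming a PyTorch implementation (matplotlib is only needed for the plot at the end):

```python
import numpy as np
import torch
import matplotlib.pyplot as plt  # for plotting the predictions later
```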

Next, set the model's hyperparameters. The input layer size is set to 7 (6 context neurons plus 1 input neuron), and seq_length defines the length of the input and target sequences.
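The hyperparameters might look like this; the number of epochs and the learning rate are our own assumptions, not values from the text:

```python
input_size = 7    # 6 context neurons + 1 input neuron
hidden_size = 6
output_size = 1
seq_length = 20   # length of the input and target sequences
epochs = 300      # assumed value
lr = 0.1          # assumed learning rate
```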

Generate the training data: x is the input sequence and y is the target sequence.
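A sketch of the data generation: we sample seq_length + 1 points of a sine wave, so that the targets y are simply the inputs x shifted one step ahead (the sampling interval [2, 10] is an assumption):

```python
import numpy as np
import torch

seq_length = 20

# Sample seq_length + 1 sine values; y is x shifted one step ahead,
# i.e. the "next value" the network should predict.
data_time_steps = np.linspace(2, 10, seq_length + 1)
data = torch.tensor(np.sin(data_time_steps), dtype=torch.float32).view(-1, 1)
x = data[:-1]
y = data[1:]
```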

Create two weight matrices. The matrix w1, with shape (input_size, hidden_size), connects the input to the hidden layer, and the matrix w2, with shape (hidden_size, output_size), connects the hidden layer to the output. The weight matrices are initialized from a zero-mean normal distribution.
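This can be sketched as follows; the 0.4 and 0.3 standard deviations are assumed values:

```python
import torch

input_size, hidden_size, output_size = 7, 6, 1

# Zero-mean normal initialization; the 0.4/0.3 scales are assumptions.
w1 = (torch.randn(input_size, hidden_size) * 0.4).requires_grad_()
w2 = (torch.randn(hidden_size, output_size) * 0.3).requires_grad_()
```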

Next, define the forward method. Its arguments are the input vector, the context-state vector, and the two weight matrices. The vector xh is created by concatenating the input with the context state. We take the dot product of xh and the weight matrix w1, then apply tanh as the nonlinearity (in RNNs, tanh works better than sigmoid); this produces the new context state. We then take the dot product of that new context state and the weight matrix w2. Since we want to predict continuous values, we apply no nonlinearity at this stage.

Note that the context_state vector will populate the context neurons at the next time step, which is why we return both the context_state vector and out.

**Training**

The structure of the training cycle is as follows:

1. The outer loop iterates over the epochs. An epoch is defined as one pass of all the training data through the network. At the beginning of each epoch, the context_state vector is initialized to zeros.

2. The inner loop iterates over the elements of the sequence. The forward method performs the forward pass, returning pred and context_state, which are used for the next time step. We then compute the mean squared error (MSE), appropriate for predicting continuous values. Calling backward() computes the gradients, after which the weights w1 and w2 are updated. The zero_() method clears the gradients on each iteration; otherwise they would accumulate. Finally, the context_state vector is wrapped in a new Variable to detach it from its history.
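The two loops above can be sketched as follows. This is a self-contained version under our assumed hyperparameters; it detaches the context state with .detach() rather than the older Variable wrapping, which has the same effect:

```python
import numpy as np
import torch

torch.manual_seed(0)  # for reproducibility

# Assumed hyperparameters; epochs and lr are our choices, not from the text.
input_size, hidden_size, output_size = 7, 6, 1
seq_length, epochs, lr = 20, 300, 0.1

# Training data: the targets are the inputs shifted one step ahead.
t = np.linspace(2, 10, seq_length + 1)
data = torch.tensor(np.sin(t), dtype=torch.float32).view(-1, 1)
x, y = data[:-1], data[1:]

w1 = (torch.randn(input_size, hidden_size) * 0.4).requires_grad_()
w2 = (torch.randn(hidden_size, output_size) * 0.3).requires_grad_()

def forward(inp, context_state):
    xh = torch.cat((inp, context_state), dim=1)
    context_state = torch.tanh(xh @ w1)
    return context_state @ w2, context_state

for epoch in range(epochs):
    context_state = torch.zeros(1, hidden_size)  # reset memory each epoch
    total_loss = 0.0
    for i in range(x.size(0)):
        pred, context_state = forward(x[i:i + 1], context_state)
        loss = (pred - y[i:i + 1]).pow(2).sum() / 2  # MSE on a single value
        total_loss += loss.item()
        loss.backward()
        with torch.no_grad():                        # plain SGD update
            w1 -= lr * w1.grad
            w2 -= lr * w2.grad
        w1.grad.zero_()                              # clear accumulated gradients
        w2.grad.zero_()
        # Detach so gradients do not flow back into earlier time steps.
        context_state = context_state.detach()
    if epoch % 50 == 0:
        print(f"Epoch: {epoch}  loss: {total_loss:.4f}")
```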

The output produced during training shows the loss decreasing with each epoch, which is a good sign: a steadily decreasing loss means our model is learning.
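Generating the predictions then takes one more pass over the input sequence with the trained weights. A sketch of that inference loop (random weights stand in for the trained w1 and w2 so the snippet runs on its own):

```python
import torch

input_size, hidden_size, output_size = 7, 6, 1
# Stand-ins for the trained weights; after training you would use w1 and w2.
w1 = torch.randn(input_size, hidden_size) * 0.4
w2 = torch.randn(hidden_size, output_size) * 0.3

x = torch.sin(torch.linspace(2, 10, 21))[:-1].view(-1, 1)

predictions = []
context_state = torch.zeros(1, hidden_size)
with torch.no_grad():  # no gradients needed at inference time
    for i in range(x.size(0)):
        xh = torch.cat((x[i:i + 1], context_state), dim=1)
        context_state = torch.tanh(xh @ w1)
        predictions.append((context_state @ w2).item())
```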

The prediction results are shown in the figure below: the yellow dots are the predicted values and the blue dots are the actual values. They match closely, so the model predicts very well.

**Conclusion**

Here, we built a basic RNN model from scratch in PyTorch and learned how to apply RNNs to a simple sequence-prediction problem.


This article was translated by the Alibaba Cloud Yunqi community.

The original title of the article is "Introduction to recurrent neural networks in Python".

Please see the original article for details.