Summary: This article shows how an LSTM RNN, a type of recurrent neural network, can be used for regression prediction.
This article is shared from the Huawei Cloud community post "[Python artificial intelligence] XIV. Sin curve prediction of recurrent neural network LSTM RNN regression case [changeable AI show]" (https://bbs.huaweicloud.com/b…), by eastmount.
I RNN and LSTM review
(1) RNN principle
Recurrent neural networks are abbreviated as RNN. Suppose there is a set of data: data0, data1, data2, and data3. If we feed each item to the same neural network, we get one result per item. But if the items are related to each other, such as the order of steps when cutting and cooking, or the order of words in an English sentence, how can a neural network learn that relationship? This is what an RNN is for.
Suppose you are given the sequence A, B, C, D and need to predict the next item, E. The prediction depends on the order of A, B, C, D that came before, and this is called memory. Before predicting, the network reviews its previous memory, adds the new memory from the current step, and then produces the output. This is the principle behind the recurrent neural network (RNN).
First, let’s think about how humans analyze the relationship or order between things. Humans usually remember what happened before, so as to help us judge our subsequent behavior. Can computers also remember what happened before?
When analyzing data0, we store the analysis result in memory. When an ordinary neural network (NN) then analyzes data1, it generates new memory, but this new memory is not connected to the old memory, as shown in the figure above. In an RNN, we simply bring in the old memory when analyzing the new data. As we go on to analyze more data, the network accumulates all of the previous memories.
The RNN structure is shown in the figure below. At the time points t-1, t, and t+1 there is a different input x at each step. Each calculation considers the state of the previous step together with x(t) of the current step, and then outputs a y value. In mathematical form, every RNN step produces a state s(t). When the RNN analyzes x(t+1), the output y(t+1) is created jointly by s(t) and s(t+1), where s(t) can be regarded as the memory of the previous step. Chaining multiple copies of the network in this way gives a recurrent neural network; its simplified diagram is shown on the left of the figure below.
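The recurrence described above can be sketched in a few lines of NumPy. This is only an illustration, not the article's TensorFlow code: the weight names W_x, W_s, W_y, the tanh activation, and the sizes are all assumptions chosen for the example.

```python
import numpy as np

def rnn_forward(xs, W_x, W_s, W_y, b_s, b_y):
    """Run a plain RNN over xs of shape (steps, input_size).

    All weight names here are hypothetical, chosen for illustration.
    """
    s = np.zeros(W_s.shape[0])              # s(0): empty memory
    ys = []
    for x_t in xs:
        # The new state mixes the previous state s(t-1) with the input x(t).
        s = np.tanh(W_x @ x_t + W_s @ s + b_s)
        ys.append(W_y @ s + b_y)            # y(t) is read from the updated state
    return np.array(ys), s

rng = np.random.default_rng(0)
hidden, n_in, n_out = 4, 1, 1               # assumed sizes
W_x = rng.normal(size=(hidden, n_in))
W_s = rng.normal(size=(hidden, hidden))
W_y = rng.normal(size=(n_out, hidden))
xs = np.sin(np.linspace(0, 3, 6)).reshape(-1, 1)

ys, final_state = rnn_forward(xs, W_x, W_s, W_y, np.zeros(hidden), np.zeros(n_out))
print(ys.shape, final_state.shape)
```

The same weights are reused at every time step; only the state s changes, which is what gives the network its memory.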
In short, whenever your data is ordered you can use an RNN: the order of human speech, the digits of a telephone number, the order of image pixels, the order of the letters ABC, and so on. When we explained the principle of CNN earlier, a CNN could be seen as a filter that slides over the whole image, with the convolution deepening the network’s understanding of the image.
An RNN scans in a similar way, but over time and sequence, and it also has memory. The RNN captures the dynamic information in sequential data through the recurrent connections of its hidden layer, which improves the prediction results.
(2) RNN application
RNN is commonly used in natural language processing, machine translation, speech recognition, image recognition and other fields. The following briefly shares the corresponding structures of RNN related applications.
- RNN sentiment analysis: When analyzing whether the sentiment of a person’s speech is positive or negative, use the RNN structure shown in the figure below. It has N inputs and 1 output, and the y value at the last time point represents the final result.
- RNN image recognition: Here there is one image input X and N corresponding outputs.
- RNN machine translation: The input and the output are both sequences, corresponding to Chinese and English respectively, as shown in the figure below.
Next, let’s look at a more powerful structure called LSTM.
(1) Why introduce LSTM?
An RNN learns from ordered data. Like a person, it remembers earlier data, but sometimes, like a forgetful grandfather, it loses track of what was said before. To overcome this weakness, LSTM was proposed. Its full name is long short-term memory, and it is one of the most popular RNN variants today.
Suppose there is a sentence, as shown in the figure below, and the RNN has to judge that the sentence is about braised pork ribs. The key phrase “braised pork ribs” appears at the very beginning of the sentence, so this is what the network needs to learn.
The information about “braised pork ribs” has to travel a long way to reach the end of the sequence. After the error is computed there, it is propagated backwards, and at every step it is multiplied by a weight parameter W. If the weight is a number less than 1, such as 0.9, the error is multiplied by 0.9 again and again. By the time it reaches the initial time step, the error has practically disappeared. This is called the vanishing gradient problem (also translated as gradient disappearance or gradient dispersion).
Conversely, if the weight is a number greater than 1, such as 1.1, the backpropagated value becomes extremely large, which is called gradient explosion.
To summarize vanishing and exploding gradients: when the sequence an RNN processes is long, the backpropagated error is multiplied by the same factor at every step. The nth power of 0.9 tends to 0 and the nth power of 1.1 tends to infinity, which leads to vanishing or exploding gradients.
This is why an ordinary RNN cannot recall memories from long ago. LSTM was introduced to solve the vanishing and exploding gradient problems that arise when training an RNN with gradient descent.
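The effect of the repeated multiplication is easy to check numerically. The sketch below is a simplification: it treats the weight as a single scalar factor (0.9 or 1.1, the values used above), and the helper name backpropagated_error and the choice of 100 steps are assumptions made for illustration.

```python
def backpropagated_error(weight, steps, error=1.0):
    """Multiply the error by the same weight once per time step.

    Hypothetical helper: a scalar stand-in for backpropagation through time.
    """
    for _ in range(steps):
        error *= weight
    return error

steps = 100                                   # assumed sequence length
vanishing = backpropagated_error(0.9, steps)  # shrinks towards 0
exploding = backpropagated_error(1.1, steps)  # grows without bound
print(f"0.9^{steps} = {vanishing:.3e}")
print(f"1.1^{steps} = {exploding:.3e}")
```

After only 100 steps the first value is already smaller than one ten-thousandth of the original error, and the second is more than ten thousand times larger.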
LSTM makes several improvements to the ordinary RNN. An LSTM RNN has three additional controllers: the input, output, and forget controllers. On the left of the figure there is an extra main line, comparable to the main plot of a film, while the original RNN becomes the sub-plot; the three controllers sit on the sub-plot.
- Input controller (write gate): A gate placed at the input. It decides whether the current input is written into memory. It acts as a trainable parameter that controls whether the current point is remembered.
- Output controller (read gate): A gate placed at the output. It decides whether the current memory is read.
- Forget controller (forget gate): A gate placed at the processing stage. It decides whether the previous memory is forgotten.
LSTM works as follows. If the sub-plot is important for the final result, the input controller writes it into the main plot in proportion to its importance so that it can be analyzed later. If the sub-plot changes our earlier idea, the forget controller drops part of the main plot and replaces it with the new content in proportion. The update of the main plot therefore depends on the input and forget controllers. The final output is based on both the main plot and the sub-plot.
These three gates give us good control over the RNN. With these control mechanisms, LSTM is a good remedy for the RNN’s fading memory and brings better results.
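To make the three gates concrete, here is a minimal NumPy sketch of one step of a standard LSTM cell. It is not the article's TensorFlow implementation: the function name lstm_step, the packing of all four weight blocks into a single matrix W, and the sizes are assumptions for illustration. In the film analogy, c plays the role of the main plot and the candidate g the sub-plot.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One step of a standard LSTM cell (illustrative sketch).

    W has shape (4 * hidden, input_size + hidden); b has shape (4 * hidden,).
    """
    hidden = h_prev.shape[0]
    z = W @ np.concatenate([x_t, h_prev]) + b
    i = sigmoid(z[0 * hidden:1 * hidden])   # input (write) gate
    f = sigmoid(z[1 * hidden:2 * hidden])   # forget gate
    o = sigmoid(z[2 * hidden:3 * hidden])   # output (read) gate
    g = np.tanh(z[3 * hidden:4 * hidden])   # candidate content (sub-plot)
    c = f * c_prev + i * g                  # main plot: forget some old, write some new
    h = o * np.tanh(c)                      # output is read from the main plot
    return h, c

rng = np.random.default_rng(0)
input_size, hidden = 1, 10                  # assumed sizes
W = rng.normal(scale=0.1, size=(4 * hidden, input_size + hidden))
b = np.zeros(4 * hidden)
h, c = np.zeros(hidden), np.zeros(hidden)
for x in np.sin(np.linspace(0, 3, 5)):
    h, c = lstm_step(np.array([x]), h, c, W, b)
print(h.shape, c.shape)
```

Because the main-plot update is additive (f * c_prev + i * g) rather than a repeated multiplication by one weight, the error has a more direct path back through time.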
II Case description of LSTM RNN regression
Earlier articles covered classification with RNN and CNN. This article turns to a regression problem. In the LSTM RNN regression case, we want the blue dotted line to predict the red solid line. Since the sin curve is a periodic wave, the RNN uses one sequence to predict another sequence.
The basic structure of the code includes:
(1) The get_batch() function for generating data
(2) The main LSTM RNN class
(3) A three-layer neural network consisting of input_layer, cell, and output_layer, the same structure as the RNN used earlier for classification
(4) The error calculation function computer_cost
(5) The error, weight, and bias functions
(6) The main function that builds the LSTM RNN model
(7) TensorBoard to visualize the neural network model and Matplotlib to visualize the fitted curve
Finally, add BPTT and start our coding.
(1) Common RNN
Suppose we train on a sequence containing 1,000,000 data points. If the whole sequence is fed into the RNN at once, vanishing or exploding gradients can easily occur. The solution is truncated backpropagation through time (truncated BPTT): we cut the sequence into segments of num_steps for training.
Ordinary truncated backpropagation works like this: at the current time t, propagate the error back num_steps steps. As shown in the figure below, for a sequence of length 6 with 3 truncation steps, the initial state and the final state are passed through the RNN cell.
(2) TensorFlow version of BPTT
However, TensorFlow does not implement it this way. It divides the sequence of length 6 into two parts of length 3 each. The final state computed for the first part is used as the initial state of the next part. As shown in the figure below, each batch performs its own truncated backpropagation, and the final state is saved as the initial state of the next batch.
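The chunked scheme can be sketched without TensorFlow. In the example below, run_chunk is a hypothetical stand-in for one training step on a chunk (only the forward pass is shown, and its toy update rule is an assumption); the point is how the sequence of length 6 is split into two parts of length 3 and how the final state of one part becomes the initial state of the next.

```python
import numpy as np

def run_chunk(chunk, state):
    """Hypothetical stand-in for one training step on a chunk.

    Only the forward pass is shown; in real training, gradients
    would flow back no further than the start of this chunk.
    """
    for x in chunk:
        state = np.tanh(0.5 * state + x)    # toy state update (assumption)
    return state

sequence = np.arange(6, dtype=float) / 10   # a sequence of length 6
num_steps = 3                               # truncation length
chunks = sequence.reshape(-1, num_steps)    # two parts of length 3

state = 0.0                                 # initial state of the first part
for chunk in chunks:
    # The final state of this part is the initial state of the next part.
    state = run_chunk(chunk, state)
print(chunks.shape, state)
```

The forward computation is identical to processing the whole sequence in one pass; truncation only limits how far back the error is propagated.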
Reference: Deep learning (07) implementation of rnn-recurrent neural network-02-tensorflow
III Code implementation
The first step is to open Anaconda, select the “tensorflow” environment built earlier, and run Spyder.
The second step is to import the extension packages.
The third step is to write the get_batch() function, which generates the sin curve sequences.
The output at this point is shown in the figure below. Note that this is only the simulated target curve, not the result learned by our neural network.
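As a rough idea of what such a data generator can look like, here is a NumPy-only sketch. It is not the article's listing: taking batch_start as an argument, the value TIME_STEPS = 20, the 10*pi scaling, and the use of a cos curve as the target are all assumptions made for illustration. Only the batch size of 50 and the input and output size of 1 come from the article.

```python
import numpy as np

TIME_STEPS = 20   # assumed number of steps per sequence
BATCH_SIZE = 50   # batch size stated in the article

def get_batch(batch_start, time_steps=TIME_STEPS, batch_size=BATCH_SIZE):
    """Return (seq, res, xs): input curve, target curve, and x positions."""
    # Consecutive x positions laid out as (batch_size, time_steps).
    xs = np.arange(batch_start, batch_start + time_steps * batch_size)
    xs = xs.reshape(batch_size, time_steps) / (10 * np.pi)
    seq = np.sin(xs)                        # input curve
    res = np.cos(xs)                        # target curve (assumption)
    # Add a trailing axis so the shape is (batch, steps, input_size=1).
    return seq[:, :, np.newaxis], res[:, :, np.newaxis], xs

seq, res, xs = get_batch(batch_start=0)
print(seq.shape, res.shape, xs.shape)
```

Calling the function again with batch_start advanced by time_steps would produce the next window of the curve.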
The fourth step is to write the lstmrnn class, which defines the structure of our recurrent neural network, its initialization, and the variables it needs.
The parameters of the initialization function init() include:
• n_steps: the number of steps in a batch. There are three steps in total.
• input_size: the length of each input when batch data is passed in. In this example input_size and output_size are both 1. As shown in the figure below, suppose our batch covers one cycle (0-6) and each input is the x value of the line. input_size indicates how many values there are at each time point; there is only one point, so it is 1.
• output_size: the size of the output. The y value of the corresponding input line is output, so its size is 1.
• cell_size: the number of units in the RNN cell, which is 10.
• batch_size: the number of samples in each batch passed to the neural network at one time, which is set to 50.
The code for this part is as follows. Pay attention to the shapes of xs and ys. Because we want to visualize the RNN structure with TensorBoard, we call tf.name_scope() to set the namespace of each neural layer and variable. See article 5 for details.
The fifth step is to write three functions (the three-layer neural network), which form the core structure of the RNN.
These three functions are also added to the lstmrnn class. The core code and detailed comments are as follows:
Note that reshape() is called to change the shape. Why convert the three-dimensional variable to two dimensions? Because W * x + b can only be computed once the variable is two-dimensional.
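The shape handling can be illustrated with NumPy alone. The sketch below is not the article's layer code: the value n_steps = 20 and the constant weights are assumptions, while batch_size = 50, input_size = 1, and cell_size = 10 follow the parameters listed above.

```python
import numpy as np

batch_size, n_steps, input_size, cell_size = 50, 20, 1, 10  # n_steps is assumed

xs = np.zeros((batch_size, n_steps, input_size))   # 3D input batch
W = np.ones((input_size, cell_size))               # placeholder weights
b = np.full(cell_size, 0.1)                        # placeholder biases

# Flatten batch and time into one axis so a single matrix product applies.
x_2d = xs.reshape(-1, input_size)                  # (batch * steps, input_size)
y_2d = x_2d @ W + b                                # W * x + b on 2D data
# Restore the time axis so the result can be fed to the RNN cell.
y_3d = y_2d.reshape(-1, n_steps, cell_size)        # (batch, steps, cell_size)
print(x_2d.shape, y_2d.shape, y_3d.shape)
```

The same weights are applied to every time step of every sequence, which is why merging the batch and time axes is safe here.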
The sixth step is to define the calculation error function.
Note that we use a seq2seq function here. It computes the loss of every step in the whole batch. The losses of all steps are then summed into a total loss, which is divided by the batch size to obtain the total cost of the batch, a scalar.
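The reduction from per-step losses to a single scalar can be sketched as follows. This is a NumPy illustration under assumptions, not the seq2seq function itself: the helper name batch_cost is hypothetical, squared error is assumed as the per-step loss, and the shapes use an assumed 20 steps per sequence.

```python
import numpy as np

def batch_cost(pred, target, batch_size):
    """Sum the per-step losses of the whole batch, then divide by batch size.

    Hypothetical helper; squared error is assumed as the per-step loss.
    """
    step_losses = np.square(pred - target)   # one loss per step of every sequence
    return step_losses.sum() / batch_size    # scalar cost of the batch

pred = np.zeros((50, 20, 1))                 # assumed shape (batch, steps, output_size)
target = np.ones((50, 20, 1))
cost = batch_cost(pred, target, batch_size=50)
print(cost)
```

With these inputs every step has a squared error of 1, so the cost equals the number of steps per sequence.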
A later article will cover machine translation in detail and use the seq2seq model.
The seq2seq model is used when the length of the output is not known in advance. This typically happens in machine translation: when a Chinese sentence is translated into English, the English sentence may be shorter or longer than the Chinese one. As shown in the figure below, the Chinese input has length 4 and the English output has length 2.
In the network structure, a Chinese sequence is input and its English translation is output. Each predicted output is fed back as the next input: in the example above, the network first outputs “machine”, takes “machine” as the next input, and then outputs “learning”. In this way it can output a sequence of any length.
Machine translation, human-machine dialogue, chatbots, and similar applications are widely used today, and they all rely more or less on the seq2seq approach described here.
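The feed-the-output-back-in loop can be shown with a toy example. Everything here is hypothetical: a lookup table stands in for a trained decoder, and the "<start>" and "<end>" markers and the greedy_decode name are assumptions made for illustration.

```python
def greedy_decode(next_token, start="<start>", end="<end>", max_len=10):
    """Generate tokens one at a time, feeding each output back as the next input."""
    output, token = [], start
    for _ in range(max_len):
        token = next_token[token]        # a trained decoder would predict this
        if token == end:                 # the model decides when to stop
            break
        output.append(token)
    return output

# Toy stand-in for a trained decoder (assumption, not a real model).
toy_decoder = {"<start>": "machine", "machine": "learning", "learning": "<end>"}
print(greedy_decode(toy_decoder))
```

Because the loop stops only when the end marker is produced, the output length is not fixed in advance, which is the property that makes seq2seq suitable for translation.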
The seventh step is to define the msr_error() error calculation function together with the weight and bias functions.
At this point the whole class is defined.
The eighth step is to define the main function for training and prediction. Here we first try the TensorBoard visualization.
IV Complete code and visual display
The complete code of this stage is as follows. Let’s try to run the following code first:
A new “logs” folder containing an events file is created in the directory of the Python file, as shown in the following figure.
Next, try opening it. First open Anaconda Prompt and activate the tensorflow environment. Then go to the directory that contains the logs folder and run the command “tensorboard --logdir=logs”, as shown in the figure below. Note that you only need to point to the folder; TensorBoard finds the files in it automatically.
Then visit “http://localhost:6006/” and select “Graphs”. As shown in the figure below, our neural network appears.
The neural network structure is shown in the figure below. It includes the input layer, the LSTM layer, the output layer, cost (error calculation), train (training), and so on.
The detailed structure is shown in the figure below:
Usually we set the train part aside: select “train”, right-click, and choose “Remove from main graph”. The core structure is as follows: in_hidden is the first layer, which receives the input, followed by LSTM_cell, and finally the output layer out_hidden.
- in_hidden: Includes weights and biases, with the calculation Wx_plus_b. It also includes the reshape operations 2_2D and 2_3D.
- out_hidden: Includes weights, biases, the calculation Wx_plus_b, and the reshape operation 2_2D. Its output goes to cost.
- cost: Calculates the error.
- LSTM_cell, in the middle: Includes the RNN and the initial state initial_state, which is later updated and replaced.
Pay attention to versions. Readers may need to adapt the code to their own TensorFlow version. The author’s environment is Python 3.6, Anaconda3, Win10, and TensorFlow 1.15.0.
If you get the error “AttributeError: module 'tensorflow._api.v1.nn' has no attribute 'seq2seq'”, the cause is that the method call changed in a TensorFlow version upgrade. Solution:
If you get the error “TypeError: msr_error() got an unexpected keyword argument 'labels'”, the msr_error() function received an unexpected keyword argument 'labels'. The solution is to name the parameters labels and logits when defining msr_error().
If you get the error “ValueError: Variable in_hidden/weights already exists, disallowed. Did you mean to set reuse=True or reuse=tf.AUTO_REUSE in VarScope?”, restart the kernel and run the code again.
V Prediction and curve fitting
Finally, we write the RNN training, learning, and prediction code in the main function.
First, let’s test how the cost evolves during learning. The code is as follows. In the if branch, cell_init_state is the previously initialized state; afterwards the state is updated (model.cell_init_state: state). In effect, the final state of one batch becomes the initial state of the next batch, which matches the structure we defined.
The results are output every 20 steps. As shown below, the error drops from about 33 at the start to 0.335 at the end. The neural network is learning and the error keeps decreasing.
Next, add the dynamic fitting process of sin curve visualized by Matplotlib. The final complete code is as follows:
That brings this article to an end. It is long, but I hope it helps you. The LSTM RNN predicts one set of data from another. The prediction effect is shown in the figure below: the solid red line is the line to be predicted, and the dotted blue line is the line learned by the RNN. The two keep getting closer; the blue line learns the pattern of the red line and in the end basically fits it.
After this article, I will continue to share more in-depth TensorFlow articles, covering supervised learning, GAN, machine translation, text recognition, image recognition, speech recognition, and so on. If there is anything you would like to learn, you can also message me privately, and I will study it and apply it to your field.
Finally, I hope this basic article is helpful to you. If there are errors or shortcomings in it, please bear with me. As a newcomer to artificial intelligence, I hope to keep making progress, go deeper, and then apply this to image recognition, network security, adversarial examples, and other fields, and to guide you in writing simple academic papers. Come on!