In the previous article, we covered the basics of recurrent neural networks. Today we continue exploring recurrent neural networks and look at some more advanced techniques worth knowing.
Reducing overfitting
In earlier discussions we often ran into the problem of overfitting. We generally judge when training is done by watching the network's accuracy and loss, and what we care about is the model's behavior before overfitting sets in. One common way to reduce overfitting is dropout, which works by randomly zeroing out units during training; in recurrent neural networks, however, applying it is a little more complicated.
Extensive experiments have shown that applying ordinary dropout before a recurrent layer does little to reduce overfitting, and may even hurt the network's training. How to apply dropout inside a recurrent layer was worked out in a 2015 paper. The idea is: at every timestep, use the same dropout mask, and apply a mask that does not vary with the timestep to the layer's inner recurrent activations, so that the learning error can still propagate through time. If you use recurrent layers such as LSTM or GRU in Keras, you can set dropout (input-unit dropout) and recurrent_dropout (recurrent-unit dropout) to reduce overfitting. In general this does not bring a dramatic improvement, but training becomes more stable, so it is a useful way to regularize the network. The usage is:
model.add(layers.GRU(32, dropout=0.2, recurrent_dropout=0.2, input_shape=(None, float_data.shape[-1])))
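To make the "same dropout mask at every timestep" idea concrete, here is a minimal NumPy sketch (not Keras's actual implementation; the function name and shapes are illustrative). One mask over the units is sampled once and reused at every timestep, instead of being resampled per step:

```python
import numpy as np

rng = np.random.default_rng(0)

def time_constant_dropout(activations, rate=0.2):
    """Apply one dropout mask that is fixed across timesteps.

    activations: array of shape (timesteps, units).
    The same units are zeroed at every timestep, rather than
    resampling a new mask each step.
    """
    units = activations.shape[1]
    mask = (rng.random(units) >= rate).astype(activations.dtype)
    # Inverted dropout: scale kept units so the expected value is unchanged.
    return activations * mask / (1.0 - rate)

x = np.ones((5, 8))            # 5 timesteps, 8 units
y = time_constant_dropout(x)
# Every timestep shares the same mask: any dropped column is zero everywhere.
assert np.array_equal(y[0] == 0, y[-1] == 0)
```

Because the mask is constant in time, a unit that is dropped is dropped for the whole sequence, which is what lets the recurrent state carry a consistent error signal.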
Stacking recurrent layers
Our usual workflow when training a network is: build the network, then, before it overfits badly, increase its capacity as much as possible (give it more representational power), which helps the model capture the structure of the data better. For recurrent neural networks the analogous idea is to stack recurrent layers, which in general improves results (how much depends on the problem); it is one of the most common and effective tuning methods, and similar setups appear in quite a few Google products. The usage is:
model.add(layers.GRU(32, dropout=0.1, recurrent_dropout=0.5, return_sequences=True, input_shape=(None, float_data.shape[-1])))
model.add(layers.GRU(64, activation='relu', dropout=0.1, recurrent_dropout=0.5))
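Note the return_sequences=True on the first layer: a stacked recurrent layer needs the full sequence of outputs from the layer below it, not just the final state. The toy RNN below (an illustrative sketch, not how Keras implements GRU) shows the shape contract that makes stacking work:

```python
import numpy as np

def simple_rnn(x, units, return_sequences=False, seed=0):
    """Minimal tanh RNN forward pass; only the shapes matter here.

    x: (timesteps, features). Returns (timesteps, units) when
    return_sequences=True, otherwise only the last state (units,).
    """
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((x.shape[1], units)) * 0.1  # input weights
    U = rng.standard_normal((units, units)) * 0.1       # recurrent weights
    h = np.zeros(units)
    outputs = []
    for t in range(x.shape[0]):
        h = np.tanh(x[t] @ W + h @ U)
        outputs.append(h)
    return np.stack(outputs) if return_sequences else h

x = np.ones((10, 4))                           # 10 timesteps, 4 features
h1 = simple_rnn(x, 32, return_sequences=True)  # (10, 32): feeds the next layer
h2 = simple_rnn(h1, 64)                        # (64,): final state only
assert h1.shape == (10, 32) and h2.shape == (64,)
```

If the first layer returned only its last state, the second layer would have no sequence to iterate over, which is exactly the error Keras raises when return_sequences is forgotten.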
Using bidirectional RNNs
This one has always felt a bit like black magic to me. A bidirectional RNN, as the name implies, consists of two ordinary RNNs running in opposite directions, one processing the sequence forward and the other in reverse. We know an RNN is order-sensitive: each item is processed in the context of the items before it. Processing the data in the two directions therefore extracts different features from it; the reverse RNN can pick up patterns the forward RNN ignores and make up for its blind spots, which may improve the performance of the whole network. It is somewhat mysterious, but understandable, and in short it is another effective technique. In Keras it is done by wrapping a recurrent layer in Bidirectional. The usage is:
model.add(layers.Bidirectional(layers.GRU(32), input_shape=(None, float_data.shape[-1])))
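The mechanics can be sketched in a few lines of NumPy (a toy illustration of the idea behind Keras's layers.Bidirectional wrapper; the function names here are made up for the example). The same recurrent pass is run on the sequence and on its reverse, and the two final states are concatenated:

```python
import numpy as np

def last_state(x, units, seed=0):
    """Toy RNN that returns only its final hidden state (illustrative)."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((x.shape[1], units)) * 0.1
    U = rng.standard_normal((units, units)) * 0.1
    h = np.zeros(units)
    for t in range(x.shape[0]):
        h = np.tanh(x[t] @ W + h @ U)
    return h

def bidirectional(x, units):
    """Process the sequence in both directions and concatenate the results,
    the same idea as wrapping a recurrent layer in Bidirectional."""
    forward = last_state(x, units)
    backward = last_state(x[::-1], units)  # reversed timestep order
    return np.concatenate([forward, backward])

x = np.arange(12.0).reshape(4, 3)   # 4 timesteps, 3 features
out = bidirectional(x, 16)
assert out.shape == (32,)           # 16 forward units + 16 backward units
```

Note that the output is twice as wide as a single layer: the forward and backward halves see the same data in opposite orders, so they end up encoding different features of the sequence.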
So which optimization method is actually the most effective for a given neural network? It really depends on the actual problem and the actual data; the differences can be large, and there is no universal answer. Tuning is therefore work that takes experience and patience, and you may well find yourself asking: "Why on earth doesn't this work? And why did that work?"
Of course, there are other methods that may help you train a better network, such as changing the activation function or adjusting the optimizer's learning rate. With a little patience, you will be able to train a satisfactory network.
That's it for now. There are still many details, and some important topics we haven't covered in depth; we will have a chance to continue the discussion in the future.
Peace to the world!
- This article first appeared on the WeChat official account: RAIS