todo: Gradients are computed by repeated multiplication (the chain rule).
The gradient at a front (earlier) layer is the product of the gradient terms of all the back (later) layers; this repeated multiplication is the root cause of gradient instability. When the network has too many layers, the gradient becomes unstable. Gradient instability covers both gradient disappearance (vanishing gradients) and gradient explosion (exploding gradients). In practice, vanishing gradients occur more often than exploding gradients.

Process: At each layer, the weight W is multiplied by the derivative of the activation function, and these per-layer factors are multiplied together across layers to give the final result. If the magnitude of the per-layer factor stays above 1, the product grows with depth and the gradient may explode; if it stays below 1, the product shrinks with depth and the gradient may vanish. (To be confirmed: under what circumstances does each case appear most often?)

Possible causes: Too many hidden layers lead to gradient disappearance or gradient explosion.

Phenomenon: When the weights are updated, the weights of the front hidden layers are updated much more slowly than those of the back hidden layers (in the vanishing case).

Solutions:
Use an appropriate activation function: ReLU, Leaky ReLU, PReLU, RReLU, Maxout
Batch Normalization: todo
The structural design of LSTM can also alleviate gradient disappearance in RNNs: todo
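The process described above can be sketched numerically. This is a simplified illustration, not a real backpropagation pass: it assumes a scalar network in which every layer has the same weight `w` and the same pre-activation `z`, and it uses the sigmoid as the activation. The function name `layer_gradient_scales` is hypothetical.

```python
import numpy as np

def sigmoid_grad(z):
    """Derivative of the sigmoid function at z."""
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)

def layer_gradient_scales(w, z, n_layers):
    """Relative gradient magnitude at each layer, front layer first.

    Assumption: every layer contributes the same factor |w| * sigmoid'(z).
    The gradient at layer k picks up one factor for every layer behind it,
    so the front layer gets the most factors and the last layer gets none.
    """
    factor = abs(w) * sigmoid_grad(z)
    exponents = np.arange(n_layers - 1, -1, -1)
    return factor ** exponents

# Per-layer factor 1.0 * 0.25 = 0.25 < 1: the front layers get tiny gradients.
shrinking = layer_gradient_scales(w=1.0, z=0.0, n_layers=5)

# Per-layer factor 8.0 * 0.25 = 2.0 > 1: the front layers get huge gradients.
growing = layer_gradient_scales(w=8.0, z=0.0, n_layers=5)

print(shrinking)
print(growing)
```

With a factor below 1, the first entry (front layer) is the smallest, which matches the phenomenon that front hidden layers update more slowly than back hidden layers.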
Definition (gradient disappearance): In a neural network, the front hidden layers learn more slowly than the back hidden layers, because the gradient shrinks as it is propagated backward. As a result, adding more hidden layers can lower classification accuracy instead of raising it.
Causes: An inappropriate activation function is used.
Activation function angle: The derivative of the sigmoid function is at most 1/4, so the more layers there are, the smaller the product of these derivatives becomes.
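The 1/4 bound can be checked numerically. The sketch below evaluates the sigmoid derivative on a grid (the grid range and size are arbitrary choices made for illustration) and then computes the upper bound 0.25^n on a product of n sigmoid derivatives.

```python
import numpy as np

def sigmoid_grad(z):
    """Derivative of the sigmoid function: s(z) * (1 - s(z))."""
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)

# Grid chosen for illustration; its midpoint is z = 0, where the maximum lies.
z = np.linspace(-10.0, 10.0, 2001)
max_grad = sigmoid_grad(z).max()

# Upper bound on the product of n sigmoid derivatives (weights ignored).
bounds = {n: 0.25 ** n for n in (1, 5, 10, 20)}

print(f"max sigmoid derivative on the grid: {max_grad:.4f}")
for n, bound in bounds.items():
    print(f"{n:2d} layers: product of derivatives <= {bound:.3e}")
```

Even before the weights are taken into account, ten sigmoid layers already cap the product of derivatives below one millionth.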
Definition (gradient explosion): If very large values appear in the repeated multiplication, the final computed gradient is very large. On the loss surface this is like hitting a steep cliff, where the gradient is very large. If this gradient is used for the update, the step taken in this iteration is very large, and the parameters may jump out of the reasonable region in a single step.
Causes: The weights are initialized with values that are too large.
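The effect of the initialization scale can be illustrated with a small experiment. This is a sketch under simplifying assumptions: the layers are purely linear (activation derivatives are left out to isolate the effect of the weights), the weights are drawn from a zero-mean Gaussian, and the depth, width, and seed are arbitrary. The function name `backprop_norm` is hypothetical.

```python
import numpy as np

def backprop_norm(weight_std, depth=20, width=64, seed=0):
    """Norm of a gradient after it passes backward through `depth` linear layers.

    Assumption: each layer is y = W x with W ~ N(0, weight_std^2), so the
    backward pass multiplies the gradient by W.T at every layer.
    """
    rng = np.random.default_rng(seed)
    grad = np.ones(width) / np.sqrt(width)  # unit-norm gradient at the output
    for _ in range(depth):
        W = rng.normal(0.0, weight_std, size=(width, width))
        grad = W.T @ grad  # chain rule for a linear layer
    return np.linalg.norm(grad)

# Weights that are too large: each layer scales the gradient by roughly
# weight_std * sqrt(width) = 8, so the norm blows up with depth.
large_init = backprop_norm(weight_std=1.0)

# Weights scaled by 1 / sqrt(width): each layer scales the gradient by
# roughly 1, so the norm stays in a moderate range.
scaled_init = backprop_norm(weight_std=1.0 / np.sqrt(64))

print(f"large init:  {large_init:.3e}")
print(f"scaled init: {scaled_init:.3e}")
```

An update computed from the first gradient would move the front-layer weights by an enormous step, which is the "flying out of the reasonable region" behaviour described above.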
Definition (saturated function): When the independent variable approaches both positive and negative infinity, the derivative of the function approaches 0. If the derivative approaches 0 on only one side, the function is one-sided saturated (left-saturated or right-saturated). A function that does not satisfy the definition of a saturated function is called a non-saturated (unsaturated) function.
Disadvantages of saturated functions: The gradient disappears.
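The definition can be turned into a rough numerical check. The sketch below is an approximation, not a proof: it evaluates the derivative at a single large positive and a single large negative input (the values `far` and `tol` are arbitrary assumptions) and labels the function accordingly. The helper name `saturation` is hypothetical.

```python
import numpy as np

def sigmoid_grad(x):
    s = 1.0 / (1.0 + np.exp(-x))
    return s * (1.0 - s)

def tanh_grad(x):
    return 1.0 - np.tanh(x) ** 2

def relu_grad(x):
    return 1.0 if x > 0 else 0.0

def leaky_relu_grad(x, slope=0.01):
    return 1.0 if x > 0 else slope

def saturation(grad_fn, far=20.0, tol=1e-6):
    """Label a function by whether its derivative is near 0 far from the origin."""
    left = abs(grad_fn(-far)) < tol   # derivative near 0 toward -infinity
    right = abs(grad_fn(far)) < tol   # derivative near 0 toward +infinity
    if left and right:
        return "saturated"
    if left:
        return "left-saturated"
    if right:
        return "right-saturated"
    return "non-saturated"

for name, fn in [("sigmoid", sigmoid_grad), ("tanh", tanh_grad),
                 ("ReLU", relu_grad), ("Leaky ReLU", leaky_relu_grad)]:
    print(f"{name}: {saturation(fn)}")
```

Under this strict check ReLU comes out as left-saturated, because its derivative is 0 for negative inputs, even though it is commonly grouped with the non-saturated activations since its derivative stays at 1 for positive inputs.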
What is gradient explosion? How to solve it?
The causes of disappearing gradient problems
Causes and solutions of gradient disappearance and gradient explosion
Characteristics of saturated and unsaturated activation functions