Recommendation System — neural network

Time: 2020-06-28

Gradient

Backpropagation

Gradients are computed via the chain rule: the gradient with respect to an earlier layer's weights is the product of the local derivatives of all the layers behind it, propagated backward from the output layer toward the input.

Gradient instability

The gradient at an earlier layer is the product of the gradients (local derivatives) of the layers behind it.
(this is also the root cause of gradient instability)

When the network has too many layers, gradient instability appears.

Gradient instability includes vanishing gradients and exploding gradients.

In practice, vanishing gradients appear more often than exploding gradients.

Process:
    At each layer, the weight W is multiplied by the derivative of the activation function to give that layer's factor; backpropagation multiplies these factors across all layers to obtain the gradient.
    If the per-layer factors are greater than 1, the product grows with depth and the gradient may explode; if they are less than 1, the product shrinks with depth and the gradient may vanish (see the example below).
    (to confirm: under what circumstances does each case appear more often?)
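
Example (a minimal Python/NumPy sketch; the depth and the two weight magnitudes are illustrative assumptions, not values from these notes):
    import numpy as np

    def sigmoid_derivative(z):
        s = 1.0 / (1.0 + np.exp(-z))
        return s * (1.0 - s)

    layers = 50        # illustrative depth
    z = 0.0            # point where the sigmoid derivative is largest (0.25)

    for w in (0.5, 5.0):                     # small vs. large weight magnitude
        factor = w * sigmoid_derivative(z)   # per-layer factor in the chain-rule product
        grad = factor ** layers              # gradient magnitude after `layers` layers
        print(f"w={w}: per-layer factor={factor}, gradient after {layers} layers={grad:.3e}")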
    
Possible causes:
    Too many hidden layers can lead to vanishing or exploding gradients.

Phenomenon:
    During training, the weights of the earlier hidden layers update much more slowly than the weights of the later hidden layers.


Solutions:
    Use an appropriate activation function:
        ReLU, Leaky-ReLU, P-ReLU, R-ReLU, Maxout
        (over the positive range their derivative is 1, so the per-layer factors do not shrink the way sigmoid's do)

    Batch Normalization:
        Normalizes each layer's inputs per mini-batch, which keeps activations out of the saturated regions of the activation function and keeps the per-layer gradient factors in a stable range (see the sketch below).

    The structural design of LSTM can also mitigate vanishing gradients in RNNs:
        The additive cell-state update and the gating mechanism let gradients flow across many time steps without being repeatedly multiplied by small factors.
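
Example (a minimal Python/PyTorch sketch of the first two remedies; the layer sizes are illustrative assumptions, not a specific recommendation-model architecture): replace sigmoid with ReLU and insert Batch Normalization between the linear layers.
    import torch.nn as nn

    model = nn.Sequential(
        nn.Linear(128, 64),
        nn.BatchNorm1d(64),   # normalizes each mini-batch, keeping activations in a stable range
        nn.ReLU(),            # derivative is 1 for positive inputs, so this factor does not shrink
        nn.Linear(64, 32),
        nn.BatchNorm1d(32),
        nn.ReLU(),
        nn.Linear(32, 1),
    )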

Vanishing gradient

Definition:
    In a deep network, the earlier hidden layers learn much more slowly than the later hidden layers, because the gradients that reach them (and therefore their weight updates) are much smaller.

    As a result, increasing the number of hidden layers can actually decrease classification accuracy, since the early layers barely train.

Causes:
    An inappropriate (saturating) activation function, such as sigmoid, is used.

Activation function angle:
    The maximum derivative of the sigmoid function is 1/4 (since sigma'(x) = sigma(x)(1 - sigma(x)) peaks at x = 0), so the more layers there are, the smaller the chain-rule product becomes.
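
Example (a quick numerical check of the 1/4 bound; Python/NumPy sketch, the depth of 20 layers is an illustrative assumption):
    import numpy as np

    x = np.linspace(-10, 10, 10001)
    s = 1.0 / (1.0 + np.exp(-x))
    d = s * (1.0 - s)          # sigmoid derivative sigma(x) * (1 - sigma(x))
    print(d.max())             # ~0.25, reached at x = 0
    print(0.25 ** 20)          # best-case chain-rule product for 20 sigmoid layers (~9.1e-13)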

Exploding gradient

Definition:

    If very large factors appear in the chain-rule product, the computed gradient becomes very large.
    This is like standing at a steep cliff of the loss surface, where the local gradient is enormous.

    If this gradient is used for the update, the step taken in that iteration is huge, and the parameters may jump far outside a reasonable region at once (see the sketch below).
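
Example (a minimal sketch of such an overshoot in plain Python; the quadratic loss, learning rate, starting point, and exploded gradient value are all illustrative assumptions):
    # Loss L(w) = w**2 has its minimum at w = 0 and gradient 2 * w.
    lr = 0.1
    w = 3.0

    normal_grad = 2 * w               # well-behaved gradient
    print(w - lr * normal_grad)       # 2.4 -- a small step toward the minimum

    exploded_grad = 1e6               # gradient blown up by large chain-rule factors
    print(w - lr * exploded_grad)     # -99997.0 -- the update jumps far outside a reasonable region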

Causes:
    The weights are initialized with values that are too large, so the per-layer factors exceed 1.

Activation function

Saturation

Definition:
    A function is saturating if its derivative approaches 0 as the input approaches positive and negative infinity.

    If this happens on only one side (only as the input goes to +infinity, or only to -infinity), the function is one-sided saturating.

    A function that does not satisfy this definition is called a non-saturating function.

Disadvantage of saturating functions:
    The gradient vanishes: in the saturated region the derivative is close to 0, so the back-propagated gradient is multiplied by a near-zero factor at that layer.
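
Example (a small Python/NumPy sketch; the evaluation points are illustrative): deep in its saturated regions the sigmoid derivative is nearly 0, while ReLU keeps a derivative of 1 for positive inputs.
    import numpy as np

    def sigmoid_derivative(x):
        s = 1.0 / (1.0 + np.exp(-x))
        return s * (1.0 - s)

    for x in (-10.0, 0.0, 10.0):
        relu_derivative = 1.0 if x > 0 else 0.0
        print(f"x={x:+.0f}: sigmoid'={sigmoid_derivative(x):.2e}, ReLU'={relu_derivative}")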
