### Learning rate

As mentioned earlier, the gradient vector has both a direction and a magnitude. The gradient descent algorithm multiplies the gradient by a scalar known as the learning rate (sometimes also called the step size) to determine the next point. For example, if the gradient magnitude is 2.5 and the learning rate is 0.01, the gradient descent algorithm will pick a next point 0.025 away from the previous point.
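The single update step described above can be sketched as follows; the starting position here is a hypothetical value added for illustration, while the gradient and learning rate are the numbers from the text:

```python
# A single gradient descent step: the step size is the product of the
# gradient's magnitude and the learning rate (values from the text).
gradient = 2.5
learning_rate = 0.01

step_size = learning_rate * gradient  # 0.01 * 2.5 = 0.025

current_point = 1.0  # hypothetical starting position
next_point = current_point - step_size  # move against the gradient

print(round(step_size, 6))   # 0.025
print(round(next_point, 6))  # 0.975
```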

### Hyperparameters

Hyperparameters are the knobs that programmers tweak in machine learning algorithms. Most machine learning programmers spend a fair amount of time tuning the learning rate. If you pick a learning rate that is too small, learning will take too long:

#### Learning rate is too low

Conversely, if you specify a learning rate that is too large, the next point will perpetually bounce haphazardly across the bottom of the U-shaped curve, like a quantum mechanics experiment gone horribly wrong:

#### Learning rate is too high

### Goldilocks

There is a Goldilocks learning rate for every regression problem. The Goldilocks value is related to how flat the loss function is. If you know the gradient of the loss function is small, you can safely try a larger learning rate, which compensates for the small gradient and results in a larger step size:

#### The learning rate is just right
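The three regimes described above can be demonstrated on a toy problem. The quadratic loss and the specific rate values below are illustrative assumptions, not from the text:

```python
# Minimizing the toy loss L(w) = w**2, whose gradient is dL/dw = 2*w.
# The loss function and the three learning rates are illustrative choices.

def gradient_descent(lr, w=1.0, steps=20):
    """Run `steps` updates of w -= lr * gradient and return the final w."""
    for _ in range(steps):
        w -= lr * 2 * w  # gradient of w**2 at w is 2*w
    return w

too_low    = gradient_descent(lr=0.001)  # barely moves toward the minimum at 0
too_high   = gradient_descent(lr=1.1)    # overshoots; bounces ever farther out
just_right = gradient_descent(lr=0.3)    # converges quickly toward 0

print(f"too low: {too_low:.4f}, too high: {too_high:.2f}, "
      f"just right: {just_right:.8f}")
```

With a rate of 0.001 the point has hardly moved after 20 steps; with 1.1 each step multiplies the distance from the minimum by 1.2, so the iterates diverge; with 0.3 the distance shrinks by a factor of 0.4 per step and the point lands essentially at the minimum.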

This work is licensed under a CC agreement; any reprint must credit the author and link to this article.