Gradient descent method for machine learning

Time: 2021-10-25

Suppose we had the time and computing resources to calculate the loss for every possible value of W1. For the kind of regression problem we have been studying, the plot of loss against W1 is always convex. In other words, the plot is always bowl-shaped, as shown in the following figure:

The plot of loss against weight for a regression problem is convex

A convex problem has only one minimum; that is, there is only one place where the slope is exactly 0. That minimum is where the loss function converges.

Finding the convergence point by calculating the loss over the whole data set for every possible value of W1 is far too inefficient. Let’s look at a better mechanism, one that is very popular in machine learning, called gradient descent.
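To make the cost of the brute-force approach concrete, here is a minimal sketch. The toy data set, the mean squared error loss, the one-weight model y' = W1 * x, and the candidate grid are all assumptions chosen for illustration; none of them come from the article.

```python
# Hypothetical toy data for illustration: y is exactly 3 * x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]


def mse_loss(w1):
    """Mean squared error of the model y' = w1 * x over the whole data set."""
    return sum((w1 * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Brute force: evaluate the loss at every candidate W1 from -5.00 to 10.00.
candidates = [i / 100 for i in range(-500, 1001)]
losses = [mse_loss(w1) for w1 in candidates]
best_w1 = candidates[losses.index(min(losses))]

print(f"evaluated {len(candidates)} candidates, best W1 = {best_w1}")
```

Even on this tiny example the search needs 1501 full passes over the data, and it can only find a minimum that happens to lie on the grid.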

The first stage of gradient descent is to pick a starting value (a starting point) for W1. The starting point doesn't matter much, so many algorithms simply set W1 to 0 or pick a random value. The following figure shows a starting point slightly greater than 0:

Starting point of gradient descent method

Next, the gradient descent algorithm calculates the gradient of the loss curve at the starting point. In short, a gradient is a vector of partial derivatives; it tells you which direction is “closer” to or “farther” from the target. Note that the gradient of the loss with respect to a single weight is simply the derivative.
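For a single weight, the statement that the gradient equals the derivative can be checked numerically. The sketch below assumes a mean squared error loss for the one-weight model y' = W1 * x on a hypothetical toy data set; the function names are illustrative and not taken from the article.

```python
# Hypothetical toy data for illustration: y is exactly 3 * x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]


def mse_loss(w1):
    """Mean squared error of the model y' = w1 * x."""
    return sum((w1 * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def mse_gradient(w1):
    """Analytical derivative of the loss with respect to w1."""
    return 2 * sum((w1 * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def numerical_gradient(w1, eps=1e-6):
    """Central-difference estimate of the slope of the loss curve at w1."""
    return (mse_loss(w1 + eps) - mse_loss(w1 - eps)) / (2 * eps)


start_w1 = 0.5  # a starting point slightly greater than 0
print(mse_gradient(start_w1), numerical_gradient(start_w1))
```

At this starting point the derivative is negative, which means the loss falls as W1 increases, so the minimum lies to the right.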

Partial derivatives and gradients

Note that a gradient is a vector, so it has both of the following characteristics:

a direction

a magnitude

The gradient always points in the direction of the steepest increase in the loss function. The gradient descent algorithm therefore takes a step in the direction of the negative gradient in order to reduce the loss as quickly as possible.

Gradient descent relies on negative gradients

To determine the next point on the loss curve, the gradient descent algorithm moves the starting point by some fraction of the gradient's magnitude, in the direction of the negative gradient, as shown in the following figure:

A gradient step moves us to the next point on the loss curve
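A single gradient step can be sketched as follows. The toy data, the mean squared error loss, and the step fraction of 0.01 (often called the learning rate) are assumptions made for this example; the article does not specify any of them.

```python
# Hypothetical toy data for illustration: y is exactly 3 * x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]


def mse_loss(w1):
    """Mean squared error of the model y' = w1 * x."""
    return sum((w1 * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def mse_gradient(w1):
    """Derivative of the loss with respect to w1."""
    return 2 * sum((w1 * x - y) * x for x, y in zip(xs, ys)) / len(xs)


step_fraction = 0.01  # assumed value: the fraction of the gradient to move by
start_w1 = 0.5        # a starting point slightly greater than 0

# Step against the gradient: subtracting a negative gradient increases w1.
next_w1 = start_w1 - step_fraction * mse_gradient(start_w1)

print(f"W1: {start_w1} -> {next_w1}")
print(f"loss: {mse_loss(start_w1)} -> {mse_loss(next_w1)}")
```

With these assumed values the step moves W1 toward the minimum and the loss at the new point is lower than at the starting point.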

Gradient descent then repeats this process, edging ever closer to the minimum.
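Repeating the step gives the full procedure. This is a minimal sketch under stated assumptions, not a definitive implementation: the toy data, the mean squared error loss, the step fraction of 0.01, and the fixed count of 200 steps are all hypothetical choices.

```python
# Hypothetical toy data for illustration: y is exactly 3 * x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]


def mse_gradient(w1):
    """Derivative of the mean squared error of y' = w1 * x with respect to w1."""
    return 2 * sum((w1 * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def gradient_descent(start_w1, step_fraction=0.01, steps=200):
    """Repeatedly step against the gradient, starting from start_w1."""
    w1 = start_w1
    for _ in range(steps):
        w1 -= step_fraction * mse_gradient(w1)
    return w1


final_w1 = gradient_descent(start_w1=0.5)
print(f"W1 after 200 steps: {final_w1}")
```

On this data the loop settles very close to W1 = 3 using 200 gradient evaluations. A step fraction that is too large could overshoot the minimum, so the value used here should be treated as specific to this example.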

This work is licensed under a CC agreement; reprints must credit the author and link to this article.
