Working principle of gradient descent algorithm in machine learning



By Nikil_ Reddy
Translated by VK
Source: Analytics Vidhya


Gradient descent is one of the most commonly used algorithms in machine learning, but it confuses many newcomers.

If you’re new to machine learning, the math behind gradient descent isn’t easy. In this article, my goal is to help you understand the intuition behind gradient descent.

We will quickly cover the role of the cost function, the intuition behind gradient descent, and how to choose the learning rate.

What is the cost function

A cost function measures the performance of a model on given data. It quantifies the error between the predicted values and the expected values and expresses that error as a single real number.

After assuming initial parameters, we compute the cost function. The gradient descent algorithm is then used to update the parameters so that the cost decreases. For a model with hypothesis h_θ(x) over m training examples, a common choice is the mean squared error:

J(θ) = (1/2m) · Σᵢ₌₁..ₘ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾)²
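This cost can be computed directly; here is a minimal sketch in Python for a linear hypothesis (the function and variable names are illustrative, not from the original article):

```python
# Mean squared error cost for a linear model h(x) = theta0 + theta1 * x.
# Names (cost, theta0, theta1) are illustrative.

def cost(theta0, theta1, xs, ys):
    """J(theta) = 1/(2m) * sum((h(x_i) - y_i)^2)."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]          # generated by y = 2x
print(cost(0.0, 2.0, xs, ys))  # perfect fit, cost is 0.0
print(cost(0.0, 1.0, xs, ys))  # imperfect fit, positive cost
```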

What is gradient descent

Suppose you are playing a game: the players stand at the top of a mountain, blindfolded, and are asked to reach a lake at the mountain's lowest point. So, how do you think you could get to the lake?

Take a moment to think about it before you read on.

The best way is to feel the ground around you and find where it slopes downward. From that position, take a step in the descending direction and repeat the process until you reach the lowest point.

The gradient descent method is an iterative optimization algorithm for finding a local minimum of a function.

To find a local minimum of a function with gradient descent, we must take steps proportional to the negative of the function's gradient at the current point. If we instead step in the direction of the positive gradient, we approach a local maximum of the function; that process is called gradient ascent.

Gradient descent was first proposed by Cauchy in 1847. It is also called the method of steepest descent.

The goal of gradient descent algorithm is to minimize a given function (such as cost function). To achieve this goal, it iteratively performs two steps:

  1. Calculate the gradient (slope), the first derivative of the function at this point

  2. Take a step (move) in the opposite direction of the gradient

Combining the two steps gives the update rule:

θ := θ − α · ∇J(θ)

Here α (alpha) is the learning rate, a tuning parameter of the optimization process: it determines the size of each step.
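The two steps can be sketched for a one-dimensional function such as f(x) = x², whose gradient is 2x (the names below are illustrative):

```python
# One gradient descent step on f(x) = x^2, whose derivative is 2x.
# The step moves opposite the gradient, scaled by the learning rate alpha.

def grad_step(x, alpha):
    grad = 2 * x             # step 1: compute the gradient at the current point
    return x - alpha * grad  # step 2: move in the opposite direction

x = 4.0
x = grad_step(x, alpha=0.1)
print(x)  # 4.0 - 0.1 * 8.0 = 3.2
```

Repeating this step drives x toward 0, the minimum of x².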

Visualizing the gradient descent algorithm

When we have a single parameter θ, we can plot the cost on the y-axis and θ on the x-axis. If there are two parameters, we can make a 3D plot, with the cost on one axis and the two parameters on the other two.

It can also be visualized with contour plots, which show the three-dimensional surface in two dimensions: the parameters along both axes and the cost as contour rings. The cost increases as the rings move away from the center.
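The surface those plots describe is simply the cost evaluated over a grid of parameter values; a small sketch, assuming a tiny two-point dataset purely for illustration:

```python
# Cost values over a grid of (theta0, theta1) pairs -- the surface that
# the 3D and contour plots describe. Dataset and names are illustrative.

def cost(t0, t1):
    xs, ys = [1.0, 2.0], [2.0, 4.0]   # generated by y = 2x
    return sum((t0 + t1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * len(xs))

# Evaluate the cost at half-unit steps in [-2, 2] x [-2, 2].
grid = [(t0 / 2, t1 / 2, cost(t0 / 2, t1 / 2))
        for t0 in range(-4, 5) for t1 in range(-4, 5)]
best = min(grid, key=lambda p: p[2])
print(best)  # the grid minimum lies at theta0=0, theta1=2, cost 0
```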

α, the learning rate

We have a direction to move in; now we must decide the size of the step to take.

The learning rate must be chosen carefully to reach the local minimum.

  • If the learning rate is too high, we may overshoot the minimum and never reach it

  • If the learning rate is too low, training may take far too long to converge

a) The learning rate is optimal: the model converges to the minimum

b) The learning rate is too small: it takes more time, but it still converges to the minimum

c) The learning rate is higher than the optimal value: it overshoots, so convergence is slower

d) The learning rate is very large: it overshoots wildly, diverges from the minimum, and learning performance degrades

Note: as the optimization approaches a local minimum, the gradient shrinks, so the step (α times the gradient) shrinks with it. The learning rate α can therefore be kept constant during the optimization; it does not have to be changed at every iteration.
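The learning-rate regimes above can be reproduced on f(x) = x², whose gradient is 2x; the rate values below are illustrative:

```python
# Minimizing f(x) = x^2 (minimum at x = 0) starting from x = 1.0,
# with learning rates in the regimes described above.

def descend(alpha, steps=50, x=1.0):
    for _ in range(steps):
        x -= alpha * 2 * x   # gradient of x^2 is 2x
    return x

print(descend(0.01))   # too small: still far from 0 after 50 steps
print(descend(0.4))    # near-optimal: converges rapidly toward 0
print(descend(1.1))    # too large: each step overshoots, |x| grows
```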

Local minimum

The cost function may have many local minima. Which minimum gradient descent reaches depends on the starting point (i.e., the initial parameters θ) and the learning rate. Therefore, with different starting points and learning rates, the optimization can converge to different points.
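A quick sketch on a non-convex function, f(x) = x⁴ − 3x² + x, which has two local minima (the function, step count, and learning rate are all illustrative):

```python
# f(x) = x^4 - 3x^2 + x has a local minimum near x = -1.30 and another
# near x = 1.13. Different starting points reach different minima.

def f_prime(x):
    return 4 * x**3 - 6 * x + 1   # derivative of x^4 - 3x^2 + x

def minimize(x, alpha=0.01, steps=2000):
    for _ in range(steps):
        x -= alpha * f_prime(x)
    return x

print(minimize(-2.0))  # converges to the left minimum (x ≈ -1.30)
print(minimize(2.0))   # converges to the right minimum (x ≈ 1.13)
```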

Python code implementation of gradient descent


Once we have tuned the learning rate α and obtained a good value, we iterate until we converge to the local minimum.
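A minimal sketch of batch gradient descent for simple linear regression with the mean squared error cost (variable names, α, and the iteration count are illustrative choices):

```python
# Batch gradient descent for simple linear regression y = theta0 + theta1 * x,
# minimizing J(theta) = 1/(2m) * sum((h(x_i) - y_i)^2).

def gradient_descent(xs, ys, alpha=0.05, iters=1000):
    theta0, theta1 = 0.0, 0.0
    m = len(xs)
    for _ in range(iters):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m                          # dJ/dtheta0
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m  # dJ/dtheta1
        theta0 -= alpha * grad0   # move opposite the gradient
        theta1 -= alpha * grad1
    return theta0, theta1

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]     # generated by y = 1 + 2x
theta0, theta1 = gradient_descent(xs, ys)
print(theta0, theta1)  # approaches (1.0, 2.0)
```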
