Preface:
At the request of the publisher, we plan to produce an accessible series of articles on machine learning and deep learning. Suggestions for improvement are welcome.
4.1 Introduction to Machine Learning
Machine learning may sound like a profound term, but in fact it is present in everyday life. An old saying goes, "one falling leaf heralds autumn," meaning that the withering of a single leaf tells you autumn is coming. This embodies the simple idea behind machine learning: the arrival of autumn can be predicted by learning from experience with the "fallen leaf" feature.
As a core component of artificial intelligence, machine learning is the process by which a computer program, without being explicitly programmed, learns from data to optimize its own algorithm and handle a task. A classic definition of machine learning is: a computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
As shown in Figure 4.1, the task T is the way the machine learning system correctly processes data samples.
The performance measure P quantifies how well the task is handled.
The experience E is reflected in the model's parameter values after learning; the model parameters determine how each feature is effectively represented in order to handle the task.
More concretely, the process of machine learning can generally be summarized as follows: given limited training data (usually under the assumption that the samples are independent and identically distributed), the computer program selects a model method (that is, it assumes the model to be learned belongs to a set of functions, known as the hypothesis space) and updates the model's parameter values (the experience) via an algorithm to optimize its performance on the task. The result is a well-performing model, which is then used to analyze and predict new data. Thus, a machine learning method has four elements:
● Data
● Model
● Learning objectives
● Optimization algorithm
Summarizing machine learning methods in terms of these four elements, and introducing them accordingly, lets us see what the various algorithms have in common, rather than treating each machine learning method in isolation.
4.1.1 Data
Data is the basic raw material of machine learning. A dataset typically consists of samples (one per row); each sample consists of features describing each dimension of information and, optionally, a target value (label).
Figure 4.2 shows the dataset for a cancer-cell classification task.
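A minimal sketch of what such a dataset looks like in code (the feature names and values here are illustrative, not taken from the actual cancer-cell dataset):

```python
import numpy as np

# A tiny hypothetical dataset in the spirit of the cancer-cell
# classification task: each row is a sample, each column a feature,
# and y holds the class labels (0 = benign, 1 = malignant).
X = np.array([
    [14.2, 20.1, 92.0],   # e.g. radius, texture, perimeter (illustrative)
    [11.8, 17.3, 75.4],
    [19.6, 24.5, 130.2],
])
y = np.array([1, 0, 1])   # target value labels

print(X.shape)  # (3, 3): 3 samples, 3 features
print(y.shape)  # (3,): one label per sample
```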
4.1.2 Model
Learning a "good" model is the direct purpose of machine learning. Simply put, a machine learning model is a function that captures the relationship between data features and labels, or the internal structure of the features themselves.
As shown in Figure 4.3, building a machine learning model can be viewed as follows: first select a model method, then learn from the data samples (x, y), optimizing the model parameters w to adjust the effective representation of each feature, and finally obtain the corresponding decision function f(x; w). Under the action of the parameters w, this function maps the input variable x to the output prediction y, that is, y = f(x; w).
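As a concrete illustration of a decision function y = f(x; w), here is a minimal sketch using a linear model, where the parameters weight each feature (the parameter values are made up for illustration):

```python
import numpy as np

# A minimal decision function y = f(x; w): a linear model in which
# the parameters w weight each feature, plus a bias term b.
def f(x, w, b):
    return np.dot(x, w) + b

x = np.array([1.0, 2.0])    # one sample with two features
w = np.array([0.5, -0.25])  # parameter values (illustrative)
b = 0.125

print(f(x, w, b))  # 0.5*1.0 + (-0.25)*2.0 + 0.125 = 0.125
```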
4.1.3 Learning Objectives
We want to learn a "good" model, and "good" is the model's learning objective. For a model, "good" means that the error between the predicted value and the actual value is as low as possible. The function that measures this error is called the loss function (or cost function), and we learn the model by minimizing the loss function.
Different task objectives require different loss functions. Classic examples are the mean squared error loss for regression tasks and the cross-entropy loss for classification tasks.
● Mean squared error loss function
To measure the error of a regression model's predictions, we can take the difference between the predicted and actual value for each sample, square it, and average over all samples; this is the mean squared error (MSE) loss function.
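The description above translates directly into code; a minimal sketch with made-up sample values:

```python
import numpy as np

# Mean squared error: the average of the squared differences
# between the predicted values and the actual values.
def mse_loss(y_pred, y_true):
    return np.mean((y_pred - y_true) ** 2)

y_true = np.array([3.0, 5.0, 2.0])  # actual values (illustrative)
y_pred = np.array([2.5, 5.0, 3.0])  # model predictions (illustrative)

print(mse_loss(y_pred, y_true))  # ((-0.5)**2 + 0**2 + 1**2) / 3 ≈ 0.4167
```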

● Cross-entropy loss function
To measure the error of a classification model, the cross-entropy loss, which can be derived from maximum likelihood estimation, is often used. Minimizing the cross-entropy loss brings the model's predicted distribution as close as possible to the empirical distribution of the actual data.
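For the binary-classification case, a minimal sketch of the cross-entropy loss (the labels and predicted probabilities are made up for illustration):

```python
import numpy as np

# Binary cross-entropy: the average negative log-likelihood of the
# true labels under the model's predicted probabilities.
def cross_entropy_loss(y_prob, y_true, eps=1e-12):
    y_prob = np.clip(y_prob, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_prob)
                    + (1 - y_true) * np.log(1 - y_prob))

y_true = np.array([1, 0, 1])        # actual class labels (illustrative)
y_prob = np.array([0.9, 0.2, 0.7])  # predicted P(class = 1) (illustrative)

print(cross_entropy_loss(y_prob, y_true))  # ≈ 0.2284
```

The loss shrinks as the predicted probabilities approach the true labels, which is exactly the "predicted distribution matches the empirical distribution" goal described above.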
4.1.4 Optimization Algorithm
Having set the goal of learning a "good" model by minimizing the loss function, how do we achieve it? Our first instinct may be to solve directly for the analytical minimum of the loss function to obtain the optimal model parameters. Unfortunately, the loss functions of machine learning models are usually complex, and a closed-form optimal solution is hard to find. Fortunately, we can instead optimize the model parameters through a finite number of iterations of an optimization algorithm (such as gradient descent or Newton's method), reducing the value of the loss function as much as possible and obtaining good parameter values (a numerical solution).
As shown in Figure 4.4, the gradient descent algorithm can be intuitively understood as walking down a mountain. The loss function J(w) is the mountain, and our goal is to reach its foot (that is, to find the model parameters w that minimize the loss function).
All we have to do is walk downhill one step at a time, where the downhill direction is the direction of the negative gradient of J(w). At each position, we compute the gradient at the current point, then take a step along the steepest descent direction to the next position. We continue this way, step by step, until we feel we have reached the foot of the mountain.
Of course, walking this way we may not reach the true foot of the mountain (the global optimum) but only a small valley (a local optimum); this is one of the respects in which the gradient descent algorithm can be improved.
The corresponding algorithm steps are:
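The downhill procedure described above can be sketched as a short runnable example; here we assume, purely for illustration, a one-parameter linear model y = w·x fitted by minimizing the MSE loss J(w):

```python
import numpy as np

# Gradient descent sketch: fit y = w*x by minimizing the MSE loss J(w).
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])  # data generated with true w = 2 (illustrative)

w = 0.0    # initial parameter value (the starting point on the mountain)
lr = 0.05  # learning rate: the size of each downhill step

for _ in range(200):
    grad = np.mean(2 * (w * x - y) * x)  # gradient dJ/dw of the MSE loss
    w -= lr * grad                       # step along the negative gradient

print(round(w, 4))  # converges close to the optimum w = 2.0
```

Because this J(w) is convex in the single parameter w, the iterations reach the global minimum; for the complex, non-convex losses mentioned above, the same procedure may instead stop in a local optimum.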
Summary
In this section, we first introduced the basic concept of machine learning and summarized its general process: starting from the data, we set the learning objective of the task and use an optimization algorithm to update the model parameters toward that objective. From this, we focused on the four elements of machine learning: data, model, learning objectives, and optimization algorithm. Next, we will look further at the categories of machine learning algorithms.
The article was first published on the "Advanced Algorithm" official account, through which the original text and the GitHub project source code can be accessed.