The answer is: machine learning ≠ algorithm.
Machine learning ≠ algorithm
When we open a textbook or a university syllabus, we usually see a list of algorithms.
This leads to the misunderstanding that machine learning means mastering a series of algorithms. In fact, machine learning does not stop at algorithms; it is better seen as a comprehensive approach to solving problems. The individual algorithms are just pieces of the puzzle. The rest of the puzzle is how to use those algorithms correctly.
Why is machine learning so magical?
Machine learning is the practice of teaching computers to analyze data and discover patterns so that we can make predictions or decisions.
For real machine learning, the computer must be able to find patterns in the data that could not be obtained by explicit programming.
Suppose a child is playing at home and suddenly sees a candle. He walks slowly toward it.
Out of curiosity, he reaches a finger toward the candle;
“Ow!” he shouts, pulling his hand back;
“That glowing red thing is hot!”
Two days later, he comes into the kitchen and sees the stove. Again, he is very curious.
He is so curious that he wants to touch it with his hands;
All of a sudden, he notices that this thing also glows, and it is red!
“Ah…” he says to himself, “I won’t get hurt again!”
Remembering that red, glowing things hurt, he leaves the stove and goes elsewhere.
To put it precisely: the child infers a conclusion from the candle on his own, and that inference is analogous to what we call “machine learning”.
The conclusion is that “red and glowing” means “pain”;
If the child had left the stove because his parents told him to, that would be an explicit program instruction, not machine learning.
Model – a set of patterns derived from data;
Algorithm – a specific ML process used to train a model;
Training data – the data set used to train the model;
Test data – a new data set used to objectively evaluate the model’s performance;
Features – the variables in the data set used to train the model;
Target variable – the specific variable the model tries to predict;
Suppose we have a data set of 150 primary-school students, and we want to predict each student’s height from their age, gender, and weight.
We now have 150 data points, 1 target variable (height), and 3 features (age, gender, weight). Next, we divide the data into two subsets:
120 rows will be used to train different models (the training set), and the remaining 30 will be used to select the best model (the test set).
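The split above can be sketched in a few lines of Python. The student records here are hypothetical toy values, generated only to illustrate holding out 30 of 150 rows:

```python
import random

# Hypothetical data: 150 students, each a (age, gender, weight, height) tuple.
# The height (cm) is the target variable; the toy values are for illustration only.
random.seed(0)
students = [
    (random.randint(6, 12),        # age (years)
     random.choice(["M", "F"]),    # gender
     random.uniform(20, 50),       # weight (kg)
     random.uniform(110, 160))     # height (cm) -- target variable
    for _ in range(150)
]

def train_test_split(data, test_size=30):
    """Shuffle the data, then hold out `test_size` rows as the test set."""
    shuffled = data[:]
    random.shuffle(shuffled)
    return shuffled[test_size:], shuffled[:test_size]

train_set, test_set = train_test_split(students, test_size=30)
print(len(train_set), len(test_set))  # 120 30
```

Shuffling before splitting matters: if the rows were sorted (say, by age), a naive head/tail split would give the model a biased view of the data.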
Machine learning tasks
In academia, machine learning has always centered on individual algorithms. In industry, however, we must first choose the right machine learning task for the job.
·A task is the specific objective the algorithm serves.
·Once the right task is chosen, algorithms can be swapped in and out to complete it.
·In practice, we try several different algorithms, because we rarely know in advance which one will work best on the dataset.
The two most common task categories of machine learning are supervised learning and unsupervised learning.
Supervised learning covers tasks on “labeled” data (in other words, data with a target variable).
·In practice, it is most often used as an advanced form of predictive modeling.
·Each data point must be labeled correctly.
·Only then can we build a predictive model, because during training the labels tell the algorithm what is “right” (hence “supervised”).
·Regression is the task of modeling continuous target variables.
·Classification is the task of modeling categorical target variables.
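A minimal sketch of supervised classification, using hypothetical labeled data: each training point carries a known label (the “right answer”), and a simple 1-nearest-neighbor rule predicts the label of the closest training point.

```python
# Labeled training data: (feature, label) pairs. The labels supply the
# "supervision" -- the known right answers the model learns from.
labeled = [(1.0, "small"), (1.2, "small"), (4.8, "large"), (5.1, "large")]

def classify(x):
    """1-nearest-neighbor: return the label of the closest training point."""
    nearest_feature, nearest_label = min(labeled, key=lambda p: abs(p[0] - x))
    return nearest_label

print(classify(1.1))  # small
print(classify(5.0))  # large
```

If the labels were continuous numbers instead of categories, the same idea (predict from the nearest point) would be a regression rather than a classification.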
Unsupervised learning covers tasks on “unlabeled” data (in other words, data with no target variable).
·In practice, it is usually used for automated data analysis or automated signal extraction.
·Unlabeled data has no predetermined “right answer”.
·The algorithm learns patterns directly from the data (i.e., without “supervision”).
·Clustering is the most common unsupervised learning task; it is used to find groups within the data.
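Clustering can be illustrated with a bare-bones k-means loop. The points below are hypothetical and unlabeled; the algorithm has to discover the two groups on its own:

```python
import random

# Toy unlabeled data: two loose groups of 2-D points, no labels anywhere.
random.seed(1)
points = ([(random.gauss(0, 0.5), random.gauss(0, 0.5)) for _ in range(20)] +
          [(random.gauss(5, 0.5), random.gauss(5, 0.5)) for _ in range(20)])

def kmeans(data, k, iters=10):
    """Plain k-means: assign each point to its nearest centroid, then
    recompute each centroid as the mean of its assigned points."""
    centroids = random.sample(data, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x, y in data:
            j = min(range(k), key=lambda i: (x - centroids[i][0]) ** 2 +
                                            (y - centroids[i][1]) ** 2)
            clusters[j].append((x, y))
        centroids = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
            if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

centroids, clusters = kmeans(points, k=2)
print([len(c) for c in clusters])
```

Note that no “right answer” ever appears in the code: the groups emerge purely from distances between the points, which is exactly what makes this unsupervised.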
Three elements of machine learning
How can we consistently build effective models that achieve the best results? Three elements are key.
1: A skilled chef (human guidance)
First, even though we are “teaching computers to learn on their own”, human guidance plays a major role throughout the process.
As we will see, countless decisions have to be made along the way.
In fact, the first major decision is how to plan the project to set it up for success.
2: Fresh ingredients (clean, relevant data)
The second essential element is data quality.
Whatever algorithm we use, garbage in = garbage out.
Professional data scientists spend most of their time understanding the data, cleaning it, and engineering new features.
3: Don’t overcook (avoid overfitting)
One of the most dangerous traps in machine learning is overfitting. An overfit model “memorizes” the noise in the training set rather than learning the true underlying pattern.
·Overfitting at a hedge fund can cost millions of dollars.
·Overfitting in a hospital can cost thousands of lives.
For most applications, overfitting is the one mistake to avoid.
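Overfitting can be demonstrated with a toy experiment. Everything below (the data and both “models”) is a hypothetical sketch: one model learns the simple underlying rule, while the other memorizes every training point, scoring perfectly on the training set but worse on new data.

```python
import random

random.seed(2)

# Toy problem: the true pattern is y = 2*x, plus random noise.
def make_data(n):
    return [(x, 2 * x + random.gauss(0, 1.0))
            for x in [random.uniform(0, 10) for _ in range(n)]]

train = make_data(50)
test = make_data(20)

def mse(model, data):
    """Mean squared error of a model's predictions on a data set."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

# Simple model: a single slope estimated from the training data
# (least squares through the origin), learning the underlying pattern.
slope = sum(x * y for x, y in train) / sum(x * x for x, _ in train)
def simple(x):
    return slope * x

# Overfit model: memorize every training point (noise included); for
# unseen inputs, fall back to the nearest memorized x.
table = dict(train)
def overfit(x):
    if x in table:
        return table[x]
    nearest = min(table, key=lambda t: abs(t - x))
    return table[nearest]

print(f"simple  train MSE={mse(simple, train):.2f}  test MSE={mse(simple, test):.2f}")
print(f"overfit train MSE={mse(overfit, train):.2f}  test MSE={mse(overfit, test):.2f}")
```

The memorizing model has a training error of exactly zero, yet its error on the test set is strictly worse; the gap between the two numbers is the signature of overfitting, and it is why the held-out test set from earlier is indispensable.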