Machine learning notes


machine learning


  • Machine learning is a branch of artificial intelligence. Its main research object is artificial intelligence, in particular how to improve the performance of specific algorithms by learning from experience.

  • Machine learning is the study of computer algorithms that can be improved automatically by experience.

  • Machine learning is the use of data or past experience to optimize the performance criteria of computer programs.

(Formal definitions, from Wikipedia)

Put simply, machine learning lets a machine learn an algorithm from data, gain the ability to predict, and then act on those predictions. The essence of machine learning is to give the machine data and let it find the correlations in that data.

Put simply, AI includes machine learning algorithms, search algorithms, and so on, and deep learning is a subfield of machine learning.


Data set: a collection of data. A data set generally contains features and labels. Each row is one sample, each column except the last is a feature, and the last column is the label. For a specific algorithm, the data set is divided into a training set and a test set. Plotting the samples along their feature axes gives the feature space; the more features there are, the higher the dimension of that space.
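As a rough illustration of the layout described above, the sketch below stores a tiny data set as rows, separates the feature columns from the label column, and cuts the rows into a training set and a test set. The values, the column meanings, and the helper names `split_features_labels` and `train_test_split` are made up for this example.

```python
import random

# Hypothetical toy data set: each row is one sample.
# Columns: height_cm, weight_kg, label (the label is the last column).
dataset = [
    [150.0, 45.0, 0],
    [160.0, 55.0, 0],
    [165.0, 60.0, 0],
    [170.0, 72.0, 1],
    [180.0, 85.0, 1],
    [185.0, 90.0, 1],
]

def split_features_labels(rows):
    """Separate every row into its feature columns and its final label column."""
    features = [row[:-1] for row in rows]
    labels = [row[-1] for row in rows]
    return features, labels

def train_test_split(rows, test_ratio=0.33, seed=0):
    """Shuffle a copy of the rows, then cut it into a training set and a test set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]

train_rows, test_rows = train_test_split(dataset)
X_train, y_train = split_features_labels(train_rows)
X_test, y_test = split_features_labels(test_rows)

print(len(X_train), "training samples,", len(X_test), "test samples")
print("features per sample:", len(X_train[0]))
```

With two feature columns, every sample is a point in a two-dimensional feature space; adding a third feature column would make that space three-dimensional.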

Workflow

General process:

Training data -> machine learning algorithm -> model

Input sample -> model -> output result (prediction)
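The general process can be sketched in a few lines. The example below uses a nearest-centroid classifier, chosen here only because it is short; the notes do not name a specific algorithm for this step, and the data values and function names are assumptions for illustration.

```python
def train_nearest_centroid(X, y):
    """Learning algorithm: compute the mean feature vector (centroid) of each class."""
    sums, counts = {}, {}
    for features, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    # The model is simply one centroid per class.
    return {label: [s / counts[label] for s in sums[label]] for label in sums}

def predict(model, sample):
    """Apply the model: return the class whose centroid is closest to the sample."""
    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return min(model, key=lambda label: sq_dist(model[label], sample))

# Training data (made-up values): two features per sample, one label each.
X_train = [[1.0, 1.0], [1.5, 2.0], [5.0, 5.0], [6.0, 5.5]]
y_train = ["small", "small", "large", "large"]

model = train_nearest_centroid(X_train, y_train)  # training data -> algorithm -> model
result = predict(model, [5.5, 5.0])               # input sample -> model -> output
print(result)
```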


Classification and regression
  • Machine learning tasks fall into two types, depending on what is being predicted
    • Classification: the machine is asked to predict a category
      • Common kinds of classification: binary classification, multi-class classification
    • Regression: the machine is asked to predict a continuous numeric value
      • In some cases a regression task can be simplified into a classification task
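One way to picture how a regression task can be simplified into a classification task is to cut the continuous output into ranges. The sketch below is only an illustration; the exam-score scenario, the pass mark, and the grade boundaries are assumptions, not part of the notes.

```python
def to_pass_fail(score, pass_mark=60.0):
    """Binary classification: map a continuous score onto two categories."""
    return "pass" if score >= pass_mark else "fail"

def to_grade(score, boundaries=((90.0, "A"), (75.0, "B"), (60.0, "C"))):
    """Multi-class classification: map a continuous score onto several categories."""
    for lower_bound, grade in boundaries:
        if score >= lower_bound:
            return grade
    return "D"

# Hypothetical outputs of a regression model (continuous values).
predicted_scores = [42.5, 60.0, 77.3, 91.0]

binary_classes = [to_pass_fail(s) for s in predicted_scores]
grade_classes = [to_grade(s) for s in predicted_scores]
print(binary_classes)
print(grade_classes)
```

The regression output keeps the exact number, while the classification output keeps only which range the number falls into.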
Supervised or not

Supervised learning, unsupervised learning, semi-supervised learning and reinforcement learning

  • Supervised learning: the training data given to the machine is labeled

    • Common supervised learning algorithms: kNN, linear regression, polynomial regression, logistic regression, SVM, decision tree, random forest
  • Unsupervised learning: the training data given to the machine is not labeled

    • Common unsupervised learning tasks: cluster analysis, dimensionality reduction, feature extraction
  • Semi-supervised learning: part of the training data given to the machine is labeled and the rest is not

    • Why data is incomplete: samples or labels can be missing for many reasons
    • Semi-supervised learning is common in practice; in most cases we have to process the data first and then hand it to the machine for learning
  • Reinforcement learning: take actions based on the surrounding environment, and learn how to act from the results of those actions

    • Supervised learning and semi-supervised learning serve as its foundation
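The difference between labeled and unlabeled training data can be seen on a handful of one-dimensional points. In this sketch the supervised side is a 1-nearest-neighbor lookup (the simplest case of kNN) and the unsupervised side is k-means clustering with k = 2; the data, the label names, and the function names are invented for illustration.

```python
def nearest_neighbor_label(points, labels, query):
    """Supervised: answer with the label of the closest labeled training point."""
    best = min(range(len(points)), key=lambda i: abs(points[i] - query))
    return labels[best]

def two_means(points, iterations=10):
    """Unsupervised: group unlabeled 1-D points into two clusters (k-means, k = 2)."""
    centers = [min(points), max(points)]  # simple deterministic starting centers
    clusters = [[], []]
    for _ in range(iterations):
        clusters = [[], []]
        for p in points:
            idx = 0 if abs(p - centers[0]) <= abs(p - centers[1]) else 1
            clusters[idx].append(p)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]

# Supervised learning: every training point comes with a label.
labels = ["low", "low", "low", "high", "high", "high"]
predicted = nearest_neighbor_label(points, labels, 4.9)
print(predicted)

# Unsupervised learning: the same points, but no labels are provided.
centers, clusters = two_means(points)
print(clusters)
```

The clustering finds the two groups on its own, but it cannot name them "low" and "high"; those names exist only in the labeled data.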
learning environment

Batch learning, online learning

  • Batch learning: all samples are fed in at once when training the model

    • Advantages: simple; once the model has been trained, it is used as-is and is not updated further

    • Disadvantages: cannot adapt to changes in the environment; adapting requires running batch learning again

  • Online learning: during training, each incoming sample is used to compute the error and adjust the parameters

    • Advantages: responds promptly to changes in the environment

    • Disadvantages: new data may change the model for the worse
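The contrast between the two modes can be sketched with a one-parameter model y = w * x trained by gradient descent on squared error. The model, the learning rate, and the data are assumptions picked to keep the example small; the notes do not prescribe a particular algorithm.

```python
def batch_fit(xs, ys, lr=0.01, epochs=200):
    """Batch learning: every update uses the gradient averaged over all samples."""
    w = 0.0
    n = len(xs)
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= lr * grad
    return w

def online_update(w, x, y, lr=0.01):
    """Online learning: one sample arrives, its error is computed, w is adjusted once."""
    error = w * x - y
    return w - lr * 2 * error * x

# Initial environment: the data follows y = 2 * x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w_batch = batch_fit(xs, ys)
print("after batch training:", w_batch)

# The environment changes: new samples follow y = 3 * x.
# The batch model stays at its old value unless it is retrained from scratch;
# the online model keeps adjusting as each new sample comes in.
w_online = w_batch
stream = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0), (4.0, 12.0)] * 50
for x, y in stream:
    w_online = online_update(w_online, x, y)
print("after online updates:", w_online)
```

The same mechanism that lets the online model follow the change also explains its disadvantage: a stream of bad samples would pull w away from a good value just as readily.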

learning style

Parametric learning, nonparametric learning

  • Parametric learning: assume a form for the relationship in the data, then find the parameters of that relationship

    • Characteristics: the parameters are learned from the data set. Once they have been learned, the original data set is no longer needed
  • Nonparametric learning: does not make strong assumptions about the form of the model

    • Note: nonparametric does not mean there are no parameters
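A small comparison may make the distinction concrete. Below, a straight line fitted by least squares stands in for parametric learning, and kNN regression stands in for nonparametric learning; both choices, the data, and the function names are assumptions made for this sketch.

```python
def fit_line(xs, ys):
    """Parametric: assume y = slope * x + intercept and learn the two parameters."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

def knn_predict(xs, ys, query, k=2):
    """Nonparametric: average the targets of the k training points closest to query."""
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - query))[:k]
    return sum(ys[i] for i in nearest) / k

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

# Parametric: after fitting, only two numbers are needed to predict.
slope, intercept = fit_line(xs, ys)
line_prediction = slope * 2.4 + intercept  # xs and ys are not used here
print(slope, intercept, line_prediction)

# Nonparametric: the training data itself must be kept for every prediction,
# and k is still a parameter that has to be chosen.
knn_prediction = knn_predict(xs, ys, 2.4, k=2)
print(knn_prediction)
```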

This work is licensed under a CC agreement. Reprints must credit the author and link to this article.