15,000 stars! The mathematical principles behind popular machine learning algorithms


[Introduction]: The GitHub project recommended in this article implements popular machine learning algorithms in Python and explains the mathematics behind each implementation. Every algorithm comes with an interactive Jupyter Notebook demo in which you can experiment with the training data and the algorithm configuration, and immediately see the results, charts and predictions in your browser.

Brief introduction

The goal of Homemade Machine Learning is not to implement machine learning algorithms by calling third-party libraries, but to build them from scratch in order to better understand the mathematics behind each algorithm. That is why every implementation is called "homemade", and why none of them is intended for production use.

(Sentry King's note: because WeChat does not support external links, the Mathematics, ⚙️ Code and ▶️ Demo links listed below can only be opened from the project home page.)

Project address: https://github.com/trekhleb/h…

Supervised learning

In supervised learning, the training data consists of a set of inputs together with a set of labels, or "correct answers", that give the expected output for each training example. We then train our model (the parameters of the machine learning algorithm) to map the inputs to the correct outputs, that is, to make correct predictions. The ultimate goal is to find model parameters that keep producing the correct input → output mapping (prediction) even for new input examples.


In regression problems, we predict real values. Essentially, we try to fit a line / plane / n-dimensional plane to the training examples.

Usage examples: stock price forecasting, sales analysis, dependencies between any numeric values, etc.

Linear regression

  • Mathematics | linear regression – theory and links to further reading
  • ⚙️ Code | linear regression – implementation example
  • ▶️ Demo | univariate linear regression – predict a country's happiness index from its economy GDP
  • ▶️ Demo | multivariate linear regression – predict a country's happiness index from its economy GDP and freedom index
  • ▶️ Demo | non-linear regression – use linear regression with polynomial and sinusoidal features to predict non-linear dependencies
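
To make the "fit a line through the training examples" idea concrete, here is a minimal sketch of univariate linear regression trained with batch gradient descent, written in plain Python without third-party libraries. It is not the project's own code: the function names, the learning rate, the iteration count and the toy data are all assumptions chosen for illustration.

```python
def predict(theta0, theta1, x):
    """Hypothesis of univariate linear regression: a straight line."""
    return theta0 + theta1 * x


def fit_linear_regression(xs, ys, learning_rate=0.05, iterations=5000):
    """Fit intercept (theta0) and slope (theta1) by batch gradient descent
    on the mean squared error."""
    theta0, theta1 = 0.0, 0.0
    m = len(xs)
    for _ in range(iterations):
        # Prediction error for every training example.
        errors = [predict(theta0, theta1, x) - y for x, y in zip(xs, ys)]
        # Partial derivatives of the cost with respect to each parameter.
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Step against the gradient.
        theta0 -= learning_rate * grad0
        theta1 -= learning_rate * grad1
    return theta0, theta1


# Toy data generated from y = 2x + 1 (made up for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

theta0, theta1 = fit_linear_regression(xs, ys)
print(round(theta0, 2), round(theta1, 2))
print(round(predict(theta0, theta1, 5.0), 2))  # prediction for an unseen input
```

Because the toy data lies exactly on a line, the fitted parameters should end up very close to the intercept and slope that generated it. On real data the learning rate usually needs tuning, and features are often normalized first.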


In classification problems, we split the input examples by certain characteristics.

Usage examples: spam filters, language detection, finding similar documents, handwritten character recognition, etc.

Logistic regression

  • Mathematics | logistic regression – theory and links to further reading
  • ⚙️ Code | logistic regression – implementation example
  • ▶️ Demo | logistic regression (linear boundary) – predict the iris flower class based on petal length and petal width
  • ▶️ Demo | logistic regression (non-linear boundary) – predict microchip validity based on parameters 1 and 2
  • ▶️ Demo | multivariate logistic regression | MNIST – recognize handwritten digits from 28×28 pixel images
  • ▶️ Demo | multivariate logistic regression | Fashion MNIST – recognize clothing types from 28×28 pixel images
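
As a rough illustration of how a linear decision boundary can be learned, the sketch below trains a binary logistic regression classifier with batch gradient descent in plain Python. It is not taken from the project: the function names, hyperparameters and the tiny petal-like dataset are assumptions made up for this example.

```python
import math


def sigmoid(z):
    """Squash any real number into the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-z))


def probability(weights, bias, x):
    """Estimated probability that example x belongs to class 1."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def fit_logistic_regression(features, labels, learning_rate=0.1, iterations=5000):
    """Learn weights and bias by batch gradient descent on the log loss."""
    n = len(features[0])
    m = len(features)
    weights, bias = [0.0] * n, 0.0
    for _ in range(iterations):
        grad_w, grad_b = [0.0] * n, 0.0
        for x, y in zip(features, labels):
            error = probability(weights, bias, x) - y
            grad_b += error
            for j in range(n):
                grad_w[j] += error * x[j]
        bias -= learning_rate * grad_b / m
        weights = [w - learning_rate * g / m for w, g in zip(weights, grad_w)]
    return weights, bias


def classify(weights, bias, x):
    """Predict class 1 when the estimated probability is at least 0.5."""
    return 1 if probability(weights, bias, x) >= 0.5 else 0


# Made-up (petal length, petal width) measurements for two flower classes.
features = [(1.4, 0.2), (1.3, 0.2), (1.5, 0.3), (1.6, 0.2),
            (4.7, 1.4), (4.5, 1.5), (5.0, 1.7), (4.4, 1.3)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

weights, bias = fit_logistic_regression(features, labels)
print([classify(weights, bias, x) for x in features])
```

A multi-class problem such as digit recognition is commonly handled by training one such classifier per class (one-vs-all) and picking the class with the highest probability; that extension is omitted here for brevity.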

Unsupervised learning

Unsupervised learning is a branch of machine learning that learns from data that has not been labeled, classified or categorized. Instead of responding to feedback, unsupervised learning identifies commonalities in the data and reacts based on the presence or absence of those commonalities in each new piece of data.


In clustering problems, we split the training examples by unknown characteristics. The algorithm itself decides which characteristics to use for the split.

Usage examples: market segmentation, social network analysis, organizing computing clusters, astronomical data processing, image compression, etc.

K-means algorithm

  • Mathematics | k-means algorithm – theory and links to further reading
  • ⚙️ Code | k-means algorithm – implementation example
  • ▶️ Demo | k-means algorithm – split iris flowers into clusters based on petal length and petal width
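
The two alternating steps of k-means (assign each point to its nearest centroid, then move each centroid to the mean of its points) can be sketched in a few lines of plain Python. This is an illustrative sketch rather than the project's implementation: the function names and the toy points are assumptions, and the centroids are naively initialized from the first k points, whereas real implementations usually pick them at random.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def k_means(points, k, max_iterations=100):
    """Cluster points into k groups; returns (centroids, assignments)."""
    # Naive initialization (an assumption of this sketch): first k points.
    centroids = [tuple(p) for p in points[:k]]
    assignments = []
    for _ in range(max_iterations):
        # Assignment step: index of the nearest centroid for every point.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # nothing changed, so the algorithm has converged
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignments


# Made-up (petal length, petal width) points forming two obvious groups.
points = [(1.4, 0.2), (4.7, 1.4), (1.3, 0.2), (4.5, 1.5), (1.5, 0.3), (5.0, 1.7)]

centroids, assignments = k_means(points, k=2)
print(assignments)
print(centroids)
```

Note that no labels are passed in: the grouping emerges from the distances alone, which is what makes this an unsupervised method.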

Anomaly detection

Anomaly detection (also known as outlier detection) is the identification of rare items, events or observations that raise suspicion by differing significantly from the majority of the data.

Usage examples: intrusion detection, fraud detection, system health monitoring, and removing anomalous data from a dataset.

Anomaly detection using Gaussian distribution

  • Mathematics | anomaly detection using Gaussian distribution – theory and links to further reading
  • ⚙️ Code | anomaly detection using Gaussian distribution – implementation example
  • ▶️ Demo | anomaly detection – find anomalies in server operational parameters such as latency and threshold
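
A minimal sketch of the Gaussian approach, under simplifying assumptions: a single feature, parameters estimated from observations believed to be normal, and a hand-picked density threshold `epsilon`. The function names, the latency values and the threshold are all made up for illustration and are not the project's code.

```python
import math


def estimate_gaussian(values):
    """Estimate the mean and variance of a single feature."""
    m = len(values)
    mean = sum(values) / m
    variance = sum((v - mean) ** 2 for v in values) / m
    return mean, variance


def gaussian_density(x, mean, variance):
    """Probability density of x under a normal distribution."""
    exponent = -((x - mean) ** 2) / (2 * variance)
    return math.exp(exponent) / math.sqrt(2 * math.pi * variance)


def find_anomalies(observations, mean, variance, epsilon):
    """Flag observations whose density falls below the threshold epsilon."""
    return [x for x in observations if gaussian_density(x, mean, variance) < epsilon]


# Made-up server latencies (ms) from a period assumed to be healthy.
normal_latencies = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 9.7, 10.0]
mean, variance = estimate_gaussian(normal_latencies)

# New observations to check; epsilon is hand-picked for this toy example.
new_latencies = [10.1, 9.9, 12.5]
print(find_anomalies(new_latencies, mean, variance, epsilon=0.05))
```

With several features, one common simplification is to treat them as independent and multiply the per-feature densities. The threshold is normally chosen on a labeled validation set instead of by hand.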

Neural network

A neural network is not an algorithm in itself, but rather a framework in which many machine learning algorithms work together to process complex data inputs.

Usage examples: as a general substitute for all other algorithms, for instance in image recognition, voice recognition, image processing (applying a specific style), language translation, etc.

Multilayer perceptron (MLP)

  • Mathematics | multilayer perceptron – theory and links to further reading
  • ⚙️ Code | multilayer perceptron – implementation example
  • ▶️ Demo | multilayer perceptron | MNIST – recognize handwritten digits from 28×28 pixel images
  • ▶️ Demo | multilayer perceptron | Fashion MNIST – recognize clothing types from 28×28 pixel images
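
To show what "multilayer" means in practice, here is a small sketch of forward propagation through a multilayer perceptron in plain Python. To keep the example short and deterministic, the weights are hand-picked rather than trained, so that a network with one hidden layer computes XOR, a function that a single logistic unit cannot represent. All names and weight values are assumptions for illustration; training by backpropagation is deliberately left out.

```python
import math


def sigmoid(z):
    """Sigmoid activation function."""
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(inputs, weights, biases):
    """One fully connected layer: each row of `weights` feeds one neuron."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]


def mlp_forward(inputs, layers):
    """Propagate the inputs through every (weights, biases) layer in turn."""
    activations = inputs
    for weights, biases in layers:
        activations = layer_forward(activations, weights, biases)
    return activations


# Hand-picked (not trained) parameters for a 2-2-1 network computing XOR.
xor_layers = [
    # Hidden layer: first neuron acts like OR, second like NAND.
    ([[20.0, 20.0], [-20.0, -20.0]], [-10.0, 30.0]),
    # Output layer: acts like AND of the two hidden neurons.
    ([[20.0, 20.0]], [-30.0]),
]

for a in (0, 1):
    for b in (0, 1):
        output = mlp_forward([a, b], xor_layers)[0]
        print(a, b, round(output))
```

In a real MLP the same forward pass is followed by backpropagation, which adjusts the weights so that the outputs move closer to the training labels.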

Machine learning map

Prerequisites

  1. Install Python

  2. Install dependencies. Run the following command to install all dependencies required by this project:
     pip install -r requirements.txt

  3. Start Jupyter locally or remotely

Datasets

The datasets used in this project can be found at the following links:

