Machine learning practitioners have different personalities. Some of them will say “I am an expert in X, and X can train on any type of data”, where X is some algorithm. However, we have to admit that in real life there is no X that can train on every kind of data. Some algorithms suit some industries but not others.

There is a consensus in the field of data science: as data scientists, we must learn as much as possible about the common machine learning algorithms, so that we have more options when facing problems in different industries. This article briefly describes the common machine learning algorithms and provides related resources for each, to help you get up to speed quickly.

## 1. Principal component analysis (PCA) / SVD

PCA is an unsupervised method for understanding the global properties of a data set composed of vectors. It analyzes the covariance matrix of the data points to find which dimensions (or, sometimes, data points) are more important, i.e. have high variance among themselves but low covariance with the others. One way to think of the top principal components (PCs) of a matrix is as the eigenvectors of its covariance matrix with the highest eigenvalues. Singular value decomposition (SVD) is essentially another way to compute the ordered components, but it does not require forming the covariance matrix of the points.

This algorithm helps overcome the curse of dimensionality by producing data points with reduced dimensions.
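As a minimal sketch of the relationship described above, the following NumPy snippet computes principal components with SVD and checks them against the eigen-decomposition of the covariance matrix. The helper name `pca_svd` and the toy data are assumptions made for illustration, not part of any library.

```python
import numpy as np

def pca_svd(X, n_components):
    """Project the rows of X onto the top principal components using SVD."""
    X_centered = X - X.mean(axis=0)            # PCA assumes zero-mean columns
    # Rows of Vt are the principal directions, ordered by singular value.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    explained_variance = (S ** 2) / (X.shape[0] - 1)
    return X_centered @ components.T, components, explained_variance[:n_components]

rng = np.random.default_rng(0)
# Hypothetical toy data: 200 points stretched along the direction (1, 1).
t = rng.normal(size=200)
X = np.column_stack([t, t]) + 0.05 * rng.normal(size=(200, 2))

Z, components, var = pca_svd(X, n_components=1)

# Same answer via the covariance matrix: its top eigenvector and eigenvalue.
eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
top_eigvec = eigvecs[:, -1]                     # eigh sorts eigenvalues ascending
```

Here `Z` holds the one-dimensional reduced data. The first row of `components` matches `top_eigvec` up to sign, and `var[0]` matches the largest eigenvalue.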

Library address:

https://docs.scipy.org/doc/sc…

http://scikit-learn.org/stabl…

Getting started:

https://arxiv.org/pdf/1404.11…

## 2. Least squares and polynomial fitting

Do you remember your numerical analysis course at university? You can use the least squares methods taught there to fit curves to small, low-dimensional datasets in machine learning. (For large datasets or datasets with many dimensions, you may end up overfitting.) Ordinary least squares (OLS) has a closed-form solution, so you don’t need complex optimization techniques.

This algorithm can be used to fit simple curves / regressions.
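To make the closed-form solution concrete, here is a small sketch that fits a quadratic in two ways: with `numpy.polyfit`, and by solving the normal equations directly. The data is synthetic and the true coefficients (2, -3, 1) are an assumption chosen for the example.

```python
import numpy as np

# Hypothetical noisy samples of y = 2x^2 - 3x + 1.
rng = np.random.default_rng(0)
x = np.linspace(-2, 2, 30)
y = 2 * x**2 - 3 * x + 1 + 0.1 * rng.normal(size=x.size)

# Library route: coefficients are returned highest power first.
coeffs_polyfit = np.polyfit(x, y, deg=2)

# Closed-form route: solve the normal equations (A^T A) w = A^T y.
A = np.vander(x, N=3)                  # columns: x^2, x, 1
coeffs_closed = np.linalg.solve(A.T @ A, A.T @ y)
```

Both routes recover approximately the same coefficients. For ill-conditioned problems, `numpy.linalg.lstsq` is usually a safer choice than forming `A.T @ A` explicitly.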

Library address:

https://docs.scipy.org/doc/nu…

https://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.polyfit.html

Getting started:

https://lagunita.stanford.edu…

## 3. Constrained linear regression

Least squares can be thrown off by outliers, spurious fields, and noise in the data. We therefore need constraints to reduce the variance of the line we fit to a dataset. The way to do this is to fit a linear regression model that is regularized so that the weights do not misbehave. The model can use an L1 penalty (lasso), an L2 penalty (ridge regression), or both (elastic net). The mean squared loss is what gets optimized.

Use this algorithm to fit regression lines with constraints, which avoids overfitting and masks noisy dimensions in the model.
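The L2 case has a simple closed form, so it is sketched below in plain NumPy. The function name `ridge_fit`, the penalty strength `alpha`, and the toy data are assumptions for illustration, and the intercept is omitted for brevity. The L1 (lasso) case has no closed form and is normally solved iteratively by a library.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge regression without intercept: w = (X^T X + alpha*I)^-1 X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(0)
# Hypothetical data: only the first of five features carries signal.
X = rng.normal(size=(40, 5))
y = 3.0 * X[:, 0] + 0.5 * rng.normal(size=40)

w_ols = ridge_fit(X, y, alpha=0.0)      # alpha = 0 reduces to ordinary least squares
w_ridge = ridge_fit(X, y, alpha=10.0)   # a larger alpha shrinks the weights
```

Comparing `np.linalg.norm(w_ridge)` with `np.linalg.norm(w_ols)` shows the shrinkage that the constraint introduces.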

Library address:

http://scikit-learn.org/stabl…

Getting started:

https://www.youtube.com/watch…

https://www.youtube.com/watch…

## 4. K-means clustering

K-means is a favorite unsupervised clustering algorithm among machine learning practitioners. Given a set of data points in the form of vectors, we can group the points into clusters according to the distances between them. It is an expectation-maximization-style algorithm that iteratively moves the cluster centers and then reassigns each point to its nearest center. The inputs of the algorithm are the number of clusters to generate and the number of iterations it will run.

As you can see from the name, you can use this algorithm to create K clusters in the dataset.
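The two alternating steps can be sketched in a few lines of NumPy. This is a bare-bones version of Lloyd's algorithm under simplifying assumptions: the initial centers are passed in explicitly (libraries typically use smarter seeding such as k-means++), and the function name `kmeans` and the toy blobs are hypothetical.

```python
import numpy as np

def kmeans(X, init_centers, n_iters=10):
    """Lloyd's algorithm: alternate an assignment step and a center-update step."""
    centers = np.array(init_centers, dtype=float)   # work on a copy
    for _ in range(n_iters):
        # Assignment step: each point goes to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each center moves to the mean of its assigned points.
        for j in range(len(centers)):
            if np.any(labels == j):      # keep the old center if a cluster is empty
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels

rng = np.random.default_rng(0)
# Hypothetical data: two tight blobs around (0, 0) and (5, 5).
blob_a = rng.normal(loc=0.0, scale=0.3, size=(50, 2))
blob_b = rng.normal(loc=5.0, scale=0.3, size=(50, 2))
X = np.vstack([blob_a, blob_b])

# Seed with one point from each blob (an assumption made to keep the demo deterministic).
centers, labels = kmeans(X, init_centers=X[[0, -1]])
```

With random initial centers the algorithm can settle in a poor local optimum, which is why practical implementations restart several times and keep the best result.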

Library address:

http://scikit-learn.org/stabl…

Introductory course

https://www.youtube.com/watch…

https://www.datascience.com/b…

## 5. Logistic regression

Logistic regression is linear regression with a nonlinearity (usually the sigmoid function, though tanh can also be used) applied after the weights, which restricts the output to values close to the + / – classes (1 and 0 for sigmoid). The cross-entropy loss function is optimized with gradient descent or L-BFGS. Note for beginners: logistic regression is used for classification, not regression. You can also think of logistic regression as a single-layer neural network. NLP practitioners often use it under the name maximum entropy classifier.

The sigmoid function, σ(z) = 1 / (1 + e^(−z)), is an S-shaped curve that squashes any real number into the range (0, 1).
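A minimal sketch of logistic regression trained by batch gradient descent on the cross-entropy loss follows. The function names, learning rate, step count, and the tiny one-dimensional dataset are all assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    """Squash real values into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic_regression(X, y, lr=0.1, n_steps=1000):
    """Minimise the mean cross-entropy loss with batch gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_steps):
        p = sigmoid(X @ w + b)             # predicted probability of class 1
        grad_w = X.T @ (p - y) / len(y)    # gradient of the loss w.r.t. the weights
        grad_b = np.mean(p - y)            # gradient of the loss w.r.t. the bias
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical 1-D data: class 1 when x is positive.
X = np.array([[-2.0], [-1.5], [-1.0], [-0.5], [0.5], [1.0], [1.5], [2.0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

w, b = train_logistic_regression(X, y)
predictions = (sigmoid(X @ w + b) >= 0.5).astype(int)
```

Thresholding the predicted probability at 0.5 turns the model into a classifier, which is why the method is used for classification despite its name.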

Library address:

http://scikit-learn.org/stabl…

Introductory course

https://www.youtube.com/watch…

## 6. Support vector machine (SVM)

Support vector machines are linear models, like linear / logistic regression. The difference is that they use a margin-based loss function (the derivation of support vectors is one of the most beautiful mathematical results I have seen, along with eigenvalue calculation). You can optimize the loss function with methods such as L-BFGS or SGD.

Another innovation of SVMs is the use of kernels on the data for feature engineering. If you have good domain insight, you can replace the good old RBF kernel with a smarter one.

One unique thing SVMs can do is learn one-class classifiers.

Support vector machines can be used to train classifiers (even regressors).
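To show what a margin-based loss looks like in practice, the sketch below trains a linear SVM by subgradient descent on the L2-regularized hinge loss. The function name, the hyperparameters `lam` and `lr`, and the toy data are assumptions. Kernels are left out, and production code would normally rely on a library solver.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, n_steps=500):
    """Subgradient descent on lam/2 * ||w||^2 + mean(max(0, 1 - y * (X @ w + b)))."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_steps):
        margins = y * (X @ w + b)
        violating = margins < 1             # inside the margin or misclassified
        # Only margin-violating points contribute to the hinge-loss subgradient.
        grad_w = lam * w - (y[violating][:, None] * X[violating]).sum(axis=0) / len(y)
        grad_b = -y[violating].sum() / len(y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical linearly separable data with labels in {-1, +1}.
X = np.array([[2.0, 2.0], [3.0, 3.0], [2.5, 3.5],
              [-2.0, -2.0], [-3.0, -2.5], [-2.5, -3.5]])
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])

w, b = train_linear_svm(X, y)
predictions = np.sign(X @ w + b)
```

Points that already sit outside the margin contribute nothing to the update, so the solution is driven by the points closest to the boundary, i.e. the support vectors.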

Library address:

http://scikit-learn.org/stabl…

Introductory course

https://www.youtube.com/watch…

Note: SGD-based training of both logistic regression and SVMs is available in sklearn, which I often use because it lets me try both LR and SVM through a common interface.

## 7. Feedforward neural network (FFNN)

These are essentially multilayered logistic regression classifiers: many layers of weights separated by nonlinearities (sigmoid, tanh, ReLU + softmax, and SELU). They are also known as multilayer perceptrons. FFNNs can be used for classification and, as autoencoders, for unsupervised feature learning.

Figure: multilayer perceptron

Figure: FFNN as an autoencoder

FFNNs can be used to train a classifier or, as autoencoders, to extract features.
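The idea of layers of weights separated by nonlinearities can be sketched as a forward pass. The weights below are hand-picked rather than learned (an assumption made to keep the example deterministic), and they are chosen so that the network computes XOR, which a single logistic regression layer cannot represent.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(X, W1, b1, W2, b2):
    """One hidden ReLU layer followed by a sigmoid (logistic regression) output."""
    hidden = relu(X @ W1 + b1)
    return sigmoid(hidden @ W2 + b2)

# Hand-picked (not learned) weights that make the network compute XOR.
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])
b1 = np.array([0.0, -1.0])
W2 = np.array([1.0, -2.0])
b2 = -0.5

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
probs = mlp_forward(X, W1, b1, W2, b2)
predictions = (probs > 0.5).astype(int)   # XOR of the two inputs
```

In a real network these weights would be learned with backpropagation, typically through a framework such as the ones linked below.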

Library address:

http://scikit-learn.org/stabl…

http://scikit-learn.org/stabl…

https://github.com/keras-team…

Introductory course

http://www.deeplearningbook.o…

http://www.deeplearningbook.o…

http://www.deeplearningbook.o…

## 8. Convolutional neural networks

Almost every state-of-the-art vision-based machine learning result today is achieved with convolutional neural networks. They can be used for image classification, object detection, and image segmentation. Invented by Yann LeCun in the late 1980s and early 1990s, ConvNets use convolutional layers that act as hierarchical feature extractors. You can also use them on text (and even graphs).
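The core operation of a convolutional layer can be sketched directly. The snippet below slides a single hand-written vertical-edge kernel over a toy image; in a real ConvNet the kernel values are learned, and the function name `conv2d_valid` is an assumption for this example.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """2-D 'valid' cross-correlation, the operation deep learning libraries call convolution."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Element-wise product of the kernel with the patch under it, then sum.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy 5x5 image: dark left side, bright right side.
image = np.array([[0, 0, 0, 1, 1]] * 5, dtype=float)
# Hand-written vertical-edge kernel (a learned filter would play this role in a CNN).
kernel = np.array([[-1.0, 0.0, 1.0]] * 3)

feature_map = conv2d_valid(image, kernel)
```

The feature map is zero where the image is flat and large where the kernel straddles the dark-to-bright edge, which is the sense in which a convolutional layer extracts features.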

Library address:

https://developer.nvidia.com/…

https://github.com/kuangliu/t…

https://github.com/chainer/ch…

https://keras.io/applications/

Introductory course

http://cs231n.github.io/

https://adeshpande3.github.io…

## 9. Recurrent neural network (RNNs)

RNNs model sequences by recursively applying the same set of weights to the aggregator state at time t and the input at time t (the sequence has an input at each time step, and a hidden state at each time t that is the output of step t-1 of the RNN). Pure RNNs are rarely used now, but counterparts such as LSTMs and GRUs are state of the art in most sequence modeling tasks.

In a pure RNN, the recurrent function f is a densely connected unit followed by a nonlinearity. Nowadays f is usually an LSTM or GRU unit, which is used in place of the plain dense layer.

Use RNNs for sequence modeling tasks, especially text classification, machine translation, and language modeling.
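The recursion can be sketched as a loop that reuses one set of weights at every time step. This is a plain (non-LSTM, non-GRU) RNN with random, untrained weights, and all names and sizes are assumptions chosen for illustration.

```python
import numpy as np

def rnn_forward(inputs, W_xh, W_hh, b_h):
    """Vanilla RNN: h_t = tanh(x_t @ W_xh + h_{t-1} @ W_hh + b_h), same weights at every step."""
    h = np.zeros(W_hh.shape[0])      # initial hidden state
    states = []
    for x_t in inputs:               # walk through the sequence in order
        h = np.tanh(x_t @ W_xh + h @ W_hh + b_h)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
input_size, hidden_size, seq_len = 3, 4, 5
# Random, untrained weights, used only to show the shapes and the recursion.
W_xh = rng.normal(scale=0.5, size=(input_size, hidden_size))
W_hh = rng.normal(scale=0.5, size=(hidden_size, hidden_size))
b_h = np.zeros(hidden_size)

sequence = rng.normal(size=(seq_len, input_size))
hidden_states = rnn_forward(sequence, W_xh, W_hh, b_h)   # shape (seq_len, hidden_size)
```

An LSTM or GRU replaces the single `tanh` line with a gated update but keeps the same outer loop over time steps.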

Library address:

https://github.com/tensorflow

https://github.com/wabyking/T…

http://opennmt.net/

Getting started:

http://cs224d.stanford.edu/

http://www.wildml.com/categor…

http://colah.github.io/posts/…

## 10. Conditional random fields (CRFs)

CRFs are probably the most commonly used models in the family of probabilistic graphical models (PGMs). Like RNNs, they are used for sequence modeling, and they can also be used in combination with RNNs. Before neural machine translation systems came in, CRFs were the state of the art, and in many sequence tagging tasks with small datasets they still perform better than RNNs. They can also be used in other structured prediction tasks, such as image segmentation. A CRF models each element of a sequence (such as a sentence) so that its neighbors affect the label of a component in the sequence, instead of all labels being independent of each other.

Use CRF to tag sequences (text, images, time series, DNA, etc.).
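The effect of neighboring labels can be illustrated with Viterbi decoding over a linear chain, which is the usual way to find the best label sequence once per-position and transition scores are known. The scores below are hand-set assumptions rather than learned CRF parameters, and the function name `viterbi` is chosen for this example.

```python
import numpy as np

def viterbi(emissions, transitions):
    """Best label sequence for a linear-chain model.

    emissions[t, k]   : score of label k at position t
    transitions[j, k] : score of moving from label j to label k
    """
    n_steps, n_labels = emissions.shape
    score = emissions[0].copy()
    backpointers = np.zeros((n_steps, n_labels), dtype=int)
    for t in range(1, n_steps):
        # candidate[j, k]: best score ending in j at t-1, then moving to k at t.
        candidate = score[:, None] + transitions + emissions[t][None, :]
        backpointers[t] = candidate.argmax(axis=0)
        score = candidate.max(axis=0)
    # Follow the backpointers from the best final label.
    path = [int(score.argmax())]
    for t in range(n_steps - 1, 0, -1):
        path.append(int(backpointers[t, path[-1]]))
    return path[::-1]

# Hypothetical scores for 2 labels over 3 positions.
emissions = np.array([[2.0, 0.0],
                      [0.4, 0.6],
                      [2.0, 0.0]])
no_coupling = np.zeros((2, 2))
sticky = np.array([[1.0, -1.0],
                   [-1.0, 1.0]])    # rewards keeping the same label as the neighbor

independent_path = viterbi(emissions, no_coupling)   # [0, 1, 0]
coupled_path = viterbi(emissions, sticky)            # [0, 0, 0]
```

Without transition scores, the weakly preferred label 1 wins at the middle position. With them, the neighbors pull the middle label to 0, which is the dependency between adjacent labels that a CRF captures.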

Library address:

https://sklearn-crfsuite.read…

Introductory course

http://blog.echen.me/2012/01/…

https://www.youtube.com/watch…

## 11. Decision trees

Suppose I am given an Excel worksheet with data about various fruits and have to tell which ones are apples. I would start by asking “Which fruits are red and round?” and divide the fruits into those that answer “yes” and those that answer “no”. Now, not all red and round fruits are apples, and not all apples are red and round. So I ask “Which fruits have red or yellow hints on them?” of the red and round fruits, and “Which fruits are green and round?” of the fruits that are not red and round. Based on these questions, I can tell with considerable accuracy which fruits are apples. This cascade of questions is a decision tree.

But this is a decision tree based on my intuition, and intuition cannot deal with high-dimensional, complex data. We have to come up with the cascade of questions automatically by looking at labeled data, which is what machine-learning-based decision trees do. Early versions such as CART trees work for simple data, but for larger and larger datasets the bias-variance trade-off needs to be handled by better algorithms. The two common decision tree algorithms used today are random forests (which build different classifiers on random subsets of the attributes and combine them for the output) and boosting trees (which train a cascade of trees one on top of another, each correcting the errors of the trees below it).

Decision trees can be used to classify data points (or even regression).
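How a tree chooses its questions automatically can be sketched with a single split. The snippet below scores each yes/no feature by weighted Gini impurity and picks the best one. The fruit table is made up for illustration, and the function names are assumptions. A full tree would apply the same step recursively to each branch.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_question(rows, labels):
    """Pick the yes/no feature whose split gives the lowest weighted Gini impurity."""
    best_feature, best_impurity = None, float("inf")
    for feature in rows[0]:
        yes = [lab for row, lab in zip(rows, labels) if row[feature]]
        no = [lab for row, lab in zip(rows, labels) if not row[feature]]
        impurity = (len(yes) * gini(yes) + len(no) * gini(no)) / len(labels)
        if impurity < best_impurity:
            best_feature, best_impurity = feature, impurity
    return best_feature, best_impurity

# Made-up fruit table: two boolean features per fruit.
rows = [
    {"red": True,  "round": True},    # apple
    {"red": True,  "round": True},    # apple
    {"red": False, "round": True},    # orange
    {"red": False, "round": True},    # lime
    {"red": False, "round": False},   # banana
    {"red": True,  "round": False},   # strawberry
]
labels = ["apple", "apple", "other", "other", "other", "other"]

feature, impurity = best_question(rows, labels)
```

On this table the best first question is whether the fruit is red, because it leaves the purest groups. Random forests and boosted trees build many such trees and combine their outputs.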

Library

http://scikit-learn.org/stabl…

http://scikit-learn.org/stabl…

http://xgboost.readthedocs.io…

https://catboost.yandex/

Introductory course

http://xgboost.readthedocs.io…

https://arxiv.org/abs/1511.05741

https://arxiv.org/abs/1407.7502

http://education.parrotpredic…

These are ten machine learning algorithms that you can learn to become a data scientist.

Article title: 10 machine learning algorithms you should know to become a data scientist

By Shashank Gupta

Please check the original text for details