Introduction to C++ Machine Learning Libraries

Time:2020-10-22

By Alakh Sethi
Compile | VK
Source | Analytics Vidhya

Introduction

I like to use C++. It was the first programming language I learned, and I enjoy using it for machine learning.

I wrote earlier about building machine learning models, and a reader replied asking whether there are any machine learning libraries for C++.

It's a fair question. Languages like Python and R have a large number of packages and libraries for different machine learning tasks. Does C++ have anything comparable?

Yes, it does! In this article, I'll focus on two such C++ libraries, and we'll see both of them in action.

Table of contents

  1. Why do we use machine learning libraries?

  2. Machine learning libraries in C++

    1. Shark Library

    2. mlpack Library

Why do we use machine learning libraries?

This is a question many newcomers ask. Why are libraries so important in machine learning? Let me try to explain in this section.

Experienced professionals and industry veterans have already worked hard and come up with solutions to common problems. Would you rather use those solutions or spend hours recreating the same thing from scratch? The latter approach usually doesn't make sense, especially when you are working or studying against a deadline.

The biggest advantage of our machine learning community is that many solutions already exist in the form of libraries and packages. Others, from experts to enthusiasts, have done the hard work and packaged the solutions well in a library.

These machine learning libraries are efficient, optimized, and thoroughly tested across many use cases. Relying on them makes learning and writing code, whether in C++ or Python, simple and intuitive.

Machine learning libraries in C++

In this section, we will introduce two of the most popular machine learning libraries in C++:

  1. Shark Library
  2. mlpack Library

Let's look at each of them in turn, along with their C++ code.

1. Shark Library

Shark is a fast, modular library with strong support for supervised learning algorithms (such as linear regression and neural networks) as well as clustering methods such as K-means. It also includes functionality for linear algebra and numerical optimization, both of which are very important in machine learning tasks.

We'll first learn how to install Shark and set up the environment. Then we'll implement linear regression with Shark.

Install Shark and set up the environment (Linux)
  • Shark relies on Boost and CMake. Fortunately, you can install all the dependencies using the following command:
sudo apt-get install cmake cmake-curses-gui libatlas-base-dev libboost-all-dev
  • To install Shark, run the following commands line by line in the terminal:
git clone https://github.com/Shark-ML/Shark.git    # alternatively, download the zip file and extract it
cd Shark
mkdir build
cd build
cmake ..
make

If you don't see any errors, everything is set up correctly. If you run into trouble, there is plenty of information online. For Windows and other operating systems, a quick search will show you how to install Shark. Here is an installation guide: http://www.shark-ml.org/sphinx_pages/build/html/rest_sources/tutorials/tutorials.html

Compiling programs with Shark
  • Include the relevant header files. Suppose we want to implement linear regression; the additional header files to include are:

#include <shark/ObjectiveFunctions/Loss/SquaredLoss.h>
#include <shark/Algorithms/Trainers/LinearRegression.h>

To compile, we need to link against the following libraries:

-std=c++11 -lboost_serialization -lshark -lcblas

Linear regression with Shark

Initialization phase

We will start by including the header files needed for linear regression:

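#include <bits/stdc++.h> // header file for all C++ standard libraries
#include <shark/Data/Csv.h> // header file for importing CSV data
#include <shark/ObjectiveFunctions/Loss/SquaredLoss.h> // header file for the squared loss function
#include <shark/Algorithms/Trainers/LinearRegression.h> // header file for linear regression

using namespace std;   // the code below uses cout and endl unqualified
using namespace shark; // the code below uses Shark's classes unqualified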

Next is the dataset. I have created two CSV files: input.csv contains the x values and label.csv contains the y values.

You can get these two files from the GitHub repository: https://github.com/Alakhator/Machine-Learning-With-C-

First, we create data containers to store the values from the CSV files:

Data<RealVector> inputs; // container for storing the x values
Data<RealVector> labels; // container for storing the y values

Next, we need to import the data. Shark provides a convenient function for importing CSV files. We pass it the data container to fill and the path to the CSV file:

importCSV(inputs, "input.csv"); // fills the container with the values from the CSV file at the given path
importCSV(labels, "label.csv");
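
For readers curious about what a CSV import involves, here is a minimal sketch using only the standard library. It is not Shark's implementation: readCsv is a hypothetical helper written for this illustration, and it assumes every cell is a plain number and that the file has no header row.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of Shark): parse comma-separated numeric rows
// from any input stream, such as a std::ifstream or a std::istringstream.
std::vector<std::vector<double>> readCsv(std::istream& in) {
    std::vector<std::vector<double>> rows;
    std::string line;
    while (std::getline(in, line)) {
        // Drop a trailing carriage return left over from Windows line endings.
        if (!line.empty() && line.back() == '\r') line.pop_back();
        if (line.empty()) continue; // skip blank lines

        std::vector<double> row;
        std::istringstream cells(line);
        std::string cell;
        while (std::getline(cells, cell, ',')) {
            row.push_back(std::stod(cell)); // throws if the cell is not a number
        }
        // Every row is expected to have the same number of columns.
        assert(rows.empty() || row.size() == rows.front().size());
        rows.push_back(row);
    }
    return rows;
}
```

Passing a std::ifstream opened on input.csv would return one row per line of the file.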

Then, we need to instantiate a regression dataset type. This is just a general regression object; all we need to do in the constructor is pass in our inputs and the labels for the data.

Next, we need to train a linear regression model. How do we do that? We instantiate a trainer and define a linear model:

RegressionDataset data(inputs, labels);
LinearRegression trainer; // linear regression model trainer
LinearModel<> model; // linear model

Training stage

Next comes the key step: actually training the model. The trainer has a member function called train, which we call to fit the model to the data:

// Train the model
trainer.train(model, data); // the train function fits the model to the data

Prediction stage

Finally, we output the model parameters:

// Display model parameters
cout << "intercept: " << model.offset() << endl;
cout << "matrix: " << model.matrix() << endl;

The linear model has a member function called offset, which returns the intercept of the best-fit line. Next, we output the model's matrix, which holds the slope.

We find the best-fit line by least squares, that is, by minimizing the squared loss.
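
To make the least-squares idea concrete, here is a small sketch for the one-variable case using only the standard library. It uses the textbook closed-form solution (the slope is the covariance of x and y divided by the variance of x). This is an illustration only, not necessarily how Shark's trainer computes the fit, and the names Line and fitLine are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type for this sketch: the line y = slope * x + intercept.
struct Line {
    double slope;
    double intercept;
};

// Ordinary least squares for a single input variable.
Line fitLine(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size() && x.size() >= 2);
    const double n = static_cast<double>(x.size());

    // Means of the inputs and the labels.
    double meanX = 0.0, meanY = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        meanX += x[i];
        meanY += y[i];
    }
    meanX /= n;
    meanY /= n;

    // Sums of centered products (covariance and variance up to a factor of n).
    double sxy = 0.0, sxx = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sxy += (x[i] - meanX) * (y[i] - meanY);
        sxx += (x[i] - meanX) * (x[i] - meanX);
    }
    assert(sxx > 0.0); // the inputs must not all be identical

    Line fit;
    fit.slope = sxy / sxx;
    fit.intercept = meanY - fit.slope * meanX;
    return fit;
}
```

For example, fitLine({0, 1, 2, 3}, {1, 3, 5, 7}) recovers a slope of 2 and an intercept of 1, since those points lie exactly on y = 2x + 1.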

Fortunately, the model also lets us output this loss. The Shark library is very helpful for showing how well the model fits:

SquaredLoss<> loss; // initialize a squared loss object
Data<RealVector> prediction = model(data.inputs()); // compute the predictions from the input data
cout << "squared loss: " << loss(data.labels(), prediction) << endl; // finally, we calculate the loss

First, we initialize a squared loss object. Then we instantiate a data container and fill it with the predictions the model computes from the inputs. Finally, we compute the loss by passing in the labels and the predicted values.
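
The same two steps, predicting and then scoring, can be sketched in plain C++ for a one-variable model. The helper names predict and meanSquaredLoss are hypothetical, and averaging the squared errors over the samples is an assumption made for this sketch; check the Shark documentation for the exact normalization its squared loss uses.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Apply y = slope * x + intercept to every input value.
std::vector<double> predict(const std::vector<double>& inputs,
                            double slope, double intercept) {
    std::vector<double> predictions;
    predictions.reserve(inputs.size());
    for (double x : inputs) {
        predictions.push_back(slope * x + intercept);
    }
    return predictions;
}

// Average of the squared differences between labels and predictions.
double meanSquaredLoss(const std::vector<double>& labels,
                       const std::vector<double>& predictions) {
    assert(labels.size() == predictions.size() && !labels.empty());
    double total = 0.0;
    for (std::size_t i = 0; i < labels.size(); ++i) {
        const double diff = labels[i] - predictions[i];
        total += diff * diff;
    }
    return total / static_cast<double>(labels.size());
}
```

A loss of 0 would mean every prediction matches its label exactly; noisy labels push the value up even when the fitted line is a good one.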

Finally, we need to compile. In the terminal, type the following command (make sure the directory is set up correctly):

g++ -o lr linear_regression.cpp -std=c++11 -lboost_serialization -lshark -lcblas

Once compiled, it creates an executable named lr. Now just run the program. The results are as follows:

b : [1](-0.749091)
A :[1,1]((2.00731))
Loss: 7.83109

The value of b is a little far from 0, but that is because of the noise in the labels. The multiplier is close to 2, which matches the data well. That is how to use the Shark library in C++ to build a linear regression model!

2. mlpack Library

mlpack is a fast and flexible machine learning library written in C++. Its goal is to provide fast and scalable implementations of machine learning algorithms. mlpack offers these algorithms as simple command-line programs, as bindings for Python and Julia, and as C++ classes that can be integrated into larger machine learning solutions.

We'll first learn how to install mlpack and set up the environment. Then we will use mlpack to implement the K-means algorithm.

Install mlpack and set up the environment (Linux)

mlpack relies on the following libraries, which need to be installed on the system:

  • Armadillo >= 8.400.0 (with LAPACK support)
  • Boost (math_c99, program_options, serialization, unit_test_framework, heap, spirit) >= 1.49
  • ensmallen >= 2.10.0

On Ubuntu and Debian, you can get all of these dependencies through apt:

sudo apt-get install libboost-math-dev libboost-program-options-dev libboost-test-dev libboost-serialization-dev binutils-dev python-pandas python-numpy cython python-setuptools

Now that all the dependencies are installed on the system, you can run the following commands to build and install mlpack:

wget
tar -xvzpf mlpack-3.2.2.tar.gz
mkdir mlpack-3.2.2/build && cd mlpack-3.2.2/build
cmake ../
make -j4 # The -j is the number of cores you want to use for a build
sudo make install

On many Linux systems, mlpack is installed to /usr/local/lib by default, and you may need to set the LD_LIBRARY_PATH environment variable:

export LD_LIBRARY_PATH=/usr/local/lib

The above instructions are the easiest way to get, build, and install mlpack.

Compiling programs with mlpack
  • Include the relevant header files in your program (here, to implement K-means):
#include <mlpack/methods/kmeans/kmeans.hpp>
#include <armadillo>
  • To compile, we need to link against the following libraries:

-std=c++11 -larmadillo -lmlpack -lboost_serialization

Implementing K-means with mlpack

K-means is a centroid-based (or distance-based) algorithm: we calculate distances to assign each point to a cluster. In K-means, each cluster is associated with a centroid.

The main goal of the K-means algorithm is to minimize the sum of distances between points and their respective cluster centroids.
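
As a sketch of that objective, the function below adds up the squared Euclidean distance from each point to the centroid of its assigned cluster. Using the squared distance is the usual textbook formulation and is an assumption here; Point and withinClusterSumOfSquares are hypothetical names for illustration, not part of mlpack.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 2-D point type used only in this sketch.
struct Point {
    double x;
    double y;
};

// Squared Euclidean distance between two points.
double squaredDistance(const Point& a, const Point& b) {
    const double dx = a.x - b.x;
    const double dy = a.y - b.y;
    return dx * dx + dy * dy;
}

// Sum over all points of the squared distance to their assigned centroid.
// assignments[i] is the index of the centroid that points[i] belongs to.
double withinClusterSumOfSquares(const std::vector<Point>& points,
                                 const std::vector<std::size_t>& assignments,
                                 const std::vector<Point>& centroids) {
    assert(points.size() == assignments.size());
    double total = 0.0;
    for (std::size_t i = 0; i < points.size(); ++i) {
        assert(assignments[i] < centroids.size());
        total += squaredDistance(points[i], centroids[assignments[i]]);
    }
    return total;
}
```

A smaller value means the points sit closer to their centroids, which is what each K-means iteration tries to achieve.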

K-means is an iterative process. We want to divide the data into a specific number of clusters. First, we pick some initial centroids; these can be chosen completely at random.

Next, for each data point, we find the nearest centroid and assign the point to it, so each centroid represents a cluster. Once all the data points have been assigned, we recompute each centroid as the average of the points assigned to it, and then repeat.
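
The two alternating steps described above can be sketched in plain C++ as follows. This is a minimal illustration of the classic iteration for 2-D points, not mlpack's implementation; the names Point, assignToNearest, recomputeCentroids, and kMeans are hypothetical, and the caller supplies the initial centroids.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical 2-D point type used only in this sketch.
struct Point {
    double x;
    double y;
};

double squaredDistance(const Point& a, const Point& b) {
    const double dx = a.x - b.x;
    const double dy = a.y - b.y;
    return dx * dx + dy * dy;
}

// Assignment step: label each point with the index of its nearest centroid.
std::vector<std::size_t> assignToNearest(const std::vector<Point>& points,
                                         const std::vector<Point>& centroids) {
    assert(!centroids.empty());
    std::vector<std::size_t> labels(points.size(), 0);
    for (std::size_t i = 0; i < points.size(); ++i) {
        double best = std::numeric_limits<double>::max();
        for (std::size_t c = 0; c < centroids.size(); ++c) {
            const double d = squaredDistance(points[i], centroids[c]);
            if (d < best) {
                best = d;
                labels[i] = c;
            }
        }
    }
    return labels;
}

// Update step: move each centroid to the mean of the points assigned to it.
// A centroid with no assigned points keeps its previous position.
std::vector<Point> recomputeCentroids(const std::vector<Point>& points,
                                      const std::vector<std::size_t>& labels,
                                      const std::vector<Point>& previous) {
    std::vector<Point> sums(previous.size(), Point{0.0, 0.0});
    std::vector<std::size_t> counts(previous.size(), 0);
    for (std::size_t i = 0; i < points.size(); ++i) {
        sums[labels[i]].x += points[i].x;
        sums[labels[i]].y += points[i].y;
        ++counts[labels[i]];
    }
    std::vector<Point> updated = previous;
    for (std::size_t c = 0; c < previous.size(); ++c) {
        if (counts[c] > 0) {
            updated[c].x = sums[c].x / static_cast<double>(counts[c]);
            updated[c].y = sums[c].y / static_cast<double>(counts[c]);
        }
    }
    return updated;
}

// Alternate the two steps for a fixed number of iterations.
std::vector<Point> kMeans(const std::vector<Point>& points,
                          std::vector<Point> centroids,
                          int maxIterations) {
    for (int it = 0; it < maxIterations; ++it) {
        const std::vector<std::size_t> labels = assignToNearest(points, centroids);
        centroids = recomputeCentroids(points, labels, centroids);
    }
    return centroids;
}
```

A production implementation would usually also stop early once the assignments no longer change; the fixed iteration count here keeps the sketch short.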

Here, we will use the mlpack library in C++ to implement K-means.

Initialization phase

We will first include the header files needed for K-means:

#include <bits/stdc++.h>
#include <mlpack/methods/kmeans/kmeans.hpp>
#include <armadillo>

using namespace std;

Next, we will create some basic variables to set the number of clusters, the dimensionality of the data, the number of samples, and the maximum number of iterations we want to perform, since K-means is an iterative process.

int k = 2; // number of clusters
int dim = 2; // dimension
int samples = 50; // number of samples
int max_iter = 10; // maximum number of iterations

Next, we'll create the data. This is the first place we use the Armadillo library. We will create a mat (matrix) object that serves as our data container:

arma::mat data(dim, samples, arma::fill::zeros);

We give this mat object 2 dimensions and 50 samples, and it initializes all the values to 0.

Next, we'll fill this container with some random data to run K-means on. I'm going to create 25 points around the position (1, 1), that is, x = 1, y = 1, and then add some random noise to each of these 25 data points.

//Create data
    int i = 0;
    for(; i < samples / 2; ++i)
    {
        data.col(i) = arma::vec({1, 1}) + 0.25*arma::randn(dim);
    }
    for(; i < samples; ++i)
    {
        data.col(i) = arma::vec({2, 3}) + 0.25*arma::randn(dim);
    }

Here, for i from 0 to 24, the base position is x = 1, y = 1, and we add random noise of dimension 2. Then we do the same thing for the remaining 25 points around x = 2, y = 3.
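
For comparison, the same kind of test data can be generated with only the standard library. This is a sketch under stated assumptions rather than a translation of the Armadillo code: makeTwoBlobs and Point are hypothetical names, the random generator is seeded explicitly so that runs are repeatable, and the noise argument is used as the standard deviation of the Gaussian noise.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical 2-D point type used only in this sketch.
struct Point {
    double x;
    double y;
};

// First half of the samples is centered on (1, 1), second half on (2, 3),
// each with independent Gaussian noise added to both coordinates.
std::vector<Point> makeTwoBlobs(std::size_t samples, double noise, unsigned seed) {
    assert(samples >= 2 && noise > 0.0);
    std::mt19937 rng(seed);
    std::normal_distribution<double> gauss(0.0, noise);

    std::vector<Point> points;
    points.reserve(samples);
    for (std::size_t i = 0; i < samples; ++i) {
        const bool firstHalf = i < samples / 2;
        const double centerX = firstHalf ? 1.0 : 2.0;
        const double centerY = firstHalf ? 1.0 : 3.0;
        // Draw the two noise values in separate statements so the order is fixed.
        const double px = centerX + gauss(rng);
        const double py = centerY + gauss(rng);
        points.push_back(Point{px, py});
    }
    return points;
}
```

Calling makeTwoBlobs(50, 0.25, 42) gives 50 points laid out like the data above, although the individual values will differ from what Armadillo produces.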

Our data is ready! It’s time to get into training.

Training stage

First, we instantiate an arma::Row to hold the cluster assignments, and then an arma::mat to hold the centroids:

// Cluster the data
arma::Row<size_t> clusters;
arma::mat centroids;

Now, we need to instantiate the K-means class:

mlpack::kmeans::KMeans<> mlpack_kmeans(max_iter);

We instantiate the K-means class and pass the maximum number of iterations to the constructor. Now we can do the clustering.

We will call the Cluster member function of the K-means class. We need to pass in the data, the number of clusters, and then the cluster object and the centroid object.

mlpack_kmeans.Cluster(data, k, clusters, centroids);

Now, the Cluster function will run K-means on this data using the specified number of clusters.

Generate results

We can simply display the result using the centroids.print function. This will give the locations of the centroids:

centroids.print("Centroids:");

Next, we need to compile. In the terminal, type the following command (again, make sure the directory is set correctly):

g++ k_means.cpp -o kmeans_test -O3 -std=c++11 -larmadillo -lmlpack -lboost_serialization && ./kmeans_test

Once compiled, it creates an executable named kmeans_test, which the command above also runs. The results are as follows:

Centroids:
0.9497   1.9625
0.9689   3.0652

End notes

In this article, we looked at two popular machine learning libraries that help us implement machine learning models in C++.

Link to the original text: https://www.analyticsvidhya.com/blog/2020/05/introduction-machine-learning-libraries-c/
