R language for support vector machine regression (SVR) and grid search hyperparameter optimization

Time: 2022-9-23

Original link: http://tecdat.cn/?p=23305

In this post, I will show how to use the R language to do support vector regression (SVR).

We’ll do a simple linear regression first, then move to support vector regression so you can see how both perform on the same data.
 

A Simple Dataset

First, we will use this simple dataset.

[Figure: scatter plot of the dataset]

As you can see, there seems to be some relationship between our two variables, X and Y, and it looks like we can fit a straight line passing around each point.

Let’s do it in R!

Step 1: Do a Simple Linear Regression in R

Below is the same data in CSV format; I saved it in a file called regression.csv.

[Data: contents of regression.csv]

We can now use R to display the data and fit a straight line.

# load the data from the csv file
dataDirectory <- "D:/" # put your own folder here
data <- read.csv(paste(dataDirectory, 'regression.csv', sep=""), header = TRUE)



# plot data

plot(data, pch=16)



# create a linear regression model

model <- lm(Y ~ X, data)



# add fitted line

abline(model)

The above code displays the following chart:

[Figure: the data with the fitted regression line]

Step 2: How does our regression work?

To be able to compare linear regression and support vector regression, we first need a way to measure their performance.

To do this, let's change the code to visualize every prediction the model makes:

# make a prediction for each X

pred <- predict(model, data)



# show the predictions as red crosses
points(data$X, pred, col = "red", pch = 4)

The following graph was produced.

[Figure: linear regression predictions shown as red crosses]

For each data point Xi, the model makes a prediction Y^i, shown as a red cross on the graph. The only difference from the previous chart is that the dots are not connected to each other.

To measure how good our model is, we calculate how bad it is.

We can compare each Yi value with the associated predicted value Y^i to see how much they differ.

Note that the expression Y^i − Yi is the error: if we made a perfect prediction, Y^i would be equal to Yi and the error would be zero.

If we do this for each data point, square each error, and take the average of the squared errors, we get the mean squared error (MSE).

MSE = (1/n) Σ (Y^i − Yi)²

In machine learning, a common way to measure error is to use root mean square error (RMSE), so we’ll use that instead.

To calculate the RMSE, we simply take the square root of the MSE:

RMSE = √MSE

In R, we can compute the RMSE with the following function:

rmse <- function(error)

{

  sqrt(mean(error^2))

}
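As a quick sketch of how this function is used (reusing the lm model and pred from above; the variable name lmPredictionRMSE is just for illustration), the linear model's RMSE can be computed from its residuals:

# the residuals of the lm model are the same values as data$Y - pred
error <- model$residuals
lmPredictionRMSE <- rmse(error)
lmPredictionRMSE # about 5.70 on this dataset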


We now know that the RMSE of our linear regression model is 5.70. Let’s try to improve it with SVR!

Step 3: Support Vector Regression

Let's create an SVR model in R. Below is the code to build it with the svm function from the e1071 package.

library(e1071) # provides the svm function

model <- svm(Y ~ X, data)

As you can see, it looks a lot like the code for linear regression. Note that we call the svm function (and not svr!), because this function can also be used for classification with support vector machines. If the function detects that the response is categorical (a factor in R), it will automatically perform classification instead of regression.
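A minimal sketch of the prediction and plotting step (assuming the plot from Step 1 is still open; the colour choice is arbitrary):

# predict with the SVR model and overlay the predictions on the plot
predictedY <- predict(model, data)
points(data$X, predictedY, col = "blue", pch = 4)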

The code draws the diagram below.

[Figure: SVR predictions on the same data]

This time the prediction is closer to the true value! Let’s calculate the RMSE of the support vector regression model.

# note: this time model$residuals is not the same as data$Y - predictedY,
# so we calculate the error explicitly
error <- data$Y - predictedY

svrPredictionRMSE <- rmse(error)
svrPredictionRMSE


As expected, the RMSE is better, now at 3.15 compared to 5.70 before.

But can we do better?

Step 4: Tune Your Support Vector Regression Model

To improve the performance of support vector regression, we will need to choose the best parameters for the model.

In our previous example we did ε-regression, but we didn't set any value for ε (epsilon), so it defaulted to 0.1. There is also a cost parameter that we can change to avoid overfitting.
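For illustration, here is a sketch of the same call with those defaults written out explicitly (epsilon = 0.1 and cost = 1 are the e1071 defaults):

# equivalent to the earlier call, with the default hyperparameters made explicit
model <- svm(Y ~ X, data, epsilon = 0.1, cost = 1)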

The process of choosing these parameters is called hyperparameter optimization, or model selection.

The standard way is to do a grid search. This means that we will train a large number of models for different combinations of ϵ and cost and choose the best one.

# perform a grid search
tuneResult <- tune(svm, Y ~ X, data = data,
                   ranges = list(epsilon = seq(0, 1, 0.1), cost = 2^(2:9)))
print(tuneResult)

# draw the parameter map
plot(tuneResult)

There are two important points in the above code.

  • We train the models with the tune method, using ϵ = 0, 0.1, 0.2, …, 1 and cost = 2², 2³, 2⁴, …, 2⁹, which means it will train 88 models (11 × 8, so this may take a while).
  • tuneResult reports the MSE; don't forget to convert it to RMSE before comparing it with our previous models (see the sketch after this list).

The last line plots the results of the grid search.
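As a sketch, the winning combination can also be read off the tune object directly (best.parameters and best.performance are fields of the object returned by e1071's tune):

# best parameter combination found by the grid search
tuneResult$best.parameters

# best.performance is an MSE; take the square root to get an RMSE
sqrt(tuneResult$best.performance)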

[Figure: performance map of the grid search]

On this graph, we can see that the darker the area, the better our model is (as the RMSE is closer to zero in the dark area).

This means that we can try another grid search in a narrower range, this time with values of ϵ between 0 and 0.2. For now, the cost value doesn't seem to have an impact, so we'll leave it as is and see if that changes.

tuneResult <- tune(svm, Y ~ X, data = data,
                   ranges = list(epsilon = seq(0, 0.2, 0.01), cost = 2^(2:9)))

We trained 168 different models (21 values of ϵ × 8 values of cost) with this small piece of code.

When we zoom in on the dark area, we can see that there are several darker patches.

As can be seen from the figure, the model error is smallest when the cost is between 200 and 300 and ϵ is between 0.08 and 0.09.

Fortunately, we don't have to pick the best model by eye: R makes it very easy to retrieve it and use it to make predictions.

# this value may differ on your machine, because tune()
# randomly partitions the data for cross-validation
tunedModel <- tuneResult$best.model
tunedModelY <- predict(tunedModel, data)
error <- data$Y - tunedModelY
tunedModelRMSE <- rmse(error)


We have improved the RMSE of our support vector regression model again!

We can visualize both of our models. In the image below, the first SVR model is shown in red, and the tuned SVR model in blue.
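A minimal sketch of how such a comparison can be drawn (assuming model is the first SVR fit and tunedModel is the tuned one from above):

# redraw the data, then overlay the predictions of both SVR models
plot(data, pch = 16)
points(data$X, predict(model, data), col = "red", pch = 4)
points(data$X, predict(tunedModel, data), col = "blue", pch = 4)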

[Figure: first SVR model (red) vs. tuned SVR model (blue)]

I hope you enjoyed this introduction to support vector regression with R. You can view the original post to get the full source code of this tutorial.


