## Original link: http://tecdat.cn/?p=23305

In this post, I will show how to use the R language to do support vector regression (SVR).

We’ll do a simple linear regression first, then move to support vector regression so you can see how both perform on the same data.

## A Simple Dataset

First, we will use this simple dataset.

As you can see, there seems to be some relationship between our two variables, X and Y, and it looks like we could fit a straight line passing close to each point.

Let’s do it in R!

## Step 1: Do a Simple Linear Regression in R

Below is the same data in CSV format; I saved it in a file called regression.csv.

We can now use R to display the data and fit a straight line.

```
# load the data from the csv file
dataDirectory <- "D:/" # put your own folder here
data <- read.csv(paste(dataDirectory, 'regression.csv', sep=""), header = TRUE)

# plot the data
plot(data, pch=16)

# create a linear regression model
model <- lm(Y ~ X, data)

# add the fitted line
abline(model)
```

The above code displays the following chart:

## Step 2: How does our regression work?

To be able to compare linear regression with support vector regression, we first need a way to measure their performance.

To do this, let's change the code to visualize every prediction the model makes.

```
# make a prediction for each X
pred <- predict(model, data)

# show the predictions as red crosses
points(data$X, pred, col = "red", pch = 4)
```

The following graph was produced.

For each data point Xi, the model makes a prediction Y^i, shown as a red cross on the graph. The only difference from the previous chart is that the dots are not connected to each other.

To measure how good our model is, we calculate how bad it is.

We can compare each Yi value with the associated predicted value Y^i to see how much they differ.

Note that the expression Y^i - Yi is the error: if we made a perfect prediction, Y^i would be equal to Yi and the error would be zero.

If we do this for each data point, square each error, and take the average over all points, we get the mean squared error (MSE).

In machine learning, a common way to measure error is the root mean squared error (RMSE), so we'll use that instead.

The RMSE is simply the square root of the MSE.
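A quick numeric sanity check with toy residuals (the values are invented for illustration, not taken from our dataset):

```r
# toy residuals Y^i - Yi, invented for illustration
errors <- c(1, -2, 3)

# MSE: average of the squared errors, (1 + 4 + 9) / 3
mse <- mean(errors^2)

# RMSE: square root of the MSE
sqrt(mse)
```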

In R, we can write the following function to calculate the RMSE:

```
# root mean squared error of a vector of errors
rmse <- function(error) {
  sqrt(mean(error^2))
}
```
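Applying this to our linear model is straightforward; a minimal sketch, assuming `model`, `data`, and `pred` from the snippets above:

```r
# for lm(), the residuals are exactly data$Y - pred
error <- data$Y - pred

# RMSE of the linear regression model
lmRMSE <- rmse(error)
lmRMSE
```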

We now know that the RMSE of our linear regression model is 5.70. Let’s try to improve it with SVR!

## Step 3: Support Vector Regression

Create an SVR model in R.

Below is the code to build a support vector regression model.

```
library(e1071) # svm() lives in the e1071 package
model <- svm(Y ~ X, data)
```

As you can see, it looks a lot like the code for linear regression. Note that we called the svm function (and not svr!) because this function can also be used for classification with support vector machines. If the function detects that the data is categorical (i.e. the response variable is a factor in R), it automatically performs classification instead of regression.

The code draws the diagram below.
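The drawing step itself isn't shown above; a minimal sketch, assuming `model` and `data` from the earlier snippets:

```r
# predict with the SVR model and overlay the predictions
# on the existing plot as red crosses
predictedY <- predict(model, data)
points(data$X, predictedY, col = "red", pch = 4)
```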

This time the prediction is closer to the true value! Let’s calculate the RMSE of the support vector regression model.

```
# This time the model's residuals are not simply data$Y - predictedY,
# so we calculate the error explicitly
predictedY <- predict(model, data)
error <- data$Y - predictedY
svrPredictionRMSE <- rmse(error)
```

As expected, the RMSE is better, now at 3.15 compared to 5.70 before.

But can we do better?

## Step 4: Tune Your Support Vector Regression Model

To improve the performance of support vector regression, we will need to choose the best parameters for the model.

In our previous example, where we did ε-regression, we didn't set any value for ε, but it defaulted to 0.1. There is also a cost parameter that we can change to avoid overfitting.

The process of choosing these parameters is called hyperparameter optimization, or model selection.

The standard way is to do a grid search. This means that we will train a large number of models for different combinations of ϵ and cost and choose the best one.

```
# perform a grid search over epsilon and cost
tuneResult <- tune(svm, Y ~ X, data = data,
                   ranges = list(epsilon = seq(0, 1, 0.1), cost = 2^(2:9)))
print(tuneResult)

# draw the tuning graph
plot(tuneResult)
```

There are two important points in the above code.

- We train the models using the `tune` method with ϵ = 0, 0.1, 0.2, …, 1 and cost = 2^2, 2^3, …, 2^9, which means it will train 88 models (this may take a long time).
- `tune` reports the MSE; don't forget to convert it to RMSE before comparing with our previous models.

The last line plots the results of the grid search.
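The MSE-to-RMSE conversion mentioned above is a one-liner, assuming `tuneResult` from the grid search:

```r
# tune() reports the MSE of the best model;
# take the square root to get an RMSE comparable to our earlier numbers
bestRMSE <- sqrt(tuneResult$best.performance)
```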

On this graph, we can see that the darker the area, the better our model is (as the RMSE is closer to zero in the dark area).

This means that we can try another grid search in a narrower range, we will try values of ϵ between 0 and 0.2. For now, the cost value doesn’t seem to have an impact, so we’ll leave it as is and see if that changes.

`tuneResult <- tune(svm, Y ~ X, data = data, ranges = list(epsilon = seq(0, 0.2, 0.01), cost = 2^(2:9)))`

We trained 168 different models with this small piece of code.

When we zoom in on the dark area, we can see that there are several darker patches.

As can be seen from the figure, a cost between 200 and 300 combined with an ϵ between 0.08 and 0.09 gives the smallest model error.

Fortunately, we don't have to pick the best model by eye: R makes it very easy to retrieve it and use it to make predictions.

```
# retrieve the best model found by the grid search and predict with it
tunedModel <- tuneResult$best.model
error <- data$Y - predict(tunedModel, data)

# this value may differ on your computer because
# the tuning method randomly shuffles the data
tunedModelRMSE <- rmse(error)
```

We have improved the RMSE of our support vector regression model again!

We can visualize both of our models. In the image below, the first SVR model is in red, while the tuned SVR model is in blue.
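A minimal sketch of how such a comparison can be drawn, assuming `model` (the first SVR fit) and `tunedModel` from the snippets above:

```r
# plot the data, then overlay both models' predictions
# (lines() assumes the data is ordered by X)
plot(data, pch = 16)
lines(data$X, predict(model, data), col = "red")
lines(data$X, predict(tunedModel, data), col = "blue")
```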

I hope you enjoyed this introduction to support vector regression with R. You can view the original text to get the source code of this tutorial.
