## Link to the original text: http://tecdat.cn/?p=9706

**Overview**

Here we relax the linearity assumption behind the standard linear methods. Sometimes linearity is simply a poor approximation. There are many ways to address this; some, such as regularization, reduce the complexity of the model, but they still work within a linear model and can only improve it so far. This post focuses on extensions of the linear model:

- *Polynomial regression*: a simple way to obtain a nonlinear fit by adding powers of the predictors.
- *Step functions*: divide the range of a variable into *K* distinct regions to produce a qualitative variable; this amounts to fitting a piecewise constant function.
- *Regression splines*: more flexible than polynomials and step functions, and in fact an extension of both.
- *Local regression*: similar to regression splines, but the regions are allowed to overlap, which they do in a smooth way.
- *Smoothing splines*: also similar to regression splines, but they minimize a residual sum of squares criterion subject to a smoothness penalty.
- *Generalized additive models*: extend the methods above to handle multiple predictors.

**Polynomial regression**

This is the most traditional way to extend the linear model. By adding polynomial terms, polynomial regression lets us fit nonlinear curves while still estimating the coefficients by least squares.
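Concretely, polynomial regression replaces the standard linear model with a degree-*d* polynomial in the predictor:

```latex
y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \cdots + \beta_d x_i^d + \epsilon_i
```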

## Step functions

Step functions are often used in biostatistics and epidemiology. They break the range of a variable into bins and fit a constant in each bin.
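In symbols: choose cutpoints c_1 < … < c_K in the range of *x*, create dummy variables for the resulting intervals, and fit a constant in each:

```latex
y_i = \beta_0 + \beta_1 C_1(x_i) + \cdots + \beta_K C_K(x_i) + \epsilon_i,
\qquad C_k(x) = I(c_k \le x < c_{k+1})
```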

**Regression spline**

Regression splines extend polynomial regression and step functions; in fact, polynomials and step functions are simply special cases of a general *basis function* approach.

This is an example of piecewise cubic fitting (top left).

A better solution is to impose constraints so that the fitted curve must be continuous at the knots.

**Choose the location and number of knots**

One option is to place more knots where we expect the function to change most rapidly and fewer where it seems more stable. In practice, however, knots are usually placed uniformly.

It should be clear that in this case, there are actually five knots, including boundary knots.

So how many knots should we use? One simple option is to try different numbers of knots and see which produces the best-looking curve. A more objective approach is cross-validation.

Compared with polynomial regression, splines often give more stable estimates, especially near the boundaries.

**Smoothing splines**

So far we have discussed regression splines, which are created by specifying a set of knots, generating a sequence of basis functions, and then estimating the spline coefficients by least squares. Smoothing splines are a different way to create a spline. Recall that our goal is to find a function that fits the observed data well, that is, one that makes the RSS small. However, if we place no restrictions on the function, we can always drive the RSS to zero by choosing a function that interpolates all of the data exactly.
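What a smoothing spline minimizes instead is the RSS plus a roughness penalty, where the tuning parameter lambda controls the trade-off between fit and smoothness:

```latex
\sum_{i=1}^{n} \bigl( y_i - g(x_i) \bigr)^2 + \lambda \int g''(t)^2 \, dt
```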

**Select the smoothing parameter lambda**

Once again we turn to cross-validation. It turns out that LOOCV can be computed very efficiently for smoothing splines, as well as for regression splines and other arbitrary basis functions.

Smoothing splines are often preferable to regression splines because they usually create simpler models with comparable fit.

**Local regression**

Local regression computes the fit at a target point *x*_0 using only the nearby training observations.

Local regression can be generalized in many ways. In a setting with multiple predictors, one useful generalization is the *varying coefficient model*, which fits a regression that is global in some variables but local in others.

**Generalized additive model**

GAMs provide a general framework for extending the linear model by allowing a nonlinear function of each variable while maintaining additivity.
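Written out, a GAM for a quantitative response replaces each linear component of multiple regression with a smooth nonlinear function f_j:

```latex
y_i = \beta_0 + f_1(x_{i1}) + f_2(x_{i2}) + \cdots + f_p(x_{ip}) + \epsilon_i
```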

Fitting a GAM with smoothing splines is not quite so simple, because least squares cannot be used. Instead, we use a method called *backfitting*.

**Advantages and disadvantages of GAMs**

**Advantages**

- GAMs allow a nonlinear function to be fitted to each predictor, so we can automatically model nonlinear relationships that standard linear regression would miss. We do not have to try many different transformations for each variable by hand.
- The nonlinear fits can potentially make more accurate predictions of the response *Y*.
- Because the model is additive, we can still examine the effect of each predictor on *Y* while holding the other variables fixed.

**Disadvantages**

- The main limitation is that the model is restricted to be additive, so important interactions can be missed.

**Examples**

**Polynomial regression and step functions**

```
library(ISLR)
attach(Wage)
```

We can easily fit a polynomial with `poly()`, specifying the variable and the degree of the polynomial. The function returns a matrix of orthogonal polynomials, which means that each column is a linear combination of `age`, `age^2`, `age^3`, and `age^4`. If you want the raw powers directly, specify `raw=TRUE`. This does not affect the predictions, but it is useful for inspecting the coefficient estimates.

```
library(knitr)  # for kable()
fit <- lm(wage ~ poly(age, 4), data = Wage)
kable(coef(summary(fit)))
```

Now we create a grid of `age` values at which we want predictions. Finally, we plot the data and the fitted degree-4 polynomial.

```
ageLims <- range(age)
age.grid <- seq(from = ageLims[1], to = ageLims[2])
pred <- predict(fit, newdata = list(age = age.grid), se = TRUE)
# Approximate 95% confidence bands used in the plot below
se.bands <- cbind(pred$fit + 2 * pred$se.fit, pred$fit - 2 * pred$se.fit)
```

```
plot(age,wage,xlim=ageLims ,cex=.5,col="darkgrey")
lines(age.grid,pred$fit,lwd=2,col="blue")
matlines(age.grid,se.bands,lwd=2,col="blue",lty=3)
```

In this simple example, we can use an ANOVA test to compare the nested models.
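A sketch of the model comparison that produces the table below (the fits for degrees 1 through 5 are assumed to be refit here, since only the degree-4 `fit` was created above):

```r
# Fit nested polynomial models of increasing degree and compare them
fit.1 <- lm(wage ~ age, data = Wage)
fit.2 <- lm(wage ~ poly(age, 2), data = Wage)
fit.3 <- lm(wage ~ poly(age, 3), data = Wage)
fit.4 <- lm(wage ~ poly(age, 4), data = Wage)
fit.5 <- lm(wage ~ poly(age, 5), data = Wage)
anova(fit.1, fit.2, fit.3, fit.4, fit.5)  # sequential F-tests
```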

```
## Analysis of Variance Table
##
## Model 1: wage ~ age
## Model 2: wage ~ poly(age, 2)
## Model 3: wage ~ poly(age, 3)
## Model 4: wage ~ poly(age, 4)
## Model 5: wage ~ poly(age, 5)
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 2998 5022216
## 2 2997 4793430 1 228786 143.59 <2e-16 ***
## 3 2996 4777674 1 15756 9.89 0.0017 **
## 4 2995 4771604 1 6070 3.81 0.0510 .
## 5 2994 4770322 1 1283 0.80 0.3697
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

The p-value comparing the linear model `M_1` to the quadratic model `M_2` is essentially zero, indicating that a linear fit is not sufficient. The p-value comparing `M_2` to `M_3` is also very low, while `M_3` versus `M_4` sits near 5% and the degree-5 term adds nothing. We can therefore conclude that a cubic or quartic model appears reasonable for this data, and we prefer the simpler one.

We can also use cross-validation to select the polynomial degree.
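A sketch using k-fold cross-validation from the `boot` package (the degree range 1–5 and K = 10 are choices made here, not from the original):

```r
library(boot)  # for cv.glm()

set.seed(1)
cv.errors <- rep(NA, 5)
for (d in 1:5) {
  # glm() with the default gaussian family is equivalent to lm(),
  # but lets us use cv.glm() for cross-validation
  glm.fit <- glm(wage ~ poly(age, d), data = Wage)
  cv.errors[d] <- cv.glm(Wage, glm.fit, K = 10)$delta[1]
}
which.min(cv.errors)  # degree with the smallest estimated test error
```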

In fact, the minimum cross-validation error here occurs for the quartic polynomial, but choosing the cubic or quadratic model costs little. Next, we consider predicting whether an individual earns more than $250,000 per year.

However, a confidence interval computed directly on the probability scale is unreasonable, because we end up with some negative probabilities. To generate sensible confidence intervals, it makes more sense to transform the *logit* predictions.
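A sketch of how the `pfit` and `se.bands` used in the plot can be obtained: fit on the logit scale, build the interval there, then map back to probabilities (this reuses `age.grid` from above):

```r
# Polynomial logistic regression for P(wage > 250)
fit <- glm(I(wage > 250) ~ poly(age, 4), data = Wage, family = binomial)
# Predictions and standard errors on the logit (link) scale
pred <- predict(fit, newdata = list(age = age.grid), se = TRUE)
# Interval endpoints on the logit scale, then inverse-logit back
se.bands.logit <- cbind(pred$fit + 2 * pred$se.fit,
                        pred$fit - 2 * pred$se.fit)
pfit <- exp(pred$fit) / (1 + exp(pred$fit))
se.bands <- exp(se.bands.logit) / (1 + exp(se.bands.logit))
```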

Plot:

```
plot(age,I(wage>250),xlim=ageLims ,type="n",ylim=c(0,.2))
lines(age.grid,pfit,lwd=2, col="blue")
matlines(age.grid,se.bands,lwd=1,col="blue",lty=3)
```

**Step functions**

Here we use `cut()` to split `age` into bins.

`table(cut(age, 4)) `

```
##
## (17.9,33.5] (33.5,49] (49,64.5] (64.5,80.1]
## 750 1399 779 72
```

```
fit <- lm(wage~cut(age, 4), data=Wage)
coef(summary(fit))
```

```
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 94.158 1.476 63.790 0.000e+00
## cut(age, 4)(33.5,49] 24.053 1.829 13.148 1.982e-38
## cut(age, 4)(49,64.5] 23.665 2.068 11.443 1.041e-29
## cut(age, 4)(64.5,80.1] 7.641 4.987 1.532 1.256e-01
```

**Splines**

Here we use the `splines` package to fit cubic splines.

Because we use a cubic spline with three knots, the generated spline has six basis functions (a cubic spline with three knots has seven degrees of freedom, one of which is used by the intercept).

```
dim(bs(age, knots = c(25, 40, 60)))
## [1] 3000    6
dim(bs(age, df = 6))
## [1] 3000    6
attr(bs(age, df = 6), "knots")
##   25%   50%   75%
## 33.75 42.00 51.00
```

Fit the spline curve.
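A sketch of the fit, with knots placed at ages 25, 40, and 60 (a common choice for this data, assumed here):

```r
library(splines)
# Cubic spline with knots at 25, 40, 60
fit <- lm(wage ~ bs(age, knots = c(25, 40, 60)), data = Wage)
pred <- predict(fit, newdata = list(age = age.grid), se = TRUE)
plot(age, wage, col = "gray", cex = .5)
lines(age.grid, pred$fit, lwd = 2)
# Approximate 95% confidence bands
lines(age.grid, pred$fit + 2 * pred$se.fit, lty = "dashed")
lines(age.grid, pred$fit - 2 * pred$se.fit, lty = "dashed")
```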

We can also fit a smoothing spline. Here we first fit with 16 degrees of freedom, and then let cross-validation choose the smoothness, which yields 6.8 effective degrees of freedom.

```
plot(age, wage, xlim = ageLims, cex = .5, col = "darkgrey")
fit <- smooth.spline(age, wage, df = 16)
fit2 <- smooth.spline(age, wage, cv = TRUE)  # LOOCV chooses lambda
fit2$df
## [1] 6.795
lines(fit, col = 'red', lwd = 2)
lines(fit2, col = 'blue', lwd = 1)
legend('topright', legend = c('16 DF', '6.8 DF'),
       col = c('red', 'blue'), lty = 1, lwd = 2, cex = 0.8)
```

**Local regression**

We perform local regression with `loess()`.
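A sketch with two spans, 0.2 and 0.5, i.e. each neighborhood uses 20% or 50% of the observations (the span values are assumptions for illustration):

```r
fit <- loess(wage ~ age, span = .2, data = Wage)
fit2 <- loess(wage ~ age, span = .5, data = Wage)
plot(age, wage, xlim = ageLims, cex = .5, col = "darkgrey")
lines(age.grid, predict(fit, data.frame(age = age.grid)),
      col = "red", lwd = 2)
lines(age.grid, predict(fit2, data.frame(age = age.grid)),
      col = "blue", lwd = 2)
legend("topright", legend = c("Span = 0.2", "Span = 0.5"),
       col = c("red", "blue"), lty = 1, lwd = 2, cex = .8)
```

Larger spans give smoother, more global fits.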

## GAMs

Now we use a GAM to predict `wage` using spline functions of `year` and `age`, with `education` as a qualitative predictor. Since this is just a big linear regression model with an appropriate choice of basis functions, we can simply use the `lm()` function.
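A sketch using natural splines via `ns()` from the `splines` package (the degrees of freedom, 4 for `year` and 5 for `age`, match the models compared in the ANOVA output further down):

```r
library(splines)
# A GAM fit by ordinary least squares:
# natural splines are just another set of basis functions
gam1 <- lm(wage ~ ns(year, 4) + ns(age, 5) + education, data = Wage)
```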

To fit more complex splines, such as smoothing splines, we cannot use `lm()`; we need to use the `gam` package.
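A sketch with the `gam` package, where `s()` denotes a smoothing-spline term (degrees of freedom 4 and 5 are assumed, matching the ANOVA output below):

```r
library(gam)  # provides gam() with smoothing-spline terms s()
gam.m3 <- gam(wage ~ s(year, 4) + s(age, 5) + education, data = Wage)
par(mfrow = c(1, 3))
plot(gam.m3, se = TRUE, col = "blue")  # one panel per term
```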

Plotting the two models suggests that the function of `year` looks rather linear. We can create a new model with a linear `year` term and then use an ANOVA test to decide between them.

```
## Analysis of Variance Table
##
## Model 1: wage ~ ns(age, 5) + education
## Model 2: wage ~ year + s(age, 5) + education
## Model 3: wage ~ s(year, 4) + s(age, 5) + education
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 2990 3712881
## 2 2989 3693842 1 19040 15.4 8.9e-05 ***
## 3 2986 3689770 3 4071 1.1 0.35
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

It appears that a GAM with a linear function of `year` is much better than a GAM that does not include `year` at all (p ≈ 8.9e-05); however, there is no evidence that a nonlinear function of `year` is needed (p = 0.35).

```
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -119.43 -19.70 -3.33 14.17 213.48
##
## (Dispersion Parameter for gaussian family taken to be 1236)
##
## Null Deviance: 5222086 on 2999 degrees of freedom
## Residual Deviance: 3689770 on 2986 degrees of freedom
## AIC: 29888
##
## Number of Local Scoring Iterations: 2
##
## Anova for Parametric Effects
## Df Sum Sq Mean Sq F value Pr(>F)
## s(year, 4) 1 27162 27162 22 2.9e-06 ***
## s(age, 5) 1 195338 195338 158 < 2e-16 ***
## education 4 1069726 267432 216 < 2e-16 ***
## Residuals 2986 3689770 1236
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Anova for Nonparametric Effects
## Npar Df Npar F Pr(F)
## (Intercept)
## s(year, 4) 3 1.1 0.35
## s(age, 5) 4 32.4 <2e-16 ***
## education
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

The ANOVA for nonparametric effects in this summary confirms again that the nonlinear term for `year` contributes nothing to the model (p = 0.35).

Next, we fit a GAM using local regression terms.

Before calling `gam()`, we can also use local regression to create interaction terms.
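A sketch of both uses of `lo()` from the `gam` package (the spans 0.7 and 0.5 are assumptions for illustration):

```r
library(gam)
# Local regression term for age inside a GAM
gam.lo <- gam(wage ~ s(year, df = 4) + lo(age, span = 0.7) + education,
              data = Wage)
# Local regression interaction between year and age
gam.lo.i <- gam(wage ~ lo(year, age, span = 0.5) + education, data = Wage)
```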

We can plot the resulting surface.
