Extension tecdat|R Generalized Additive (Additive) Models (GAMs) and Visualization of Smooth Functions

Time:2022-8-18

Original link:http://tecdat.cn/?p=23509 

We use generalized additive models (GAMs) in our research work. The mgcv package is an excellent suite of software for specifying, fitting and visualizing GAMs for very large datasets.

This post describes what is currently possible with Generalized Additive Models (GAMs).

First, load mgcv:

library('mgcv')

A popular example dataset

The data in dat are well studied in the GAM literature. They contain four covariates, labeled x0 to x3, that are non-linearly related to the response variable y to varying degrees.
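
The post does not show how dat was created. A minimal sketch, assuming it comes from mgcv's gamSim() simulator (example 1, which generates four covariates x0 to x3); the seed and sample size are arbitrary choices for illustration:

```r
library(mgcv)

set.seed(2)  # arbitrary seed, only for reproducibility

# Assumption: dat is simulated with gamSim() example 1
dat <- gamSim(eg = 1, n = 400, dist = "normal", scale = 2, verbose = FALSE)

# Response and the four covariates used in the model
str(dat[, c("y", "x0", "x1", "x2", "x3")])
```

The returned data frame also holds the true functions used in the simulation, which is what makes this example convenient for checking fitted smooths.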

We want to estimate these relationships by using splines to approximate the true relationship between each covariate and the response. To fit an additive model, we use

mod <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat, method = "REML")

mgcv provides a summary() method to extract information about the fitted GAM.
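
As a hedged illustration of what summary() returns, the sketch below refits the model on data assumed to come from gamSim() and pulls a few components out of the summary object:

```r
library(mgcv)

set.seed(2)
# Assumption: the example data are simulated with gamSim() example 1
dat <- gamSim(1, n = 400, dist = "normal", scale = 2, verbose = FALSE)
mod <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat, method = "REML")

sm <- summary(mod)
print(sm$s.table)   # one row per smooth: edf, Ref.df, test statistic, p-value
print(sm$r.sq)      # adjusted R-squared
print(sm$dev.expl)  # proportion of deviance explained
```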

The k.check() function checks whether each smooth function in the model uses a sufficient number of basis functions. You may not have called k.check() directly: it is called by gam.check(), which prints additional diagnostics and produces four model diagnostic plots.
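
A small sketch of the basis-dimension check, assuming the example data come from gamSim(). The second fit, with k = 20 for s(x2), is only an illustration of how one might respond to a low k-index, not a recommendation from the original post:

```r
library(mgcv)

set.seed(2)
# Assumption: the example data are simulated with gamSim() example 1
dat <- gamSim(1, n = 400, dist = "normal", scale = 2, verbose = FALSE)
mod <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat, method = "REML")

kc <- k.check(mod)  # one row per smooth
print(kc)

# If a smooth looks under-resourced, refit with a larger basis and check again
mod_k <- gam(y ~ s(x0) + s(x1) + s(x2, k = 20) + s(x3), data = dat, method = "REML")
print(k.check(mod_k))
```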

Plotting smooth functions

To visualize estimated GAMs, mgcv provides the plot.gam() method and the vis.gam() function, which produce base-graphics plots from fitted model objects. To visualize the four estimated smooth functions in the GAM, we use

plot(mod)

The result is a plot of every smooth function in the GAM mod.

plot() can draw multiple panels on a single graphics device and line up the individual plots. In plot.gam() this is controlled by the pages argument, for example plot(mod, pages = 1).

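Extracting smooth function data

If you want to work with the data behind these plots directly, for example to draw them yourself, you can capture the value of plot(): plot.gam() invisibly returns a list with one element per smooth function, holding the values used to draw each panel.

pd <- plot(mod)
str(pd[[2]])  # data behind the s(x1) panel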
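
One way to evaluate a single smooth on a grid of covariate values in mgcv is predict() with type = "terms". The sketch below is an illustration under assumptions: the data come from gamSim(), and the names grid and sm_x1 are hypothetical.

```r
library(mgcv)

set.seed(2)
# Assumption: the example data are simulated with gamSim() example 1
dat <- gamSim(1, n = 400, dist = "normal", scale = 2, verbose = FALSE)
mod <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat, method = "REML")

# Grid over x1; the other covariates are held fixed and do not affect the s(x1) column
grid <- data.frame(
  x0 = mean(dat$x0),
  x1 = seq(min(dat$x1), max(dat$x1), length.out = 100),
  x2 = mean(dat$x2),
  x3 = mean(dat$x3)
)

# Per-term contributions on the link scale, with standard errors
pr <- predict(mod, newdata = grid, type = "terms", se.fit = TRUE)

sm_x1 <- data.frame(
  x1  = grid$x1,
  est = as.numeric(pr$fit[, "s(x1)"]),
  se  = as.numeric(pr$se.fit[, "s(x1)"])
)
head(sm_x1)
```

The resulting data frame can be passed to any plotting system; est is the centered estimate of s(x1), so it excludes the model intercept.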

Diagnostic plots

Diagnostic plots are produced by gam.check():

gam.check(mod)

The result is an array of four diagnostic plots: a QQ plot of the model residuals (top left), a plot of residuals versus the linear predictor (top right), a histogram of the residuals (bottom left), and a plot of observed versus fitted values (bottom right).

The QQ plot is also available through its own user-accessible function. For example, qq.gam(mod) produces the QQ plot at the top left of the figure above.

qq.gam(mod)

The result of qq.gam(mod) is a QQ plot of the model residuals. Setting the rep argument to a positive number makes qq.gam() obtain the reference quantiles by simulating data from the fitted model (Augustin et al., 2012).
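
A hedged sketch of the simulation-based QQ plot in mgcv, assuming the example data come from gamSim(). The values rep = 100 and level = 0.9 are illustrative choices:

```r
library(mgcv)

set.seed(2)
# Assumption: the example data are simulated with gamSim() example 1
dat <- gamSim(1, n = 400, dist = "normal", scale = 2, verbose = FALSE)
mod <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat, method = "REML")

# The residuals shown on the plot (deviance residuals are the default type)
res <- residuals(mod, type = "deviance")

# rep > 0: reference quantiles are simulated from the fitted model;
# level sets the coverage of the reference band
qq.gam(mod, rep = 100, level = 0.9, type = "deviance")
```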

plot() also handles many of the more specialized smooth functions, for example two-dimensional smooth functions.

plot(mod)

By default, plot() draws a 2D smooth function as a contour plot of the estimated surface.
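
The post does not show the model behind the 2D example. The sketch below simulates a hypothetical surface (the function, sample size, noise level and k = 40 are all assumptions) and fits a bivariate smooth to it:

```r
library(mgcv)

set.seed(3)
n <- 500
x <- runif(n)
z <- runif(n)
# Hypothetical true surface plus Gaussian noise
y <- sin(2 * pi * x) * cos(2 * pi * z) + rnorm(n, sd = 0.3)
d2 <- data.frame(y = y, x = x, z = z)

# Isotropic bivariate smooth of x and z
mod2d <- gam(y ~ s(x, z, k = 40), data = d2, method = "REML")

plot(mod2d)              # default: contour plot with standard error contours
plot(mod2d, scheme = 2)  # heatmap with overlaid contours
vis.gam(mod2d, view = c("x", "z"), plot.type = "contour")
```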

Factor-smooth interaction terms, the smooth equivalent of random slopes and intercepts, are plotted on a single panel, with colors used to distinguish the different random smooth functions.

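## simulated data
## (n, nf, a and b are not defined in the source; the values below are illustrative assumptions)
f0 <- function(x) 2 * sin(pi * x)
f1 <- function(x, a = 2, b = -1) exp(a * x) + b
f2 <- function(x) 0.2 * x^11 * (10 * (1 - x))^6 + 10 * (10 * x)^3 * (1 - x)^10
n <- 500; nf <- 10                              # sample size, number of factor levels
fac <- sample(1:nf, n, replace = TRUE)          # group index for each observation
x0 <- runif(n); x1 <- runif(n); x2 <- runif(n)  # covariates
a <- rnorm(nf) * 0.2 + 2; b <- rnorm(nf) * 0.5  # group-specific parameters of f1
f <- f0(x0) + f1(x1, a[fac], b[fac]) + f2(x2)
fac <- factor(fac)
y <- f + rnorm(n) * 2
## global smooths of x0 and x2, plus a smooth of x1 for each level of fac
mod <- gam(y ~ s(x0) + s(x1, fac, bs = "fs", k = 5) + s(x2, k = 20), method = "REML")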


plot(mod)

Result of plotting a more complex GAM with a factor-smooth interaction term (bs = 'fs').

What else can be done?

plot() can handle most of the smooth functions that mgcv can estimate, including by-variable smooths with factor and continuous by variables, random effect smooths (bs = 're'), 2D tensor product smooths, and models with parametric terms.
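
As an illustration of two of these term types, the sketch below fits a factor by-variable smooth together with a random effect smooth. The data are simulated here and every name in it (g, subj, shift, u) is hypothetical:

```r
library(mgcv)

set.seed(4)
n <- 300
g <- factor(sample(c("a", "b", "c"), n, replace = TRUE))  # grouping factor for the by-smooth
x <- runif(n)
subj_id <- sample(1:10, n, replace = TRUE)                # subject index for the random effect
subj <- factor(subj_id)
u <- rnorm(10, sd = 0.5)                                  # hypothetical subject-level intercepts
shift <- c(a = 0, b = 1, c = -1)                          # hypothetical group means

# A different curve in x for each level of g, plus group and subject shifts
y <- sin(2 * pi * x) * (g == "a") + x^2 * (g == "b") +
  unname(shift[as.character(g)]) + u[subj_id] + rnorm(n, sd = 0.3)
d <- data.frame(y = y, x = x, g = g, subj = subj)

# g enters parametrically because factor by-smooths are centered
m <- gam(y ~ g + s(x, by = g) + s(subj, bs = "re"), data = d, method = "REML")

plot(m, pages = 1)  # three by-smooth panels plus one panel for the random effect
```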

References

Augustin, N. H., Sauleau, E.-A., and Wood, S. N. (2012). On quantile quantile plots for generalized linear models. _Computational statistics & data analysis_ 56, 2404–2409. doi:10.1016/j.csda.2012.01.026.

