We use generalized additive models (GAMs) in our research work. The mgcv package is an excellent suite of software for specifying, fitting and visualizing GAMs for very large datasets.
We need to load mgcv and a popular example dataset, stored here as dat.
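The loading step is not shown in the text, so here is a minimal sketch of what it might look like. It assumes that dat comes from mgcv's gamSim() simulator (example 1), which is one common source of a data frame with covariates x0 to x3; the seed, sample size and noise scale are illustrative choices, not values taken from the original.

```r
library(mgcv)

set.seed(1)  # illustrative seed so the simulated data are reproducible

# Assumption: the example data are simulated with gamSim() example 1
dat <- gamSim(1, n = 400, dist = "normal", scale = 2, verbose = FALSE)

# The data frame holds the response y and the covariates x0, x1, x2, x3
str(dat[, c("y", "x0", "x1", "x2", "x3")])
```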
The data in dat are well studied in GAM-related research and contain four covariates, labeled x0 to x3, that have, to varying degrees, non-linear relationships with the response.
We want to estimate these relationships, using splines to approximate the true function relating each covariate to the response. To fit a purely additive model, we use
mod <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat, method = "REML")
mgcv provides a summary() method to extract information about the fitted GAM.
mgcv also provides the k.check() function to check whether each smooth in the model uses a sufficient number of basis functions. You may not have used k.check() directly: gam.check() reports the same check along with additional diagnostics, and produces four model diagnostic plots.
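To make the summary and basis-dimension checks concrete, here is a small sketch. It assumes the example data are simulated with gamSim() example 1 (an assumption, since the text does not show how dat was created) and uses mgcv's summary(), k.check() and gam.check().

```r
library(mgcv)

set.seed(1)  # illustrative seed
dat <- gamSim(1, n = 400, dist = "normal", scale = 2, verbose = FALSE)  # assumed data source
mod <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat, method = "REML")

summary(mod)        # parametric coefficients and a table of the smooth terms

kc <- k.check(mod)  # one row per smooth: basis dimension, edf, k-index, p-value
print(kc)

gam.check(mod)      # prints convergence details and the basis check, draws four plots
```

A low p-value in the k.check() table, especially when the edf is close to the basis dimension, suggests that k may be too small for that smooth.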
To visualize estimated GAMs, mgcv provides the plot.gam() method and the vis.gam() function, which produce base-graphics plots from fitted model objects. To visualize the four estimated smooths in the GAM, we will use plot(mod).
The result is a plot of every smooth in the GAM mod.
Pass pages = 1 to plot() to draw all the panels on a single page of the graphics device, with the individual plots lined up in a grid.
If you want to extract most of the data used to build this plot, note that plot() invisibly returns a list with one element per plotted smooth, which holds the evaluation points and the fitted values of that smooth.
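As a rough sketch of getting at the data behind the panels: plot.gam() invisibly returns a list of plotting data. The element names x, fit and se used below are the ones we believe mgcv uses for 1D smooths; treat them as an assumption and inspect str(pd[[1]]) in your own session. The data simulation via gamSim() is likewise an assumption.

```r
library(mgcv)

set.seed(1)  # illustrative seed
dat <- gamSim(1, n = 400, dist = "normal", scale = 2, verbose = FALSE)  # assumed data source
mod <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat, method = "REML")

# Draw all four smooths on one page and keep the invisibly returned plot data
pd <- plot(mod, pages = 1)

# One list element per plotted smooth; take the first one, s(x0)
panel <- pd[[1]]
panel_df <- data.frame(
  x0  = panel$x,                 # evaluation points along the covariate
  fit = as.numeric(panel$fit),   # estimated smooth at those points
  se  = as.numeric(panel$se)     # half-width of the plotted band (assumed meaning)
)
head(panel_df)
```

A data frame like panel_df can then be passed to any plotting system you prefer.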
Diagnostic plots produced by gam.check()
The result is an array of four diagnostic plots: a QQ plot of the model residuals (top left), a plot of residuals versus the linear predictor (top right), a histogram of the residuals (bottom left), and a plot of observed versus fitted values (bottom right).
Of these four plots, the QQ plot is also available through its own user-accessible function: qq.gam(mod) produces the QQ plot at the top left of the figure above.
The result of qq.gam(mod) is a QQ plot of the residuals. With rep set above 0, for example qq.gam(mod, rep = 100), the reference quantiles are obtained by simulating data from the fitted model (Augustin et al., 2012).
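A short sketch of producing the QQ plot on its own with mgcv's qq.gam(). The data simulation and the choice of 100 replicates are illustrative assumptions; rep > 0 is what asks qq.gam() to simulate the reference quantiles.

```r
library(mgcv)

set.seed(1)  # illustrative seed
dat <- gamSim(1, n = 400, dist = "normal", scale = 2, verbose = FALSE)  # assumed data source
mod <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat, method = "REML")

# QQ plot of deviance residuals; the reference quantiles are simulated
# from the fitted model using 100 replicate datasets
qq.gam(mod, rep = 100, type = "deviance")
```

With the default rep = 0, qq.gam() instead uses a simulation-free calculation where the model family supports it.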
plot() also handles many of the more specialized smooths, for example two-dimensional smooths.
By default, plot() draws a 2D smooth as a contour plot.
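To illustrate plotting a 2D smooth, here is a small self-contained sketch. The simulated surface, sample size and k = 30 are hypothetical choices made for this example only; scheme = 2 and vis.gam() are shown as alternatives to the default contour display.

```r
library(mgcv)

set.seed(2)  # illustrative seed
n <- 500

# Hypothetical data: a smooth surface over (x, z) plus Gaussian noise
d2 <- data.frame(x = runif(n), z = runif(n))
d2$y <- sin(2 * pi * d2$x) * cos(2 * pi * d2$z) + rnorm(n, sd = 0.3)

# A single bivariate smooth of x and z
m2 <- gam(y ~ s(x, z, k = 30), data = d2, method = "REML")

plot(m2)              # default display: contour plot of the estimated surface
plot(m2, scheme = 2)  # heatmap with contours overlaid
vis.gam(m2, view = c("x", "z"), plot.type = "contour")  # predictions on the response scale
```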
Factor-smooth interaction terms, which are equivalent to random slopes and intercepts for smooth curves, are plotted on a single panel, and colors are used to distinguish the different random smooths.
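## simulated data; lines marked "assumed" are missing from the source and use illustrative values
f0 <- function(x) 2 * sin(pi * x)
f1 <- function(x, a = 2, b = -1) exp(a * x) + b
f2 <- function(x) 0.2 * x^11 * (10 * (1 - x))^6 + 10 * (10 * x)^3 * (1 - x)^10
n <- 500; nf <- 10                                # assumed: sample size and number of factor levels
fac <- sample(1:nf, n, replace = TRUE)            # assumed: random level for each observation
x0 <- runif(n); x1 <- runif(n); x2 <- runif(n)    # assumed: uniform covariates
a <- rnorm(nf) * 0.2 + 2; b <- rnorm(nf) * 0.5    # assumed: level-specific curve parameters
f <- f0(x0) + f1(x1, a[fac], b[fac]) + f2(x2)
fac <- factor(fac)
y <- f + rnorm(n) * 2
mod <- gam(y ~ s(x0) + s(x1, fac, bs = "fs", k = 5) + s(x2, k = 20), method = "ML")  # assumed model
plot(mod)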
Results for a more complex GAM with a factor-smooth interaction term, bs = 'fs'.
What else can be done?
plot() can handle most of the smooths that mgcv can estimate, including by-variable smooths with factor and continuous by-variables, random effect smooths (bs = 're'), 2D tensor product smooths, and models with parametric terms (with all.terms = TRUE).
Augustin, N. H., Sauleau, E.-A., and Wood, S. N. (2012). On quantile quantile plots for generalized linear models. _Computational statistics & data analysis_ 56, 2404–2409. doi:10.1016/j.csda.2012.01.026.