If you work with data science, you have probably heard of the lasso. The lasso is a model that penalizes the size of the coefficients in the objective function, pushing irrelevant variables out of the model. It has two very natural uses: variable selection and prediction. Because the lasso usually selects far fewer variables than ordinary least squares (OLS), its forecasts have much smaller variance, at the cost of a small amount of in-sample bias.
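To see what "penalizing the size of the coefficients" does in practice: the lasso adds a penalty proportional to the sum of the absolute coefficients, and with orthonormal predictors its solution is just the OLS estimate passed through a soft-thresholding operator, which is what sets small coefficients exactly to zero. A minimal base-R sketch (the function name is mine, not from any package):

```r
# Soft-thresholding operator: shrinks z toward zero by lambda and sets it
# exactly to zero when |z| <= lambda. This is why the lasso excludes
# variables instead of merely shrinking them.
soft_threshold <- function(z, lambda) sign(z) * pmax(abs(z) - lambda, 0)

# With orthonormal predictors the lasso solution is the OLS coefficient
# passed through this operator:
ols_coef <- c(3, -0.5, 0.1)
soft_threshold(ols_coef, lambda = 0.4)  # returns c(2.6, -0.1, 0)
```

The large coefficient is only shrunk, while the smallest one is zeroed out, which is the variable-selection effect described above.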
One of the most important features of Lasso is that it can handle many more variables than observations. I’m talking about thousands of variables. This is one of the main reasons for its recent popularity.
In this example I use the most popular lasso implementation, the glmnet package. It estimates the lasso very quickly, and cross-validation is the usual way to select the best model. In my experience, however, in a time-series context it is better to select the model with an information criterion such as the BIC: it is faster and avoids some of the complications of cross-validation with dependent data.
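Selecting the model by BIC instead of cross-validation can be sketched directly on a glmnet path: compute the BIC of every model along the path and keep the penalty that minimizes it. Below is a sketch on simulated data, assuming a Gaussian model, so BIC = n*log(RSS/n) + df*log(n); the data-generating process is illustrative only:

```r
library(glmnet)

# Simulated data: 100 observations, 20 candidate variables, 3 relevant ones
set.seed(1)
n <- 100; p <- 20
x <- matrix(rnorm(n * p), n, p)
y <- as.numeric(x[, 1:3] %*% c(2, -1, 1) + rnorm(n))

fit <- glmnet(x, y)                       # full lasso path
rss <- (1 - fit$dev.ratio) * fit$nulldev  # residual sum of squares per lambda
bic <- n * log(rss / n) + fit$df * log(n) # Gaussian BIC for each model
best <- which.min(bic)                    # lambda with the lowest BIC
coef(fit, s = fit$lambda[best])           # coefficients of the selected model
```

This is essentially the logic that an IC-based wrapper around glmnet performs for you.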
This post estimates the lasso and uses an information criterion to select the best model. We will use the lasso to forecast inflation.
```r
library(HDeconometrics) # ic.glmnet: glmnet estimation with IC-based selection

## == Break the data into in-sample and out-of-sample == ##
y.in = y[1:100]; y.out = y[-c(1:100)]
x.in = x[1:100, ]; x.out = x[-c(1:100), ]

## == LASSO == ##
lasso = ic.glmnet(x.in, y.in, crit = "bic")
```
The first figure above shows that as we increase the penalty in the lasso objective function, the coefficients shrink toward zero. The second figure shows the BIC curve and the selected model. Now we can compute the forecasts.
```r
## == Forecast == ##
pred.lasso = predict(lasso, newdata = x.out)
```
The lasso has an adaptive version with some better variable-selection properties. Note that better selection does not always mean better predictions. The idea behind the model is to use some previously known information to select the variables more efficiently; in general, this information is the set of coefficients estimated by a plain lasso or some other model.
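The two-step procedure can be sketched with glmnet alone, using its penalty.factor argument to impose the adaptive weights. In the sketch below the data, tau, and the small stabilizing constant are illustrative choices:

```r
library(glmnet)

# Simulated data: 2 relevant variables out of 15
set.seed(2)
n <- 100; p <- 15
x <- matrix(rnorm(n * p), n, p)
y <- as.numeric(2 * x[, 1] - x[, 2] + rnorm(n))

# First step: plain lasso (lambda chosen by cross-validation here for brevity)
first <- cv.glmnet(x, y)
b1 <- as.numeric(coef(first, s = "lambda.min"))[-1]  # drop the intercept

# Second step: penalize each variable inversely to its first-step coefficient.
# The 1/sqrt(n) constant keeps the weights finite when a coefficient is zero.
tau <- 1
w <- abs(b1 + 1 / sqrt(n))^(-tau)
adalasso <- glmnet(x, y, penalty.factor = w)
```

Variables with large first-step coefficients get small weights and are penalized less in the second step, which is the "previously known information" at work.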
```r
## == adaLASSO == ##
tau = 1
first.step.coef = coef(lasso)[-1] # lasso coefficients, intercept dropped
penalty.factor = abs(first.step.coef + 1/sqrt(nrow(x.in)))^(-tau)
adalasso = ic.glmnet(x.in, y.in, crit = "bic", penalty.factor = penalty.factor)
pred.ada = predict(adalasso, newdata = x.out)
```
```r
## == Comparing the errors == ##
sqrt(mean((y.out - pred.lasso)^2)) # lasso RMSE
sqrt(mean((y.out - pred.ada)^2))   # adaLASSO RMSE
```
In this case, the adaLASSO produced a more accurate forecast. In general, the adaLASSO predicts better than the plain lasso, but this is not an absolute fact: I have seen many cases where the plain lasso did better.
- Bühlmann, Peter, and Sara Van De Geer. Statistics for High-Dimensional Data: Methods, Theory and Applications. Springer Science & Business Media, 2011.
- Jerome Friedman, Trevor Hastie, Robert Tibshirani (2010). Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, 33(1), 1-22. URL http://www.jstatsoft.org/v33/i01/
- Marcio Garcia, Marcelo C. Medeiros, Gabriel F. R. Vasconcelos (2017). Real-time inflation forecasting with high-dimensional models: The case of Brazil. International Journal of Forecasting, in press.