A case study of R language penalized logistic regression (Lasso, ridge regression) high dimensional variable selection classification model

Time: 2021-6-9

Link to the original text: http://tecdat.cn/?p=21444

Logistic regression is a common method in applied research: it can screen influencing factors, predict probabilities and classify observations. Data obtained by high-throughput sequencing in medical research, for example, pose a challenge for variable selection in high dimensions. Penalized logistic regression can select variables and estimate coefficients for high-dimensional data, and efficient algorithms make the computation feasible. This article introduces the commonly used penalized logistic regression methods, lasso and ridge regression.

Method

As we have seen before, the classical technique for estimating the parameters of a parametric model is maximum likelihood. More specifically,

$$\hat{\theta} = \underset{\theta}{\operatorname{argmax}}\left\{\log \mathcal{L}(\theta\mid \mathbf{x},\mathbf{y})\right\}$$

The objective function here focuses only on goodness of fit. But in econometrics we often believe that simple theories are preferable to more complex ones, so we want to penalize models that are too complex.

That is a common idea, often mentioned in econometrics textbooks, although model choice is usually handled differently there: we estimate the parameters by maximum likelihood and then use AIC or BIC to compare two models. The Akaike criterion (AIC) is based on

$$AIC = -2\log \mathcal{L}(\hat{\theta}) + 2\dim(\hat{\theta})$$

We have a measure of goodness of fit on the left, while on the right, the penalty increases with the “complexity” of the model.

Here, the complexity of the model is the number of variables used. But suppose we do not perform variable selection and consider the regression on all covariates. Define

$$\|\mathbf{a}\|_{\ell_0}=\sum_k \mathbf{1}(a_k\neq 0),\qquad \|\mathbf{a}\|_{\ell_1}=\sum_k |a_k|,\qquad \|\mathbf{a}\|_{\ell_2}=\Big(\sum_k a_k^2\Big)^{1/2}$$

AIC can be written as

$$AIC = -2\log \mathcal{L}(\hat{\theta}) + 2\|\hat{\theta}\|_{\ell_0}$$

In fact, this is our objective function. More specifically, we will consider

$$\hat{\theta}_\lambda = \underset{\theta}{\operatorname{argmax}}\left\{\log \mathcal{L}(\theta\mid \mathbf{x},\mathbf{y}) - \lambda\,\|\theta\|\right\}$$

for some norm $\|\cdot\|$.

In this article, I want to discuss the numerical algorithms for solving this optimization problem, with an $\ell_2$ penalty (ridge regression) and with an $\ell_1$ penalty (lasso regression).
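To make this concrete, here is a minimal sketch, on simulated data (not the infarct data used below), fitting two nested logistic models with glm() and comparing the penalized criterion $-2\log\mathcal{L}+2k$:

 # Minimal sketch on simulated data: compare two nested logistic models
 # with the penalized criterion -2*logLik + 2*k, i.e. AIC.
 set.seed(1)
 n  = 200
 x1 = rnorm(n); x2 = rnorm(n); x3 = rnorm(n)          # x3 is pure noise
 y  = rbinom(n, 1, plogis(0.5 + x1 - 0.8*x2))
 m1 = glm(y ~ x1 + x2,      family = binomial)
 m2 = glm(y ~ x1 + x2 + x3, family = binomial)
 c(logLik(m1), logLik(m2))   # goodness of fit alone always favours the bigger model...
 c(AIC(m1), AIC(m2))         # ...the penalized criterion may not
 c(-2*as.numeric(logLik(m1)) + 2*length(coef(m1)),
   -2*as.numeric(logLik(m2)) + 2*length(coef(m2)))    # matches AIC()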

Standardization of covariates

Here we use infarct data observed on patients admitted to an emergency room, and we want to know who survived, in order to build a predictive model. The first step is to consider a linear transformation of all the covariates $x_j$ so that the variables are standardized (with unit variance):

 for(j in 1:7) X[,j] = (X[,j]-mean(X[,j]))/sd(X[,j]) 

Ridge regression

Before running some code, think back to the problem we want to solve

$$\hat{\beta}_\lambda = \underset{\beta}{\operatorname{argmax}}\left\{\log \mathcal{L}(\beta) - \lambda\,\|\beta\|_{\ell_2}^2\right\} = \underset{\beta}{\operatorname{argmax}}\left\{\log \mathcal{L}(\beta) - \lambda\sum_j \beta_j^2\right\}$$

With a Gaussian log-likelihood, the first term becomes a sum of squared residuals and an explicit solution exists. This is not the case for logistic regression.
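To see the contrast, here is a minimal sketch of the Gaussian (linear) case on simulated data, where the penalized problem has the closed-form solution $\hat{\beta}_\lambda=(X^\top X+2\lambda\mathbb{I})^{-1}X^\top y$ under the $\lambda\sum\beta_j^2$ convention used below; no such formula exists for the logistic likelihood.

 # Sketch (simulated data): in the Gaussian case,
 # (1/2)*||y - X b||^2 + lambda*sum(b^2) has an explicit solution.
 set.seed(1)
 n = 100; X = matrix(rnorm(n*3), n, 3); y = X %*% c(1, -2, 0.5) + rnorm(n)
 lambda = 2
 b_explicit = solve(t(X) %*% X + 2*lambda*diag(3)) %*% t(X) %*% y
 obj = function(b) 0.5*sum((y - X %*% b)^2) + lambda*sum(b^2)
 b_numeric = optim(rep(0, 3), obj, method = "BFGS")$par
 cbind(b_explicit, b_numeric)   # the two solutions agree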
The heuristics behind ridge regression are illustrated in the figure below. In the background we can visualize the (two-dimensional) log-likelihood of the logistic regression; if we rewrite the optimization problem as a constrained optimization problem, the blue circle is our constraint. The penalized problem above can be written equivalently (it is a strictly convex problem) as

$$\hat{\beta}_{s} = \underset{\beta:\,\|\beta\|_{\ell_2}\le s}{\operatorname{argmax}}\,\big\{\log\mathcal{L}(\beta)\big\}$$

The constrained maximum should therefore lie in the blue disk.

 LogLik = function(bbeta){
   b0=bbeta[1]
   beta=bbeta[-1]
   sum(-y*log(1 + exp(-(b0+X%*%beta))) - 
   (1-y)*log(1 + exp(b0+X%*%beta)))}
 u = seq(-4,4,length=251)
 v = outer(u,u,function(x,y) LogLik(c(1,x,y)))
 image(u,u,v,col=rev(heat.colors(25)))             # log-likelihood surface in the background
 contour(u,u,v,add=TRUE)
 lines(u,sqrt(1-u^2),type="l",lwd=2,col="blue")    # boundary of the unit disk (the constraint)
 lines(u,-sqrt(1-u^2),type="l",lwd=2,col="blue")

(Figure: contours of the log-likelihood with the $\ell_2$ constraint, the unit disk, in blue.)

Let us now consider the penalized objective function, coded as follows:

 PennegLogLik = function(bbeta,lambda=0){
   b0=bbeta[1]
   beta=bbeta[-1]
  -sum(-y*log(1 + exp(-(b0+X%*%beta))) - (1-y)*
   log(1 + exp(b0+X%*%beta)))+lambda*sum(beta^2)
 }

Why not try a standard optimizer? We mentioned earlier that this is not necessarily wise, because the output depends strongly on the starting point, but let us check.

 beta_init = lm(y~.,data=df)$coefficients   # df: data frame with the standardized covariates and y (hypothetical name)
 vpar = matrix(NA,1000,8)
 for(i in 1:1000){
 vpar[i,] = optim(par = beta_init*rnorm(8,1,2), 
 function(x) PennegLogLik(x,lambda), method = "BFGS", control = list(abstol=1e-9))$par}
 par(mfrow=c(1,2))
 plot(density(vpar[,2]))

(Figure: densities of the estimated parameters over the 1,000 random starting points.)

Clearly, even when we change the starting point, we seem to converge towards the same value, which can be considered the optimum.

The code used to compute $\hat{\beta}_\lambda$ is then

 beta_init = lm(y~.,data=df)$coefficients
 logistic_opt = optim(par = beta_init*0, function(x) PennegLogLik(x,lambda), 
 method = "BFGS", control=list(abstol=1e-9)) 

We can then visualize the evolution of $\hat{\beta}_\lambda$ as a function of $\lambda$:

 v_lambda = c(exp(seq(-2,5,length=61)))

 plot(v_lambda,est_ridge[1,],col=colrs[1],type="l")
 for(i in 2:7) lines(v_lambda,est_ridge[i,],col=colrs[i])
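The objects est_ridge and colrs used above are not constructed in this excerpt; a minimal sketch of one possible construction, wrapping the BFGS call in a function of $\lambda$ and applying it over the grid v_lambda (an assumption, not the author's original code), is:

 # Sketch: est_ridge[, k] holds the penalized estimates for lambda = v_lambda[k]
 opt_ridge = function(lambda){
   optim(par = beta_init*0, function(x) PennegLogLik(x, lambda),
         method = "BFGS", control = list(abstol = 1e-9))$par[-1]   # drop the intercept
 }
 est_ridge = Vectorize(opt_ridge)(v_lambda)
 colrs = 1:7   # one colour per covariate (any palette of length 7 works)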

(Figure: ridge coefficient paths $\hat{\beta}_{j,\lambda}$ as a function of $\lambda$.)

This makes sense: we observe the shrinkage of the parameters as $\lambda$ increases.

Ridge regression with the Newton-Raphson algorithm

As we have seen, we can also use the Newton-Raphson algorithm to solve this problem. Without the penalty term, the algorithm is

$$\beta_{new} = \beta_{old} - \left(\frac{\partial^2\log\mathcal{L}(\beta_{old})}{\partial\beta\,\partial\beta^\top}\right)^{-1}\frac{\partial\log\mathcal{L}(\beta_{old})}{\partial\beta}$$

where

$$\frac{\partial\log\mathcal{L}(\beta_{old})}{\partial\beta} = X^\top(y-p_{old})$$

$$\frac{\partial^2\log\mathcal{L}(\beta_{old})}{\partial\beta\,\partial\beta^\top} = -X^\top\Delta_{old}X,\qquad \Delta_{old}=\operatorname{diag}\big(p_{old,i}(1-p_{old,i})\big)$$

so that

$$\beta_{new} = \beta_{old} + (X^\top\Delta_{old}X)^{-1}X^\top(y-p_{old}) = (X^\top\Delta_{old}X)^{-1}X^\top\Delta_{old}z,\qquad z = X\beta_{old} + \Delta_{old}^{-1}(y-p_{old})$$

and, with the ridge penalty, the update becomes

$$\beta_{new} = (X^\top\Delta_{old}X + 2\lambda\mathbb{I})^{-1}X^\top\Delta_{old}z$$

Then there’s the code

 for(j in 1:7) X[,j] = (X[,j]-mean(X[,j]))/sd(X[,j])
 X = cbind(1,X)                                   # add a column for the intercept ("Inter" in the output below)
 beta = as.matrix(lm(Y~0+X)$coefficients,ncol=1)  # starting values (one possible choice); lambda fixed beforehand
 for(s in 1:9){
 pi = exp(X%*%beta[,s])/(1+exp(X%*%beta[,s]))
 Delta = matrix(0,nrow(X),nrow(X)); diag(Delta)=(pi*(1-pi))
 z = X%*%beta[,s] + solve(Delta)%*%(Y-pi)
 B = solve(t(X)%*%Delta%*%X+2*lambda*diag(ncol(X))) %*% (t(X)%*%Delta%*%z)
 beta = cbind(beta,B)}
 beta[,8:10]
[,1] [,2] [,3]
XInter 0.59619654 0.59619654 0.59619654
XFRCAR 0.09217848 0.09217848 0.09217848
XINCAR 0.77165707 0.77165707 0.77165707
XINSYS 0.69678521 0.69678521 0.69678521
XPRDIA -0.29575642 -0.29575642 -0.29575642
XPAPUL -0.23921101 -0.23921101 -0.23921101
XPVENT -0.33120792 -0.33120792 -0.33120792
XREPUL -0.84308972 -0.84308972 -0.84308972

Again, it seems to converge very fast.

Interestingly, through this algorithm, we can also get the variance of the estimator

$$\operatorname{Var}[\hat{\beta}] = (X^\top\Delta X + 2\lambda\mathbb{I})^{-1}X^\top\Delta\,\operatorname{Var}[z]\,\Delta X\,(X^\top\Delta X + 2\lambda\mathbb{I})^{-1},\qquad \operatorname{Var}[z]=\Delta^{-1}$$

The code to compute $\hat{\beta}_\lambda$ as a function of $\lambda$ is then

 for(s in 1:20){
 pi = exp(X%*%beta[,s])/(1+exp(X%*%beta[,s]))
 Delta = matrix(0,nrow(X),nrow(X))
 diag(Delta)=(pi*(1-pi))
 z = X%*%beta[,s] + solve(Delta)%*%(Y-pi)
 B = solve(t(X)%*%Delta%*%X+2*lambda*diag(ncol(X))) %*% (t(X)%*%Delta%*%z)
 beta = cbind(beta,B)}
 Varz = solve(Delta)
 Varb = solve(t(X)%*%Delta%*%X+2*lambda*diag(ncol(X))) %*% t(X)%*% Delta %*% Varz %*%
 Delta %*% X %*% solve(t(X)%*%Delta%*%X+2*lambda*diag(ncol(X))) 

We can visualize the evolution of $\hat{\beta}_\lambda$ (as a function of $\lambda$):

 plot(v_lambda,est_ridge[1,],col=colrs[1],type="l")
for(i in 2:7) lines(v_lambda,est_ridge[i,],col=colrs[i])

(Figure: ridge coefficient paths from the Newton-Raphson iterations, as a function of $\lambda$.)

We also obtain the evolution of the variances:

(Figure: variances of the estimators as a function of $\lambda$.)

Recall that when $\lambda=0$ (on the left of the graph), $\hat{\beta}_0$ is the unpenalized maximum-likelihood estimate. As $\lambda$ increases, (i) the bias increases (the estimates shrink towards zero) and (ii) the variance decreases.

Ridge regression with glmnet

As usual, there are R functions available to run a ridge regression. Let us use the glmnet function with $\alpha=0$:

 for(j in 1:7) X[,j] = (X[,j]-mean(X[,j]))/sd(X[,j])
 glm_ridge = glmnet(X, y, alpha=0)
 plot(glm_ridge,xvar="lambda")

(Figure: glmnet ridge coefficient paths as a function of $\log\lambda$.)

The same paths can also be plotted against the $\ell_1$ norm of the coefficients:

(Figure: glmnet ridge coefficient paths as a function of the $\ell_1$ norm.)

Ridge regression with orthogonal covariates

An interesting case is obtained when the covariates are orthogonal. This can be achieved with a principal component analysis of the covariates:

 pca = princomp(X)
 pca_X = get_pca_ind(pca)$coord   # get_pca_ind() is from the factoextra package

Let’s do ridge regression for these (orthogonal) covariates

 glm_ridge = glmnet(pca_X, y, alpha=0)
plot(glm_ridge,xvar="lambda",col=colrs,lwd=2)

(Figures: glmnet ridge coefficient paths on the principal components, as a function of $\lambda$ and of the $\ell_1$ norm.)

We clearly observe the shrinkage of the parameters, in the sense that (as in the Gaussian case with orthonormal covariates)

$$\hat{\beta}_{j,\lambda} \approx \frac{\hat{\beta}_{j,0}}{1+2\lambda}$$
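As a side check (an illustration on simulated data, not part of the original analysis), the Gaussian analogue of this shrinkage can be verified numerically: with orthonormal covariates, the ridge coefficients are exactly the unpenalized ones divided by $1+2\lambda$ under the $(1/2)\text{RSS}+\lambda\sum\beta_j^2$ convention used here.

 # Sketch: with orthonormal covariates, ridge shrinkage is a simple rescaling.
 set.seed(1)
 n = 200
 X = qr.Q(qr(matrix(rnorm(n*3), n, 3)))      # orthonormal columns: t(X) %*% X = I
 y = X %*% c(2, -1, 0.5) + rnorm(n, sd = 0.2)
 lambda = 1.5
 b_ols   = solve(t(X) %*% X) %*% t(X) %*% y
 b_ridge = solve(t(X) %*% X + 2*lambda*diag(3)) %*% t(X) %*% y
 cbind(b_ridge, b_ols/(1 + 2*lambda))        # identical columns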

Application

Let’s try the second set of data

We can try various values of $\lambda$:

 reg = glmnet(cbind(df0$x1,df0$x2), df0$y==1, alpha=0)
 plot(reg,xvar="lambda",col=c("blue","red"),lwd=2)
 abline(v=log(.2)) 

(Figure: ridge coefficient paths for the two covariates, with a vertical line at $\log(0.2)$.)


or

 abline(v=log(1.2))
 plot_lambda(1.2)   # plot_lambda(): helper (not shown here) that draws the predicted probabilities for a given lambda

(Figure: predicted probabilities and classification boundary for $\lambda=1.2$.)

The next step is to change the penalty norm and use the $\ell_1$ norm.

Standardization of covariates

As mentioned earlier, the first step is to consider a linear transformation of all the covariates $x_j$ so that the variables are standardized (with unit variance):

 for(j in 1:7) X[,j] = (X[,j]-mean(X[,j]))/sd(X[,j])
X = as.matrix(X)

Lasso regression

The heuristics behind lasso regression are shown in the figure below. In the background we can visualize the (two-dimensional) log-likelihood of the logistic regression, and the blue square is our constraint if we reconsider the optimization problem as a constrained optimization problem:

 LogLik = function(bbeta){
   b0=bbeta[1]
   beta=bbeta[-1]
   sum(-y*log(1 + exp(-(b0+X%*%beta))) - 
   (1-y)*log(1 + exp(b0+X%*%beta)))}
 u = seq(-4,4,length=251)
 v = outer(u,u,function(x,y) LogLik(c(1,x,y)))
 image(u,u,v,col=rev(heat.colors(25)))
 contour(u,u,v,add=TRUE)
 polygon(c(-1,0,1,0),c(0,1,0,-1),border="blue")   # the l1 ball (diamond) constraint

(Figure: contours of the log-likelihood with the $\ell_1$ constraint, the unit diamond, in blue.)

The advantage here is that it can be used as a variable selection tool.

Heuristically, the mathematical explanation is the following. Consider a simple regression $y_i=x_i\beta+\varepsilon$, with an $\ell_1$ penalty and an $\ell_2$ loss function. The optimization problem becomes

$$\min_\beta\left\{\frac{1}{2}\sum_{i=1}^n (y_i - x_i\beta)^2 + \lambda|\beta|\right\}$$

The first order condition can be written as

$$-\sum_{i=1}^n x_i(y_i - x_i\beta) + \lambda\,\operatorname{sign}(\beta) = 0\qquad(\text{for }\beta\neq 0)$$

The solution is

$$\hat{\beta}_\lambda = \operatorname{sign}\big(\hat{\beta}^{ols}\big)\left(\big|\hat{\beta}^{ols}\big| - \frac{\lambda}{\sum_i x_i^2}\right)_+,\qquad \hat{\beta}^{ols}=\frac{\sum_i x_iy_i}{\sum_i x_i^2}$$
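A quick numeric check of this soft-thresholding solution, as a sketch on simulated data (single covariate, hypothetical names):

 # Sketch: check the soft-thresholding formula against a 1-d numerical optimum.
 set.seed(1)
 n = 100; x = rnorm(n); y = 0.3*x + rnorm(n)
 lambda = 5
 b_ols  = sum(x*y)/sum(x^2)
 b_soft = sign(b_ols) * max(abs(b_ols) - lambda/sum(x^2), 0)
 f = function(b) 0.5*sum((y - x*b)^2) + lambda*abs(b)
 b_num = optimize(f, interval = c(-2, 2))$minimum
 c(b_soft, b_num)   # agree up to numerical tolerance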

Optimization process

Let’s start with standard (R) optimization routines, such as BFGS

 logistic_opt = optim(par = beta_init*0, function(x) PennegLogLik(x,lambda), 
 hessian=TRUE, method = "BFGS", control=list(abstol=1e-9))
 # here PennegLogLik uses the l1 penalty lambda*sum(abs(beta)) instead of lambda*sum(beta^2)

 # est_lasso is built like est_ridge above, over the grid v_lambda
 plot(v_lambda,est_lasso[1,],col=colrs[1],type="l")
 for(i in 2:7) lines(v_lambda,est_lasso[i,],col=colrs[i],lwd=2)

(Figure: lasso coefficient paths obtained with BFGS, as a function of $\lambda$.)

The results are volatile.

Using glmnet

For comparison, using the R program for Lasso, we get the following

 glm_lasso = glmnet(X, y, alpha=1)
 plot(glm_lasso,xvar="lambda",col=colrs,lwd=2)

(Figures: glmnet lasso coefficient paths, as a function of $\lambda$ and of the $\ell_1$ norm.)

If we look carefully at the output, we see that there is variable selection, in the sense that some $\hat{\beta}_{j,\lambda}=0$, meaning "really zero".

 glmnet(X, y, alpha=1, lambda=exp(-4))$beta
7x1 sparse Matrix of class "dgCMatrix"
s0
FRCAR . 
INCAR 0.11005070
INSYS 0.03231929
PRDIA . 
PAPUL . 
PVENT -0.03138089
REPUL -0.20962611

With generic optimization routines, by contrast, we cannot expect exact zeros:

 opt_lasso(.2)   # opt_lasso(): the BFGS-based estimator above, for a given lambda
FRCAR INCAR INSYS PRDIA
0.4810999782 0.0002813658 1.9117847987 -0.3873926427
PAPUL PVENT REPUL 
-0.0863050787 -0.4144139379 -1.3849264055

Orthogonal covariates

Before getting into the mathematics, note that variable selection is much cleaner when the covariates are orthogonal, which is obtained here by a principal component analysis of the covariates:

 pca = princomp(X)
 pca_X = get_pca_ind(pca)$coord

 glm_lasso = glmnet(pca_X, y, alpha=1)
 plot(glm_lasso,xvar="lambda",col=colrs)
 plot(glm_lasso,col=colrs)

(Figures: lasso coefficient paths on the (orthogonal) principal components, as a function of $\lambda$ and of the $\ell_1$ norm.)

Standard Lasso

If we go back to the original lasso method, the goal is to solve the problem

$$\min_{\beta_0,\beta}\left\{\frac{1}{2n}\sum_{i=1}^n \big(y_i - (\beta_0 + x_i^\top\beta)\big)^2 + \lambda\sum_{j}|\beta_j|\right\}$$

Note that the intercept is not penalized. The first order condition is

$$\frac{\partial}{\partial\beta_j}\left\{\frac{1}{2n}\big\|y - \beta_0\mathbf{1} - X\beta\big\|^2\right\} + \lambda\,\partial|\beta_j| \ni 0$$

that is

$$-\frac{1}{n}\,x_j^\top\big(y - \beta_0\mathbf{1} - X\beta\big) + \lambda s_j = 0,\qquad s_j\in\partial|\beta_j|$$

We can check whether $\mathbf{0}$ belongs to the subdifferential at the optimum, where

$$\partial|\beta_j| = \begin{cases}\{\operatorname{sign}(\beta_j)\} & \text{if }\beta_j\neq 0\\ [-1,1] & \text{if }\beta_j= 0\end{cases}$$

so that, at the optimum, $\big|x_j^\top(y-\beta_0\mathbf{1}-X\hat{\beta})\big| \le n\lambda$ whenever $\hat{\beta}_j=0$.

For the term on the left,

$$x_j^\top\big(y - \beta_0\mathbf{1} - X\beta\big) = x_j^\top\big(y - \beta_0\mathbf{1} - X_{(-j)}\beta_{(-j)}\big) - x_j^\top x_j\,\beta_j$$

where $X_{(-j)}$ is the design matrix without the $j$-th column.

So the equation can be written

$$x_j^\top\big(y - \beta_0\mathbf{1} - X_{(-j)}\beta_{(-j)}\big) = x_j^\top x_j\,\beta_j + n\lambda\,s_j$$

This simplifies into a set of rules that we can use to check our solution: with $r_{(-j)}=y-\beta_0\mathbf{1}-X_{(-j)}\beta_{(-j)}$ the partial residual, $\hat{\beta}_j=0$ whenever $|x_j^\top r_{(-j)}|\le n\lambda$, and $\hat{\beta}_j=(x_j^\top r_{(-j)}\mp n\lambda)/(x_j^\top x_j)$ otherwise.
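These first-order conditions can be checked numerically. Below is a sketch, on simulated data, verifying the subgradient conditions of a Gaussian lasso fitted with glmnet (whose objective is $(1/2n)\|y-\beta_0-X\beta\|^2+\lambda\|\beta\|_{\ell_1}$ when standardize=FALSE); this is an illustration, not part of the original analysis.

 # Sketch: verify the subgradient (KKT) conditions of a lasso fit (up to convergence tolerance).
 library(glmnet)
 set.seed(1)
 n = 200; p = 5
 X = matrix(rnorm(n*p), n, p)
 y = as.numeric(X %*% c(2, -1, 0, 0, 0.5) + rnorm(n))
 lambda = 0.3
 fit  = glmnet(X, y, alpha = 1, lambda = lambda, standardize = FALSE)
 beta = as.numeric(fit$beta); b0 = as.numeric(fit$a0)
 g    = t(X) %*% (y - b0 - X %*% beta) / n      # gradient of the fit term
 round(cbind(beta = beta, gradient = as.numeric(g)), 4)
 # for beta_j != 0 the gradient is (approximately) lambda*sign(beta_j);
 # for beta_j == 0 it lies in [-lambda, lambda]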

We can also decompose $\beta_j$ into its positive and negative parts, replacing $\beta_j$ by $\beta_j^+-\beta_j^-$ with $\beta_j^+,\beta_j^-\ge 0$. The lasso problem then becomes

$$\min\left\{\frac{1}{2n}\big\|y-\beta_0\mathbf{1}-X(\beta^+-\beta^-)\big\|^2 + \lambda\sum_j(\beta_j^++\beta_j^-)\right\}\quad\text{subject to }\beta_j^+,\beta_j^-\ge 0$$

Let $\alpha_j^+$ and $\alpha_j^-$ denote the Lagrange multipliers associated with $\beta_j^+$ and $\beta_j^-$. The Lagrangian is

$$\frac{1}{2n}\big\|y-\beta_0\mathbf{1}-X(\beta^+-\beta^-)\big\|^2 + \lambda\sum_j(\beta_j^++\beta_j^-) - \sum_j\alpha_j^+\beta_j^+ - \sum_j\alpha_j^-\beta_j^-$$

To satisfy the stationarity condition, we take the gradient of the Lagrangian with respect to $\beta_j^+$ and set it to zero:

$$-\frac{1}{n}\,x_j^\top\big(y-\beta_0\mathbf{1}-X(\beta^+-\beta^-)\big) + \lambda - \alpha_j^+ = 0$$

and doing the same with respect to $\beta_j^-$ gives

$$\frac{1}{n}\,x_j^\top\big(y-\beta_0\mathbf{1}-X(\beta^+-\beta^-)\big) + \lambda - \alpha_j^- = 0$$

For convenience, we introduce the soft-thresholding operator

$$S_\lambda(z) = \operatorname{sign}(z)\,(|z|-\lambda)_+ = \begin{cases} z-\lambda & \text{if } z>\lambda\\ 0 & \text{if } |z|\le\lambda\\ z+\lambda & \text{if } z<-\lambda \end{cases}$$
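In R, the soft-thresholding operator can be written as follows (a small helper, used again in the sketches below):

 # soft-thresholding operator S_lambda(z) = sign(z) * max(|z| - lambda, 0)
 soft_thresholding = function(z, lambda) sign(z) * pmax(abs(z) - lambda, 0)
 soft_thresholding(c(-3, -0.5, 0, 0.5, 3), lambda = 1)   # -2  0  0  0  2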

Notice that the optimization problem

$$\min_{\beta_j}\left\{\frac{1}{2n}\big\|y - x_j\beta_j\big\|^2 + \lambda|\beta_j|\right\}$$

can also be written

$$\min_{\beta_j}\left\{\frac{1}{2}\big(\beta_j - \hat{\beta}_j^{ols}\big)^2 + \lambda|\beta_j|\right\}$$

when the covariate is standardized so that $x_j^\top x_j/n=1$.

Its solution is then

$$\hat{\beta}_{j,\lambda} = S_\lambda\big(\hat{\beta}_j^{ols}\big)$$

This is a coordinate-wise update.
Now, if we consider a (slightly) more general problem, with weights in the first part,

$$\min_{\beta_0,\beta}\left\{\frac{1}{2n}\sum_{i=1}^n \omega_i\big(y_i - \beta_0 - x_i^\top\beta\big)^2 + \lambda\sum_j|\beta_j|\right\}$$

the coordinate-wise update becomes

$$\beta_j \leftarrow \frac{S_\lambda\!\left(\tfrac1n\sum_i \omega_i\,x_{i,j}\,r_i^{(j)}\right)}{\tfrac1n\sum_i \omega_i\,x_{i,j}^2},\qquad r_i^{(j)} = y_i-\beta_0-\sum_{k\neq j}x_{i,k}\beta_k$$
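Here is a minimal sketch of the corresponding coordinate-descent loop for the weighted lasso, in the linear case and on simulated data; it uses the soft_thresholding helper above and is an illustration of the update rule, not the author's implementation.

 # Sketch: coordinate descent for (1/2n) sum_i w_i (y_i - b0 - x_i'b)^2 + lambda*sum(|b_j|)
 lasso_cd = function(X, y, w, lambda, n_iter = 100){
   n = nrow(X); p = ncol(X)
   beta = rep(0, p)
   b0   = weighted.mean(y, w)
   for(it in 1:n_iter){
     for(j in 1:p){
       r_j = y - b0 - X[, -j, drop = FALSE] %*% beta[-j]   # partial residuals
       num = sum(w * X[, j] * r_j) / n
       den = sum(w * X[, j]^2) / n
       beta[j] = soft_thresholding(num, lambda) / den
     }
     b0 = weighted.mean(y - X %*% beta, w)                 # unpenalized intercept
   }
   list(intercept = b0, beta = beta)
 }
 # quick check against glmnet on simulated data (equal weights, similar values expected)
 set.seed(1)
 X = matrix(rnorm(200*4), 200, 4); y = as.numeric(X %*% c(1, -1, 0, 0) + rnorm(200))
 fit_cd = lasso_cd(X, y, w = rep(1, 200), lambda = 0.1)
 # glmnet(X, y, alpha = 1, lambda = 0.1, standardize = FALSE)$beta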

Back to our original question.

Lasso logistic regression

The logistic problem can be formulated as a quadratic programming problem. Recall that the log-likelihood is

$$\log\mathcal{L} = \frac{1}{n}\sum_{i=1}^n \Big[y_i\big(\beta_0+x_i^\top\beta\big) - \log\big(1+e^{\beta_0+x_i^\top\beta}\big)\Big]$$

which is a concave function of the parameters. We can therefore use a quadratic approximation of the log-likelihood (a Taylor expansion around the current estimate),

$$\log\mathcal{L} \approx \ell(\beta_0,\beta) = -\frac{1}{2n}\sum_{i=1}^n \omega_i\big(z_i - \beta_0 - x_i^\top\beta\big)^2 + \text{constant},\qquad \omega_i = \pi_i(1-\pi_i)$$

where the working response $z_i$ is

$$z_i = \beta_0 + x_i^\top\beta + \frac{y_i-\pi_i}{\pi_i(1-\pi_i)}$$

and $\pi_i$ is the predicted probability

$$\pi_i = \frac{e^{\beta_0+x_i^\top\beta}}{1+e^{\beta_0+x_i^\top\beta}}$$

We thus obtain a penalized weighted least-squares problem, which we can solve with the coordinate-descent approach described above:

 # (only the tail of the iteration loop of the lasso-logistic function is shown here:
 #  update of the unpenalized intercept, objective tracking and convergence test)
 beta0 = sum(y-X%*%beta)/(length(y))
 beta0list[j+1] = beta0
 betalist[[j+1]] = beta
 obj[j] = (1/2)*(1/length(y))*norm(omega*(z - X%*%beta - 
 beta0*rep(1,length(y))),'F')^2 + lambda*sum(abs(beta))

 if (norm(rbind(beta0list[j],betalist[[j]]) - 
 rbind(beta0,beta),'F') < tol) { break } 
 } 
 return(list(obj=obj[1:j],beta=beta,intercept=beta0)) }
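Since the excerpt above only shows the tail of the iteration, here is a self-contained sketch of the whole procedure: iterate the quadratic approximation (working response $z_i$, weights $\omega_i=\pi_i(1-\pi_i)$) and solve each penalized weighted least-squares step with the coordinate descent above. It reuses soft_thresholding and lasso_cd from the previous sketches and is an illustration, not the author's exact code (in particular, the $\lambda$ scale need not match glmnet's).

 # Sketch: lasso logistic regression via IRLS + coordinate descent.
 lasso_logistic = function(X, y, lambda, n_outer = 25){
   p = ncol(X)
   beta = rep(0, p); b0 = log(mean(y)/(1 - mean(y)))      # start from the null model, y in {0,1}
   for(it in 1:n_outer){
     eta   = b0 + as.numeric(X %*% beta)
     pi    = 1/(1 + exp(-eta))
     omega = pmax(pi*(1 - pi), 1e-5)                      # weights of the quadratic approximation
     z     = eta + (y - pi)/omega                         # working response
     fit   = lasso_cd(X, z, w = omega, lambda = lambda)   # penalized weighted least squares
     b0 = fit$intercept; beta = fit$beta
   }
   list(intercept = b0, beta = beta)
 }
 # usage, e.g. on the standardized covariates X and 0/1 response y:
 # lasso_logistic(X, y, lambda = exp(-4))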

(Figure: lasso-logistic coefficient paths obtained with this algorithm, as a function of $\lambda$.)

This looks like the result of a call to glmnet: for $\lambda$ large enough, some components are exactly zero.

Application on the second data set

Now consider a second dataset with two covariates. The code to get lasso estimation is

 plot_l = function(lambda){
 m = apply(df0,2,mean)
 s = apply(df0,2,sd)
 for(j in 1:2) df0[,j] = (df0[,j]-m[j])/s[j]            # standardize the two covariates
 reg = glmnet(cbind(df0$x1,df0$x2), df0$y==1, alpha=1,lambda=lambda)
 u = seq(0,1,length=101)
 p = function(x,y){
   xt = (x-m[1])/s[1]                                   # rescale the grid points like the covariates
   yt = (y-m[2])/s[2]
   predict(reg,newx=cbind(x1=xt,x2=yt),type="response")}
 v = outer(u,u,p)
 image(u,u,v,col=clr10,breaks=(0:10)/10)                # clr10: a palette of 10 colours
 points(df$x1,df$x2,pch=c(1,19)[1+z],cex=1.5)           # df: original data, z: 0/1 response
 contour(u,u,v,levels = .5,add=TRUE)
 }

For small values of $\lambda$, we only get some shrinkage of the parameters:

 reg = glmnet(cbind(df0$x1,df0$x2), df0$y==1, alpha=1)
 plot(reg,xvar="lambda",col=c("blue","red"),lwd=2)
 abline(v=exp(-2.8))
 plot_l(exp(-2.8))

(Figure: lasso coefficient paths and predicted probabilities for $\lambda=e^{-2.8}$.)

But with a larger $\lambda$, there is variable selection: $\hat{\beta}_{1,\lambda}=0$.

(Figures: lasso coefficient paths and classification regions for a larger $\lambda$, where one coefficient is exactly zero.)
