Nonconvex penalized regression (SCAD, MCP) for analyzing prostate data in R

Time:2021-3-12

Link to the original text:http://tecdat.cn/?p=20828 

This article uses the lasso or a nonconvex penalty to fit regularization paths for linear regression, GLM, and Cox regression models. The nonconvex penalties are the minimax concave penalty (MCP) and the smoothly clipped absolute deviation (SCAD) penalty, and an additional L2 penalty ("elastic net") is optional. Utilities for cross-validation and for post-fit visualization, summarization, inference, and prediction are also provided.
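For reference, the two nonconvex penalties can be written out explicitly. The forms below are the standard definitions from the literature rather than something stated in the original post, and the notation $p_{\lambda,\gamma}$ is an assumption made here for illustration. Both start out like the lasso penalty $\lambda|\theta|$ near zero and flatten to a constant once $|\theta| > \gamma\lambda$, so large coefficients are not shrunk further.

```latex
% MCP, with tuning parameter \gamma > 1
p^{\mathrm{MCP}}_{\lambda,\gamma}(\theta) =
\begin{cases}
  \lambda|\theta| - \dfrac{\theta^{2}}{2\gamma}, & |\theta| \le \gamma\lambda, \\[6pt]
  \dfrac{\gamma\lambda^{2}}{2},                  & |\theta| > \gamma\lambda.
\end{cases}

% SCAD, with tuning parameter \gamma > 2
p^{\mathrm{SCAD}}_{\lambda,\gamma}(\theta) =
\begin{cases}
  \lambda|\theta|, & |\theta| \le \lambda, \\[6pt]
  \dfrac{2\gamma\lambda|\theta| - \theta^{2} - \lambda^{2}}{2(\gamma - 1)}, & \lambda < |\theta| \le \gamma\lambda, \\[6pt]
  \dfrac{\lambda^{2}(\gamma + 1)}{2}, & |\theta| > \gamma\lambda.
\end{cases}
```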

We study the prostate data, which has 8 predictor variables and a continuous response: the PSA level (on the log scale) of men undergoing radical prostatectomy.

library(ncvreg)
data(Prostate)
X <- Prostate$X
y <- Prostate$y

To fit a penalized regression model to these data:

fit <- ncvreg(X, y)

The default penalty here is the minimax concave penalty (MCP), but SCAD and lasso penalties are also available. The fit produces a path of coefficients, which we can plot:

 plot(fit) 
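Switching to one of the other penalties can be sketched as follows. This is a minimal example, assuming the ncvreg package is installed and that its bundled `Prostate` data set is the one analyzed here; the object names `fit_mcp`, `fit_scad`, and `fit_lasso` are illustrative only.

```r
library(ncvreg)

# Prostate data shipped with ncvreg (assumed to be the data used in this post)
data(Prostate)
X <- Prostate$X
y <- Prostate$y

# Same call as before; only the penalty argument changes
fit_mcp   <- ncvreg(X, y, penalty = "MCP")    # the default
fit_scad  <- ncvreg(X, y, penalty = "SCAD")
fit_lasso <- ncvreg(X, y, penalty = "lasso")

# Side-by-side coefficient estimates at a common value of lambda
comparison <- cbind(MCP   = coef(fit_mcp,   lambda = 0.05),
                    SCAD  = coef(fit_scad,  lambda = 0.05),
                    lasso = coef(fit_lasso, lambda = 0.05))
round(comparison, 3)
```

Comparing the columns at a fixed λ is one way to see how much additional shrinkage the lasso applies to the larger coefficients relative to MCP and SCAD.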


Note that variables enter the model one at a time, and that at any given value of λ several coefficients are zero. To see the coefficients at a particular λ, we can use the coef function:

 coef(fit, lambda=0.05)
# (Intercept)      lcavol     lweight         age        lbph         svi 
#  0.35121089  0.53178994  0.60389694 -0.01530917  0.08874563  0.67256096 
#         lcp     gleason       pgg45 
#  0.00000000  0.00000000  0.00168038 

The summary method can be used for post-selection inference:

summary(fit, lambda=0.05)
# MCP-penalized linear regression with n=97, p=8
# At lambda=0.0500:
# -------------------------------------------------
#   Nonzero coefficients         :   6
#   Expected nonzero coefficients:   2.54
#   Average mfdr (6 features)    :   0.424
# 
#         Estimate      z     mfdr Selected
# lcavol   0.53179  8.880  < 1e-04        *
# svi      0.67256  3.945 0.010189        *
# lweight  0.60390  3.666 0.027894        *
# lbph     0.08875  1.928 0.773014        *
# age     -0.01531 -1.788 0.815269        *
# pgg45    0.00168  1.160 0.917570        * 

In this case, even after adjusting for the other variables in the model, lcavol, svi, and lweight are clearly associated with the response, whereas lbph, age, and pgg45 may have been included just by chance. Typically, to evaluate the predictive accuracy of the model across values of λ, one performs cross-validation:

cvfit <- cv.ncvreg(X, y)
plot(cvfit)


The value of λ that minimizes the cross-validation error is given by cvfit$lambda.min, which in this case is 0.017. Applying coef to the output of cv.ncvreg returns the coefficients at that value of λ:

coef(cvfit)
#  (Intercept)       lcavol      lweight          age         lbph          svi 
#  0.494154801  0.569546027  0.614419811 -0.020913467  0.097352536  0.752397339 
#          lcp      gleason        pgg45 
# -0.104959403  0.000000000  0.005324465 
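The selected λ and its coefficients can also be recovered by hand from the cross-validation object, which makes explicit what `coef(cvfit)` returns. The sketch below assumes the ncvreg package and its bundled `Prostate` data; the names `i_min`, `lam_manual`, and `b_manual` are illustrative. Because the folds are assigned at random, a seed is set here, and the selected λ may differ slightly from the 0.017 reported above.

```r
library(ncvreg)

data(Prostate)
set.seed(1)  # cross-validation folds are random; fix them for reproducibility
cvfit <- cv.ncvreg(Prostate$X, Prostate$y)

# Index of the lambda with the smallest cross-validation error
i_min      <- which.min(cvfit$cve)
lam_manual <- cvfit$lambda[i_min]

# Coefficients of the full-data fit at that lambda
b_manual <- coef(cvfit$fit, lambda = lam_manual)

# Both comparisons should report agreement with the built-in shortcuts
all.equal(lam_manual, cvfit$lambda.min)
all.equal(b_manual, coef(cvfit))
```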

Predicted values can be obtained with predict, which offers several options:

predict(cvfit, X=head(X))
# Predicted response for new observations
#         1         2         3         4         5         6 
# 0.8304040 0.7650906 0.4262072 0.6230117 1.7449492 0.8449595

predict(cvfit, type="nvars")
# Number of nonzero coefficients
# 0.01695 
#       7

predict(cvfit, type="vars")
# Indices of the nonzero coefficients
#  lcavol lweight     age    lbph     svi     lcp   pgg45 
#       1       2       3       4       5       6       8

Note that the original fit (to the full data set) is stored in cvfit$fit; it is not necessary to call both ncvreg and cv.ncvreg to analyze a data set.

For example, plot(cvfit$fit) produces the same coefficient path plot as plot(fit) above.

