This article first shows how to import data into R. It then builds a correlation matrix and runs a regression with two predictor variables. Finally, it shows how to export the correlation matrix to an external file and use that matrix on its own for regression.
Data entry and cleaning
First, we will load the required packages.
library(dplyr)   # used to clean up the data
library(Hmisc)   # correlation coefficients and their significance
Then, we will read in the data file and clean it up slightly.
# Make sure your working directory is set to the location of the file,
# e.g. setwd("d:/download"), or use the Session menu ->
# "Set Working Directory" -> "To Source File Location"

# Select a subset of the data for analysis and store it in a new data frame
sub <- subset(des, case < 21 & case != 9)   # != means "not equal to"

# Let's look at the data file
sub
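The read-in step itself does not survive in the source; a minimal sketch of how `des` could be created, using a hypothetical CSV file (written to a temp file here so the example is self-contained; the file name and columns are assumptions, not from the article):

```r
# Hypothetical read step: the real file name is not given in the article.
# For a self-contained demo, write a tiny example file first.
tmp <- tempfile(fileext = ".csv")
writeLines(c("case,T1,T2,T4",
             "1,10,12,",        # blank cell -> NA on read
             "2,11,13,15",
             "9,99,99,99",
             "25,8,9,10"), tmp)

des <- read.csv(tmp)            # blank cells become NA automatically
sub <- subset(des, case < 21 & case != 9)
sub                             # cases 9 and 21+ are excluded
```

With real data you would replace `tmp` with the path to your own file.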
# Subset the specific tests using dplyr
test <- select(sub, c(T1, T2, T4))

# Use the psych package to get descriptive statistics
library(psych)
describe(test)
Note that R treats blank cells in the original data as missing and marks them as NA, the default missing-data tag in R.
Create and export correlation matrix
Now, we will create a correlation matrix and show how to export it to an external file. Note that the first correlation matrix uses the option "pairwise", which applies pairwise deletion of missing data. This is usually undesirable because it drops individual values rather than whole cases, which can bias parameter estimates. The second option, "complete", applies listwise deletion (the entire case is dropped, not just specific variables), which is preferable to pairwise deletion because it biases the parameter estimates less.
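The difference between the two options can be seen on a small made-up data frame with a single missing value (toy data, not from the article):

```r
# Toy data: one NA in x; y and z are complete
d <- data.frame(x = c(1, 2, 3, 4, NA),
                y = c(2, 4, 5, 4, 5),
                z = c(5, 3, 2, 2, 1))

# Pairwise deletion: each correlation uses all cases available for that
# pair of variables, so cor(y, z) is still based on all 5 rows
pw <- cor(d, use = "pairwise.complete.obs")

# Listwise deletion: any case with an NA anywhere is dropped entirely,
# so every correlation is based on the same 4 complete rows
lw <- cor(d, use = "complete.obs")

pw["y", "z"]   # based on 5 cases
lw["y", "z"]   # based on 4 cases; generally differs from the pairwise value
```

Because the two matrices can be built from different subsets of cases, their entries generally disagree, which is exactly the difference the article points out.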
# Create a correlation matrix between the variables
cor <- cor(test, use = "pairwise.complete.obs")
cor   # display the correlation matrix
rcorr(as.matrix(test))   # correlations with significance levels (Hmisc)
# Save the correlation matrix to a file
write.csv(cor, "PW.csv")
cor <- cor(test, method = "pearson", use = "complete.obs")
cor   # note the difference when we use listwise deletion
# Save the correlation matrix to a file on your hard disk
write.csv(cor, "cor.csv")
Now, we will do some multiple regression. Specifically, we will see whether tests 1 and 2 predict test 4. We will also examine some model assumptions, including whether there are outliers and whether there is multicollinearity among the tests (via the variance inflation factor, or VIF). Some of this code saves the residuals, predicted values, and other case diagnostics into the data frame for later inspection. Note that the lm() command uses listwise deletion by default.
# Fit the regression model (T1 and T2 predicting T4)
model <- lm(T4 ~ T1 + T2, data = test)
summary(model)   # model results

# Save the fitted (predicted) values to the data frame
test$predicted <- predict(model)

# Save case diagnostics (outliers)
hatvalues(model)

# Multicollinearity test (vif() is in the car package)
library(car)
vif(model)
vcov(model)   # variance-covariance matrix of the coefficients
cov(test)     # covariance matrix of the raw data
Model results and their meanings:
- Multiple R-squared: the proportion of variance in the dependent variable predicted (explained) by the linear combination of independent variables in the model.
- Adjusted R-squared: an estimate of R-squared at the population level.
- Residual standard error: the standard deviation of the residuals, in the original units of measurement. Its square is the mean squared error (MSE), which appears next to the residuals in the ANOVA table.
- The p-value following the F statistic: an omnibus test against an intercept-only model with no predictors (does your model predict the outcome better than the mean alone?).
- Mean Sq for the residuals in the ANOVA table: the variance of the residuals.
- Variance inflation factor (VIF): indicates whether there is multicollinearity among the predictors in the model. As a rule of thumb, a value greater than 10 signals a problem; lower is better.
- Influence measures: a set of case diagnostics. In this output, the columns are, in order: dfbeta for the intercept, dfbeta for T1, dfbeta for T2, dffits (global influence, i.e. how much yhat, the predicted y, changes when the case is deleted), the covariance ratio (the change in the determinant of the covariance matrix of the estimates when the observation is deleted), Cook's distance (influence), and leverage (how unusual the case is in terms of its values on the predictor variables). A significance test flags a case as a potential outlier. Note that one way to screen for outliers is to look for residuals more than 2 standard deviations from their mean (which is always 0).
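All of these diagnostics can be pulled from a fitted lm object with base R. A sketch on simulated data (the variable names and the vif-by-hand line, which uses the 1/(1 - R²) definition instead of car::vif(), are my own illustration):

```r
set.seed(1)
n  <- 30
x1 <- rnorm(n)
x2 <- 0.5 * x1 + rnorm(n)          # correlated predictors
y  <- 1 + 2 * x1 - x2 + rnorm(n)
fit <- lm(y ~ x1 + x2)

# Case diagnostics described in the list above
dfb <- dfbeta(fit)                 # change in each coefficient if the case is deleted
dff <- dffits(fit)                 # change in yhat if the case is deleted
cvr <- covratio(fit)               # change in det of the coefficient covariance matrix
cd  <- cooks.distance(fit)         # overall influence
lev <- hatvalues(fit)              # leverage

# VIF "by hand": 1 / (1 - R^2) from regressing one predictor on the others
r2   <- summary(lm(x1 ~ x2))$r.squared
vif1 <- 1 / (1 - r2)

# Outlier screen: standardized residuals beyond +/- 2
which(abs(rstandard(fit)) > 2)
```

`influence.measures(fit)` prints all of these columns at once, with potential outliers starred.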
Next, let’s draw some model diagrams.
# Plot the model
plot(T4 ~ T1, data = test)
The green line shows the linear best fit, while the red line shows the loess (locally weighted regression) fit. The red dotted lines mark +/- 1 standard error around the smoothed loess line. The extra arguments in the first scatterplot command label each data point to help identify outliers. Notice the second figure: if the residuals were normally distributed, we would see a flat line rather than a curve.
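The article's exact plotting code is not shown; a base-R sketch of how such a figure could be produced (simulated data, and the colors and loess call are assumptions):

```r
set.seed(2)
T1 <- rnorm(40, 50, 10)
T4 <- 0.6 * T1 + rnorm(40, 0, 5)
test <- data.frame(T1, T4)

plot(T4 ~ T1, data = test)
abline(lm(T4 ~ T1, data = test), col = "green")   # linear best fit

lo  <- loess(T4 ~ T1, data = test)
ord <- order(test$T1)
pr  <- predict(lo, se = TRUE)
lines(test$T1[ord], pr$fit[ord], col = "red")                         # loess fit
lines(test$T1[ord], (pr$fit + pr$se.fit)[ord], col = "red", lty = 2)  # +1 SE
lines(test$T1[ord], (pr$fit - pr$se.fit)[ord], col = "red", lty = 2)  # -1 SE
```

Labeling points (e.g. with `text()`) makes individual outliers easy to spot, as the article suggests.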
Using multiple regression to show that coefficients are functions of residuals
Now, let's see how a coefficient is a function of residuals. We will reconstruct the coefficient of T1 from the previous regression. First, we create the residuals of T4 (the outcome), controlling for the predictors other than T1.
# Regress T4 on T2 and save the residuals in the original data frame
mod4 <- lm(T4 ~ T2, data = test)
test$resT4 <- residuals(mod4)
Next, we create residuals for T1 (the predictor of interest), controlling for the other predictor. We regress T1 on T2, i.e. T1 = b0 + b1*T2; the residuals are the part of T1 unrelated to T2.
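A sketch of this residualizing step, mirroring the T4 step above (the data frame here is a simulated stand-in for the article's data, and the names `mod1`/`resT1` are my own):

```r
# Toy stand-in for the article's data frame (not the real data)
set.seed(3)
test <- data.frame(T1 = rnorm(20), T2 = rnorm(20))
test$T4 <- 0.5 * test$T1 + 0.3 * test$T2 + rnorm(20)

# Residualize T1 on T2: resT1 is the part of T1 unrelated to T2
mod1 <- lm(T1 ~ T2, data = test)
test$resT1 <- residuals(mod1)

# By construction, resT1 is uncorrelated with T2
cor(test$resT1, test$T2)   # essentially 0
```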
Now we run a regression using the T4 residuals (with T2 removed) as the DV and the T1 residuals (with T2 removed) as the independent variable.
modf <- lm(resT4 ~ resT1, data = test)
summary(modf)   # model results
Note that this regression coefficient is the same as the one in the earlier two-predictor regression. Next, we will run another regression with the case number as the DV. We will create a new plot to show that leverage depends only on the predictors, not on the dependent variable.
# Plot leverage against case number
plot(lev ~ case, data = grb)
Note that in SEM there is no simple distance or leverage measure, but we can still obtain leverage, because it is independent of the DV. If we find an extreme case, we can run the analysis with and without that case to determine its influence; the change in the output serves as a test of leverage.
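That leverage is independent of the DV can be checked directly: refitting with a completely different outcome but the same predictors leaves the hat values unchanged (simulated data for illustration):

```r
set.seed(5)
x1 <- rnorm(25)
x2 <- rnorm(25)
y1 <- x1 + x2 + rnorm(25)
y2 <- rnorm(25)                 # unrelated outcome, same predictors

h1 <- hatvalues(lm(y1 ~ x1 + x2))
h2 <- hatvalues(lm(y2 ~ x1 + x2))
all.equal(h1, h2)               # TRUE: leverage depends only on x1 and x2
```

This follows from the hat matrix H = X(X'X)^(-1)X' containing only the design matrix X, not y.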
Now we make a 3D scatterplot of the relationships between the tests.
# 3D scatterplot (assuming the scatterplot3d package)
library(scatterplot3d)
s3d <- with(test, scatterplot3d(T1, T2, T4))
s3d$plane3d(model)   # use our earlier model to draw the regression plane
Multiple regression using correlation matrix
Now we will show how to run the regression using only the correlation matrix. This is useful when you want to do additional analyses on published papers that report correlation and/or covariance matrices but whose raw data you cannot obtain.
# Read the correlation matrix back in from the file on your computer
oaw <- read.csv("cor.csv", row.names = 1)
oaw <- data.matrix(oaw)   # convert from a data frame to a matrix

# Use the correlation matrix, with no raw data, for the regression
# (e.g. psych::setCor() can fit a regression from a correlation matrix)
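Under the hood, standardized regression weights can be recovered from the correlation matrix alone via beta = Rxx^(-1) rxy. A check against lm() on z-scored data (simulated here, since the article's matrix-based function call is garbled):

```r
set.seed(6)
n  <- 100
T1 <- rnorm(n)
T2 <- 0.3 * T1 + rnorm(n)
T4 <- 0.5 * T1 + 0.2 * T2 + rnorm(n)
dat <- data.frame(T1, T2, T4)

R    <- cor(dat)                           # correlation matrix only
Rxx  <- R[c("T1", "T2"), c("T1", "T2")]    # predictor intercorrelations
rxy  <- R[c("T1", "T2"), "T4"]             # predictor-outcome correlations
beta <- solve(Rxx, rxy)                    # standardized coefficients

# Identical to lm on standardized (z-scored) variables
fit <- lm(scale(T4) ~ scale(T1) + scale(T2), data = dat)
beta
coef(fit)[-1]
```

This is why a published correlation matrix (plus the sample size, for standard errors) is enough to reproduce a standardized regression.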