Principal component analysis (PCA) in R for wine data visualization: score scatter plots and loading plots of the principal components

Time:2021-10-19

Original link:http://tecdat.cn/?p=22492 

===

We will use the wine data set for principal component analysis.

Data

A data frame with 177 samples and 13 variables; vintages contains the class labels. These data are the results of chemical analyses of wines grown in the same region of Italy but derived from three different cultivars: Nebbiolo, Barbera and Grignolino grapes. The wine made from the Nebbiolo grape is called Barolo.

The data give the quantities of several constituents found in each of the three types of wine.

#   Look at the data
head(no)


Converting and standardizing data

Apply a logarithmic transformation and then standardize, so that all variables are on the same scale.

#   Logarithmic transformation
no_log <- log(no)

#   Standardization
log_scale <- scale(no_log)
head(log_scale)
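As a minimal sketch of what these two steps do, the following uses a small made-up data frame (`toy`, a hypothetical stand-in for the wine data) and checks that, after `log()` and `scale()`, every column has mean 0 and standard deviation 1. Note that the log transform assumes all values are strictly positive.

```r
# Toy stand-in for the wine data (made-up values, all strictly positive)
set.seed(1)
toy <- data.frame(
  alcohol = rlnorm(20, meanlog = 2.5, sdlog = 0.1),
  proline = rlnorm(20, meanlog = 6.5, sdlog = 0.4),
  ash     = rlnorm(20, meanlog = 0.8, sdlog = 0.2)
)

toy_log   <- log(toy)        # log() works element-wise on a numeric data frame
toy_scale <- scale(toy_log)  # center each column to mean 0, scale to sd 1

# After standardization the column means are 0 and the column sds are 1
col_means <- colMeans(toy_scale)
col_sds   <- apply(toy_scale, 2, sd)
print(round(col_means, 10))
print(round(col_sds, 10))
```

Without the standardization step, variables measured in large units (such as proline) would dominate the principal components simply because of their scale.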


Principal component analysis (PCA)

Run the principal component analysis; prcomp() computes it with a singular value decomposition of the data matrix. The data are already centered, so center=FALSE.

PCA <- prcomp(log_scale, center=FALSE)
summary(PCA)
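The later plots use objects named `scores`, `loadings` and `explain`, whose definitions are not shown here. One plausible way to derive them from a `prcomp` result is sketched below on R's built-in `USArrests` data (an assumption, used only so the example runs on its own). The axis-label format stored in `explain` is likewise a guess.

```r
# Stand-in data so the sketch is self-contained; the article uses the wine data
X <- scale(log(USArrests))

pca <- prcomp(X, center = FALSE)   # data are already centered and scaled

scores   <- pca$x          # sample coordinates on the principal components
loadings <- pca$rotation   # variable weights (eigenvectors) for each component

# Percentage of variance explained by each component
pcavar <- 100 * pca$sdev^2 / sum(pca$sdev^2)

# Named list of axis labels, so that explain[["PC1"]] can be used in title()
explain <- setNames(
  as.list(sprintf("%s (%.1f%%)", colnames(scores), pcavar)),
  colnames(scores)
)
print(explain[["PC1"]])

# prcomp() is based on the singular value decomposition of the data matrix:
# the singular values divided by sqrt(n - 1) equal the component sdev
sv <- svd(X)
print(all.equal(sv$d / sqrt(nrow(X) - 1), pca$sdev))
```

The scores are simply the data projected onto the loadings, that is, `X %*% loadings`.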


Basic plots (defaults)

Principal component score plot and loading plot with base graphics

plot(scores[,1:2],  #  X and Y data
     pch=21,  #  Point shape
     cex=1.5  #  Point size
)
legend("topright",  #  Location of legend
       legend=levels(vint)  #  Legend entries
)
plot(loadings[,1:2],  #  X and Y data
     pch=21  #  Shape of point
)
text(loadings[,1:2]  #  Sets the location of the labels
)
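A self-contained version of the score plot, colored by group, might look like the sketch below. It assumes R's built-in `iris` data as a stand-in for the wine data, with `grp` playing the role of `vint`; the colors are arbitrary choices.

```r
# Stand-in data: iris measurements with Species as the grouping factor
pca    <- prcomp(scale(log(iris[, 1:4])), center = FALSE)
scores <- pca$x
grp    <- iris$Species                     # plays the role of `vint`
cols   <- c("red", "darkgreen", "blue")    # one color per factor level

# Score plot: first two principal components, filled by group
plot(scores[, 1:2],
     pch = 21,                      # filled circle with border
     bg  = cols[as.integer(grp)],   # fill color chosen by group
     cex = 1.5)
legend("topright",
       legend = levels(grp),
       pch    = 21,
       pt.bg  = cols)
```

Indexing `cols` by `as.integer(grp)` keeps the point colors and the legend entries in the same order as the factor levels.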


In addition, we can add a 95% confidence ellipse for each group in the score plot.

Confidence ellipse plotting function

##   Ellipse plot
                        elev=0.95,  #  Probability level of the ellipse
                        pcol=NULL,  #  Manually supplied colors; length must match the number of factor levels
                        cexsize=1,  #  Point size
                        ppch=21,  #  Point type; length must match the number of factor levels
                        legcexsize=2,  #  Legend font size
                        legptsize=2,  #  Legend point size

    ##   Set factor levels
    if(is.factor(factr)) {
        f <- factr
    } else {
        f <- factor(factr, levels=unique(as.character(factr)))
    }
    intfactr <- as.integer(f)  #  Integer vector that matches the factor levels

    ##   Get ellipse data
    edf <- data.frame(LV1=x, LV2=y, factr=f)  #  Data frame holding the data and the factor
    ellipses <- dlply(edf, .(factr), function(x) {
        #  Confidence ellipse points for each factor level, from the dataEllipse() function
        dataEllipse(x$LV1, x$LV2, levels=elev, robust=TRUE, draw=FALSE)
    })
    ##   Get the range of the X and Y data
    xrange <- plotat(range(c(as.vector(sapply(ellipses, function(x) x[,1])), min(x), max(x))))
    ##   Set colors for the plot
    if(!is.null(pcol)) {  #  If the colors are provided by the user
        pgcol <- paste(pcol, "7e", sep="")  #  Append alpha "7e" (about 50% opacity) so the fill is semi-transparent

    #   Draw the plot
    plot(x, y, type="n", xlab="", ylab="", main="")
    abline(h=0, v=0, col="gray", lty=2)  #  Add lines at 0
    legpch <- c()  #  Vector to collect legend pch data
    legcol <- c()  #  Vector to collect legend col data
    ##   Add points, ellipses, and determine the color of the legend
    ##   Legend
    legend(x=legpos, legend=levels(f), pch=legpch)
##   Axis labels from PCA output produced by the prcomp() function
    pcavar <- round(100 * (sdev^2)/sum(sdev^2), 1)  #  Percentage of variance explained by each component
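The points of a confidence ellipse can also be computed directly in base R. The sketch below is not the function above: it assumes roughly bivariate normal scores within each group, uses the classical (non-robust) mean and covariance, and takes the radius from a chi-square quantile, so its outlines may differ somewhat from those of dataEllipse(). The helper name `conf_ellipse`, the colors and the `iris` stand-in data are all hypothetical.

```r
# Points on the `level` confidence ellipse of a bivariate sample (x, y)
conf_ellipse <- function(x, y, level = 0.95, n = 100) {
  center <- c(mean(x), mean(y))
  S      <- cov(cbind(x, y))                # 2 x 2 sample covariance
  radius <- sqrt(qchisq(level, df = 2))     # chi-square radius for 2 dimensions
  theta  <- seq(0, 2 * pi, length.out = n)
  circle <- cbind(cos(theta), sin(theta))   # points on the unit circle
  # chol(S) returns R with t(R) %*% R = S, so circle %*% R has the shape of S
  pts <- radius * circle %*% chol(S)
  sweep(pts, 2, center, "+")                # shift to the group mean
}

# Stand-in data: iris measurements with Species as the grouping factor
pca    <- prcomp(scale(log(iris[, 1:4])), center = FALSE)
scores <- pca$x
grp    <- iris$Species
cols   <- c("red", "darkgreen", "blue")

# One ellipse per group, in the order of levels(grp)
ellipses <- lapply(split(as.data.frame(scores[, 1:2]), grp),
                   function(d) conf_ellipse(d$PC1, d$PC2))

# Axis ranges must cover both the points and the ellipses
all_xy <- rbind(scores[, 1:2], do.call(rbind, ellipses))
plot(scores[, 1:2], type = "n",
     xlim = range(all_xy[, 1]), ylim = range(all_xy[, 2]))
abline(h = 0, v = 0, col = "gray", lty = 2)
for (i in seq_along(ellipses)) {
  polygon(ellipses[[i]], border = cols[i],
          col = adjustcolor(cols[i], alpha.f = 0.3))  # semi-transparent fill
}
points(scores[, 1:2], pch = 21, bg = cols[as.integer(grp)])
legend("topright", legend = levels(grp), pch = 21, pt.bg = cols)
```

Every point returned by `conf_ellipse` lies at the same Mahalanobis distance from the group mean, which is what makes the outline a constant-probability contour under the normality assumption.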

Base graphics

Draw the principal component score plot with confidence ellipses, then draw the loading plot with the base graphics defaults.

plot(scores[,1],  #  X-axis data
            scores[,2],  #  Y-axis data
            vint,  #  Grouping factor (class labels)
            pcol=c(),  #  Colors used for drawing (must match the number of factor levels)
            pbgcol=FALSE,  #  Whether the point border is black
            cexsize=1.5,  #  Point size
            ppch=c(21:23),  #  Point shapes (must match the number of factor levels)
            legpos="bottomright",  #  Location of legend
            legcexsize=1.5,  #  Legend text size
            legptsize=1.5,  #  Legend point size
            axissize=1.5,  #  Text size for the axes
            linewidth=1.5  #  Axis line width
)
title(xlab=explain[["PC1"]],  #  Percentage of variance explained by PC1
      ylab=explain[["PC2"]],  #  Percentage of variance explained by PC2
      main="Scores",  #  Title
      cex.lab=1.5,  #  Size of label text
      cex.main=1.5  #  Size of title text
)

plot(loadings[,1:2],  #  X and Y data
     pch=21,  #  Shape of point
     cex=1.5,  #  Point size
     # type="n",  #  Do not draw points
     axes=FALSE,  #  Do not draw axes
     xlab="",  #  Remove x label
     ylab=""  #  Remove y label
)
pointLabel(loadings[,1:2],  #  Sets the location of the labels
           labels=rownames(loadings),  #  Label text
           cex=1.5  #  Sets the size of the labels
)  #  pointLabel will attempt to place the text around the points
axis(1,  #  Show X axis
     cex.axis=1.5,  #  Sets the size of the text
     lwd=1.5  #  Sets the width of the axis line
)
axis(2,  #  Show Y axis
     las=2,  #  Orientation of the tick labels; 2 is perpendicular to the axis
     cex.axis=1.5,  #  Sets the size of the text
     lwd=1.5  #  Sets the width of the axis line
)
title(xlab=explain[["PC1"]],  #  Percentage of variance explained by PC1
      ylab=explain[["PC2"]],  #  Percentage of variance explained by PC2
      main="Loadings",  #  Title
      cex.lab=1.5,  #  Size of label text
      cex.main=1.5  #  Size of title text
)

