# Principal component analysis (PCA) of wine data in R: score scatter plots and loading plots of the principal components

Time: 2021-10-19


We will use the wine data set for principal component analysis.

## Data

The data set is a data frame with 177 samples and 13 variables; Vintages contains the class labels. The data are the results of chemical analyses of wines grown in the same region of Italy but derived from three different cultivars: Nebbiolo, Barbera and Grignolino grapes. The wine from the Nebbiolo grape is called Barolo.

The data give the quantities of several constituents found in each of the three types of wine.

```r
# Look at the data
```

## Transforming and standardizing the data

Apply a logarithmic transformation and then standardize, so that all variables are on the same scale.

```r
# Logarithmic transformation
no_log <- log(no)

# Standardization
log_scale <- scale(no_log)
```
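To see what these two steps do, here is a minimal sketch on simulated, strictly positive data. The matrix `demo` and its column names are made up for illustration and only stand in for the wine measurements `no`.

```r
# Simulated stand-in for the wine measurements (hypothetical values)
set.seed(1)
demo <- cbind(a = rlnorm(50, meanlog = 2, sdlog = 0.5),
              b = rlnorm(50, meanlog = 6, sdlog = 1.0))

# log() reduces right skew; scale() centers each column and divides it by its SD
demo_scaled <- scale(log(demo))

col_means <- colMeans(demo_scaled)      # numerically zero for every column
col_sds   <- apply(demo_scaled, 2, sd)  # one for every column
print(round(col_means, 10))
print(col_sds)
```

Note that `log()` requires strictly positive values, so a data set containing zeros would need a different transformation or an offset.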

## Principal component analysis (PCA)

Run the principal component analysis with `prcomp()`, which uses a singular value decomposition.

```r
# The data are already centered and scaled, so centering is switched off
PCA <- prcomp(log_scale, center=FALSE)
summary(PCA)
```
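The link between `prcomp()` and the singular value decomposition can be checked directly. The sketch below does this on the built-in `iris` measurements, which serve only as a stand-in because the wine objects are not defined in this post; `X`, `pca`, and `sv` are illustrative names.

```r
# Stand-in data: log-transformed, standardized iris measurements
X <- scale(log(as.matrix(iris[, 1:4])))

pca <- prcomp(X, center = FALSE)  # data are already centered and scaled
sv  <- svd(X)

# Component standard deviations are the singular values divided by sqrt(n - 1)
sdev_from_svd <- sv$d / sqrt(nrow(X) - 1)

# Scores are U %*% diag(d); column signs are arbitrary, so compare absolute values
scores_from_svd <- sv$u %*% diag(sv$d)

print(all.equal(pca$sdev, sdev_from_svd))
print(all.equal(abs(unname(pca$x)), abs(scores_from_svd)))
```

Both comparisons use `all.equal()`, so they tolerate small floating-point differences.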

## Basic plot (defaults)

Plot the principal component scores and loadings with base graphics.

```r
scores <- PCA$x  # Assumption: scores holds the PCA scores returned by prcomp()

plot(scores[,1:2],  # X and Y data
     pch=21,        # Point shape
     cex=1.5        # Point size
)
legend("topright",           # Location of legend
       legend=levels(vint),  # Legend entries
       pch=21                # Shape of point
)
```

In addition, we can add a 95% confidence ellipse for each group to the score plot.
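A 95% ellipse for one group can be sketched from the group mean and covariance matrix alone. The helper `ellipse_points()` below is a hypothetical name, not the function used in this post. It uses the classical covariance estimate and a chi-square quantile, whereas the plotting function in this post requests robust estimates from `dataEllipse()`. The simulated `g` stands in for the scores of one wine class.

```r
# Hypothetical helper: points on the `level` ellipse of a bivariate normal
# with the sample mean and covariance of x and y
ellipse_points <- function(x, y, level = 0.95, n = 100) {
  center <- c(mean(x), mean(y))
  S      <- cov(cbind(x, y))
  radius <- sqrt(qchisq(level, df = 2))
  theta  <- seq(0, 2 * pi, length.out = n)
  circle <- cbind(cos(theta), sin(theta))
  # chol(S) returns upper-triangular R with t(R) %*% R = S,
  # so circle %*% R maps the unit circle onto the covariance ellipse
  sweep(radius * circle %*% chol(S), 2, center, "+")
}

set.seed(42)
g  <- data.frame(PC1 = rnorm(200), PC2 = rnorm(200, sd = 0.5))
el <- ellipse_points(g$PC1, g$PC2)

# Every ellipse point has squared Mahalanobis distance qchisq(0.95, 2)
d2 <- mahalanobis(el, colMeans(g), cov(g))
print(range(d2))

# Share of the simulated points that fall inside the ellipse
inside <- mean(mahalanobis(g, colMeans(g), cov(g)) <= qchisq(0.95, 2))
print(inside)
```

The returned matrix can be passed to `lines()` or `polygon()` to draw the ellipse on an existing score plot.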

## Confidence ellipse plotting function

```r
## Ellipse plot
# ...
  elev=0.95,     # Ellipse probability level
  pcol=NULL,     # Colors supplied manually; length must match the number of factor levels
  cexsize=1,     # Point size
  ppch=21,       # Point type; length must match the number of factor levels
  legcexsize=2,  # Legend font size
  legptsize=2,   # Legend point size
# ...

## Set factor levels
if(is.factor(factr)) {
  f <- factr
} else {
  f <- factor(factr, levels=unique(as.character(factr)))
}
intfactr <- as.integer(f)  # Integer vector matching the factor levels

## Get ellipse data
edf <- data.frame(LV1=x, LV2=y, factr=f)  # Data frame with the data and the factor
ellipses <- dlply(edf, .(factr), function(x) {
  # ...
  # Confidence ellipse points per factor level, from the dataEllipse() function
  dataEllipse(LV1, LV2, levels=elev, robust=TRUE, draw=FALSE)
})
## Get the range of the X and Y data
xrange <- plotat(range(c(as.vector(sapply(ellipses, function(x) x[,1])), min(x), max(x))))
# ...
## Set fill colors
if(!is.null(pcol)) {  # If the colors are provided by the user
  pgcol <- paste(pcol, "7e", sep="")  # Append alpha "7e" to make the fill semi-transparent
  # ...
}

# Draw the plot
plot(x, y, type="n", xlab="", ylab="", main="")
abline(h=0, v=0, col="gray", lty=2)  # Add lines at 0
legpch <- c()  # Collects the legend point shapes
legcol <- c()  # Collects the legend colors
## Add points and ellipses, and determine the legend colors
# ...
## Legend
legend(x=legpos, legend=levels(f), pch=legpch,
# ...
## Axis labels for PCA output from the prcomp() function
pcavar <- round((sdev^2)/sum((sdev^2))
# ...
```
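The plotting calls in this post label the axes with `explain[["PC1"]]` and `explain[["PC2"]]`, whose construction is not shown. One way such labels could be built from the `sdev` component of a `prcomp()` result is sketched below, with `iris` as a stand-in for the wine data. The label format and the rounding to one decimal place are assumptions.

```r
pca <- prcomp(scale(log(as.matrix(iris[, 1:4]))), center = FALSE)

# Percentage of the total variance carried by each component
pcavar <- round(100 * pca$sdev^2 / sum(pca$sdev^2), 1)
names(pcavar) <- colnames(pca$x)

# Named list of axis labels, so that explain[["PC1"]] returns the PC1 label
explain <- as.list(paste0(names(pcavar), " (", pcavar, "% of variance)"))
names(explain) <- names(pcavar)

print(pcavar)
print(explain[["PC1"]])
```

Because the components are ordered by decreasing standard deviation, the percentages in `pcavar` are non-increasing from PC1 onward.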

## Basic graphics

Draw the principal component score plot with the ellipse function defined above, then draw the loading plot with base graphics defaults.

```r
plot(scores[,1],            # X-axis data
     scores[,2],            # Y-axis data
     vint,                  # Factor with the classes
     pcol=c(),              # Colors used for plotting (must match the number of factor levels)
     pbgcol=FALSE,          # Should the point borders be black?
     cexsize=1.5,           # Point size
     ppch=c(21:23),         # Point shapes (must match the number of factor levels)
     legpos="bottomright",  # Location of legend
     legcexsize=1.5,        # Legend text size
     legptsize=1.5,         # Legend point size
     axissize=1.5,          # Axis text size
     linewidth=1.5          # Axis line width
)
title(xlab=explain[["PC1"]],  # Percentage of variance explained by PC1
      ylab=explain[["PC2"]],  # Percentage of variance explained by PC2
      main="Scores",          # Title
      cex.lab=1.5,            # Size of label text
      cex.main=1.5            # Size of title text
)

# ...
     pch=21,      # Shape of point
     cex=1.5,     # Point size
     # type="n",  # Do not draw points
     axes=FALSE,  # Do not draw axes
     xlab="",     # Remove x label
     ylab=""      # Remove y label
)
# ...
     cex=1.5      # Size of the labels
)  # pointLabel() will attempt to place the text around the points
axis(1,             # Show X axis
     cex.axis=1.5,  # Size of the axis text
     lwd=1.5        # Axis line width
)
axis(2,             # Show Y axis
     las=2,         # Label orientation; 2 = perpendicular to the axis
     cex.axis=1.5,  # Size of the axis text
     lwd=1.5        # Axis line width
)
title(xlab=explain[["PC1"]],  # Percentage of variance explained by PC1
      ylab=explain[["PC2"]],  # Percentage of variance explained by PC2

      cex.lab=1.5,            # Size of label text
      cex.main=1.5            # Size of title text
)
```
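For reference, a self-contained loading plot with base graphics might look like the following sketch. It uses `iris` as a stand-in for the wine data, plain `text()` rather than `pointLabel()` for the labels, and `pdf(NULL)` so that it runs without opening a window. None of these choices are taken from the original script.

```r
pca      <- prcomp(scale(log(as.matrix(iris[, 1:4]))), center = FALSE)
loadings <- pca$rotation  # one row per variable, one column per component

pdf(NULL)                 # draw to a null device instead of a window
plot(loadings[, 1:2], pch = 21, bg = "gray", cex = 1.5,
     axes = FALSE, xlab = "", ylab = "")
text(loadings[, 1:2], labels = rownames(loadings), pos = 3, cex = 1.2)
axis(1, cex.axis = 1.5, lwd = 1.5)
axis(2, las = 2, cex.axis = 1.5, lwd = 1.5)
title(xlab = "PC1", ylab = "PC2", main = "Loadings")
invisible(dev.off())

# Each column of the rotation matrix has unit length
print(colSums(loadings^2))
```

Dropping the `pdf(NULL)` and `dev.off()` lines sends the plot to the active graphics device instead.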
