Preface
ggplot is a plotting system with a complete grammar that is easy to pick up. It can be imported and used in both Python and R, and it is very widely used in data analysis and visualization. This article shows how to use ggplot2 from R.
First, here are the reasons I think make it most worth recommending:
- It builds plots by overlaying "layers". This ties different chart types together and also makes the package easier to learn and understand; long-time Photoshop users will recognize how convenient this is.
- It is widely used and well documented: in R, typing ? followed by a function name brings up that function's help page together with matching examples.
- It is available in both R and Python, which lowers the learning cost of moving between the two languages.
Basic concepts
This article uses the diamonds dataset that ships with ggplot2.
> head(diamonds)
# A tibble: 6 x 10
  carat cut       color clarity depth table price     x     y     z
  <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
1 0.23  Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43
2 0.21  Premium   E     SI1      59.8    61   326  3.89  3.84  2.31
3 0.23  Good      E     VS1      56.9    65   327  4.05  4.07  2.31
4 0.290 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63
5 0.31  Good      J     SI2      63.3    58   335  4.34  4.35  2.75
6 0.24  Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48
# Variable meanings
price : price in US dollars ($326–$18,823)
carat : weight of the diamond (0.2–5.01)
cut : quality of the cut (Fair, Good, Very Good, Premium, Ideal)
color : diamond colour, from D (best) to J (worst)
clarity: a measurement of how clear the diamond is (I1 (worst), SI2, SI1, VS2, VS1, VVS2, VVS1, IF (best))
x : length in mm (0–10.74)
y : width in mm (0–58.9)
z : depth in mm (0–31.8)
depth : total depth percentage = z / mean(x, y) = 2 * z / (x + y) (43–79)
table : width of top of diamond relative to widest point (43–95)
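The depth formula above can be checked by hand against the six rows printed by head(diamonds). The sketch below copies those rows into a small data frame (the names check, depth_calc and difference are made up for this illustration) and multiplies the ratio by 100 to express it as a percentage. The recomputed values should land close to the recorded ones, but not necessarily match exactly, since x, y and z are themselves rounded.

```r
# First six rows of diamonds, copied from the head() output above.
check <- data.frame(
  x     = c(3.95, 3.89, 4.05, 4.20, 4.34, 3.94),
  y     = c(3.98, 3.84, 4.07, 4.23, 4.35, 3.96),
  z     = c(2.43, 2.31, 2.31, 2.63, 2.75, 2.48),
  depth = c(61.5, 59.8, 56.9, 62.4, 63.3, 62.8)
)

# depth = 2 * z / (x + y), expressed as a percentage
check$depth_calc <- round(100 * 2 * check$z / (check$x + check$y), 1)

# Gap between the recorded and the recomputed value
check$difference <- round(check$depth - check$depth_calc, 1)

print(check)
```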
Building on the concepts of layers and a canvas, ggplot2 develops the following grammar framework:
(Figure source: https://mp.weixin.qq.com/s/us…)
- data: the data source, usually a data.frame; other structures are converted to one.
- Shared and individual mappings: the mapping = aes() argument of ggplot() is a shared mapping and is inherited by geom_xxx() and stat_xxx(); the mapping argument inside geom_xxx() or stat_xxx() is an individual mapping and only takes effect within that layer.
- mapping: the mappings, including colour-type mappings (color, fill), shape-type mappings (linetype, size, shape) and position-type mappings (x, y).
- geom_xxx: geometric objects, including scatter plots, line charts, bar charts and histograms, as well as auxiliary curves, sloped lines, horizontal lines, vertical lines and text.
- aesthetic attributes: drawing parameters, including colour, size, shape and so on.
- facetting: faceting, which splits a dataset into multiple subsets and then draws the same chart for each subset.
- theme: specifies the theme of the chart.
ggplot(data = NULL, mapping = aes(x = , y = )) +  # dataset and shared mapping
  geom_xxx() | stat_xxx() +  # geometric layer / statistical transformation
  coord_xxx() +              # coordinate transformation, Cartesian by default
  scale_xxx() +              # scale adjustment for a specific scale
  facet_xxx() +              # faceting: turn one of the variables into facets
  guides() +                 # legend adjustment
  theme()                    # theme system
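As one possible concrete instance of this template, the sketch below fills each slot with a real function and also shows the difference between a shared and an individual mapping. The particular choices (a linear fit, facets by color, theme_bw()) are only illustrative assumptions and are not taken from the original figure.

```r
library(ggplot2)

# Shared mapping: x and y are declared once in ggplot() and inherited by every layer.
p <- ggplot(data = diamonds, mapping = aes(x = carat, y = price)) +
  # Individual mapping: colour = cut applies to the point layer only.
  geom_point(aes(colour = cut), alpha = 0.3) +
  # This layer inherits x and y but not colour, so one line is fitted per panel.
  geom_smooth(method = "lm") +
  coord_cartesian() +                             # default Cartesian coordinates
  scale_y_continuous() +                          # default continuous y scale
  facet_wrap(~ color) +                           # one panel per diamond colour grade
  guides(colour = guide_legend(title = "Cut")) +  # legend adjustment
  theme_bw()                                      # theme

print(p)
```

Moving colour = cut from geom_point() into the aes() call of ggplot() would make it a shared mapping, and geom_smooth() would then fit one line per cut instead.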
You can come back to these concepts after reading the whole article, where they serve as a summary. Once you have mastered them, you have mastered the core logic of ggplot2.
The meaning of some of the core concepts can be picked up at a glance from the figures in the official RStudio cheat sheet:
Some examples
Using examples and R code, this section introduces the syntax of ggplot2 step by step, from simple to more advanced.
1. A fully featured scatter plot
library(ggplot2)
# Use the diamonds dataset
ggplot(diamonds) +
  # Scatter plot: x is carat, y is price, and point colour follows the color column;
  # alpha sets transparency, size the point size, shape = 15 a solid square,
  # and stroke the width of the point border
  geom_point(aes(x = carat, y = price, colour = color), alpha = 0.7, size = 1.0, shape = 15, stroke = 1) +
  # Add a fitted line
  geom_smooth(aes(x = carat, y = price), method = 'glm') +
  # Add a horizontal line (ggplot2 >= 3.4.0 prefers linewidth over size for line thickness)
  geom_hline(yintercept = 0, size = 1, linetype = "dotted", color = "black") +
  # Add a vertical line
  geom_vline(xintercept = 3, size = 1, linetype = "dotted", color = "black") +
  # Add axis labels and a plot title
  labs(title = "Diamonds Point Plot", x = "Carat", y = "Price") +
  # Adjust the displayed range of the axes
  coord_cartesian(xlim = c(0, 3), ylim = c(0, 20000)) +
  # Change the theme. This one is plain; more themes are available in the ggthemes package
  theme_linedraw()
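The scatter plot above limits the axes with coord_cartesian(), which only zooms the view. Setting limits on the scale instead replaces out-of-range values with NA before drawing, so layers such as geom_smooth() would be fitted on less data. The sketch below is a rough way to see the difference by counting usable x values in the built plot data; the variable names are made up for this illustration.

```r
library(ggplot2)

base <- ggplot(diamonds, aes(x = carat, y = price)) + geom_point()

# Zoom only: every diamond is still part of the plot data.
zoomed <- base + coord_cartesian(xlim = c(0, 3))

# Scale limits: carat values outside 0-3 are turned into NA.
clipped <- base + scale_x_continuous(limits = c(0, 3))

# Count the points that still have a usable x value in each built plot.
n_zoomed  <- sum(!is.na(layer_data(zoomed)$x))
n_clipped <- sum(!is.na(layer_data(clipped)$x))

c(zoomed = n_zoomed, clipped = n_clipped)
```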
2. Custom plot layout & multiple geometric objects
library(gridExtra)
#Build data set
df <- data.frame(
x = c(3, 1, 5),
y = c(2, 4, 6),
label = c("a","b","c")
)
p <- ggplot(df, aes(x, y, label = label)) +
# Remove the x and y axis labels
labs(x = NULL, y = NULL) +
#Switch theme
theme_linedraw()
p1 <- p + geom_point() + ggtitle("point")
p2 <- p + geom_text() + ggtitle("text")
p3 <- p + geom_bar(stat = "identity") + ggtitle("bar")
p4 <- p + geom_tile() + ggtitle("raster")
p5 <- p + geom_line() + ggtitle("line")
p6 <- p + geom_area() + ggtitle("area")
p7 <- p + geom_path() + ggtitle("path")
p8 <- p + geom_polygon() + ggtitle("polygon")
#Construct ggplot picture list
plots <- list(p1, p2, p3, p4, p5, p6, p7, p8)
#Custom picture layout
gridExtra::grid.arrange(grobs = plots, ncol = 4)
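Besides a regular grid set with ncol, gridExtra also accepts a layout_matrix argument that lets one plot span several cells. The following is a minimal sketch under that assumption; the three-plot arrangement and the names wide, left, right and layout are invented for illustration.

```r
library(ggplot2)
library(gridExtra)

df <- data.frame(x = c(3, 1, 5), y = c(2, 4, 6), label = c("a", "b", "c"))
p  <- ggplot(df, aes(x, y, label = label)) + theme_linedraw()

wide  <- p + geom_line()  + ggtitle("line")
left  <- p + geom_point() + ggtitle("point")
right <- p + geom_text()  + ggtitle("text")

# Row 1: plot 1 spans both columns; row 2: plots 2 and 3 sit side by side.
layout <- rbind(c(1, 1),
                c(2, 3))

# arrangeGrob() builds the layout without drawing it; grid.draw() renders it.
combined <- arrangeGrob(grobs = list(wide, left, right), layout_matrix = layout)
grid::grid.newpage()
grid::grid.draw(combined)
```

Because arrangeGrob() returns an object instead of drawing immediately, the combined figure can also be passed on to other functions, for example to save it to a file.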
3. Box plots
In statistics, a box plot is an intuitive chart for showing how spread out data are. In exploratory analysis it is often used to show the dispersion of a dependent variable across the levels of a factor variable.
Here are some of the most commonly used ways of drawing box plots:
library(ggplot2)  # plotting
library(ggsci)    # ready-made colour palettes
# Box plots of the diamonds data: the categorical variable is cut, the target variable is carat
p <- ggplot(diamonds, aes(x = cut, y = carat)) +
  theme_linedraw()
# With one factor variable, fill the boxes by that factor to tell the categories apart; the legend is hidden here
p1 <- p + geom_boxplot(aes(fill = cut)) + theme(legend.position = "none")
# With two factor variables, put one on the x axis and distinguish the other by fill colour
p2 <- p + geom_boxplot(aes(fill = color)) + theme(legend.position = "none")
# Flip the box plot so the boxes run horizontally
p3 <- p + geom_boxplot(aes(fill = cut)) + coord_flip() + theme(legend.position = "none")
# Use ready-made colour schemes: scale_fill_jama(), scale_fill_nejm() and scale_fill_lancet() from ggsci, or scale_fill_brewer() (blue series)
p4 <- p + geom_boxplot(aes(fill = cut)) + scale_fill_brewer() + theme(legend.position = "none")
#Construct ggplot picture list
plots <- list(p1, p2, p3, p4)
#Custom picture layout
gridExtra::grid.arrange(grobs = plots, ncol = 2)
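To get a feel for the numbers behind each box, a five-number summary of carat per cut can be computed in base R. This is only an approximation of what geom_boxplot() draws: the hinges returned by fivenum() can differ slightly from the quartiles ggplot2 uses, and the whiskers in the plot stop at 1.5 times the interquartile range rather than at the minimum and maximum. The name summary_by_cut is made up for this sketch.

```r
library(ggplot2)  # loaded only for the diamonds dataset

# One row per cut: minimum, lower hinge, median, upper hinge, maximum of carat
summary_by_cut <- t(sapply(split(diamonds$carat, diamonds$cut), fivenum))
colnames(summary_by_cut) <- c("min", "lower_hinge", "median", "upper_hinge", "max")

print(summary_by_cut)
```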
When the box plot of a continuous variable involves several discrete variables, we often use faceting to make the chart easier to read.
library(ggplot2)
ggplot(diamonds, aes(x = color, y = carat)) +
#Switch theme
theme_linedraw() +
# Fill the boxes according to the factor variable color
geom_boxplot(aes(fill = color)) +
# Faceting: in essence, the data frame is split into subsets by the factor variable cut, and the same box plot is drawn for each subset
# Note that scales = "free" is usually worth adding; otherwise, when the subsets differ a lot in range, the shared axes squash the panels with small values
facet_wrap(~cut, scales="free")
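The effect of the scales argument can be inspected by comparing the y-axis range of each panel. The sketch below reads those ranges from the built plot; panel_y_ranges is a hypothetical helper written for this illustration, and it relies on the panel_params structure returned by ggplot_build(), which is an internal detail that may differ between ggplot2 versions.

```r
library(ggplot2)

p <- ggplot(diamonds, aes(x = color, y = carat)) + geom_boxplot()

# Hypothetical helper: y-axis range of every panel in a built plot, one row per panel.
panel_y_ranges <- function(plot) {
  params <- ggplot_build(plot)$layout$panel_params
  t(sapply(params, function(pp) pp$y.range))
}

# Default (fixed) scales: all panels share one y range.
fixed_ranges <- panel_y_ranges(p + facet_wrap(~ cut))

# Free y scales: each panel gets a range that fits its own subset.
free_ranges <- panel_y_ranges(p + facet_wrap(~ cut, scales = "free_y"))

print(fixed_ranges)
print(free_ranges)
```

Free scales make each panel easier to read on its own, at the cost of making it harder to compare values across panels.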
4. Bar charts
library(ggplot2)
# Basic bar chart
p1 <- ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, fill = cut)) +
theme_linedraw() +
scale_fill_brewer()
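# Stacked bar chart (position = "stack" is the default; position = "identity" would draw the bars overlapping instead of stacked)
p2 <- ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, fill = clarity), position = "stack") +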
theme_linedraw() +
scale_fill_brewer()
# Proportional (100% stacked) bar chart
p3 <- ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, fill = clarity), position = "fill") +
theme_linedraw() +
scale_fill_brewer()
# Grouped (side-by-side) bar chart
p4 <- ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, fill = clarity), position = "dodge") +
theme_linedraw() +
scale_fill_brewer()
#Construct ggplot picture list
plots <- list(p1, p2, p3, p4)
#Custom picture layout
gridExtra::grid.arrange(grobs = plots, ncol = 2)
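The numbers behind these bar charts can be reproduced with base R tables, which may help when reading the plots. In this sketch, counts corresponds to the bar heights of the stacked and grouped charts, and props to the segments of the proportional chart, where every bar sums to 1. The names counts and props are made up for this illustration.

```r
library(ggplot2)  # loaded only for the diamonds dataset

# Number of diamonds for each combination of cut (rows) and clarity (columns)
counts <- table(diamonds$cut, diamonds$clarity)

# Share of each clarity within a cut: this is what position = "fill" displays
props <- prop.table(counts, margin = 1)

print(counts)
print(round(props, 3))
```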
5. Coordinate systems
Besides coord_flip(), used for the box plots above to swap the axes, ggplot2 provides many other functions related to coordinate systems.
library(ggplot2)
bar <- ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, fill = cut), show.legend = FALSE, width = 1) +
#Specified ratio: the ratio of length to width is 1, which is convenient to display the figure
theme(aspect.ratio = 1) +
scale_fill_brewer() +
labs(x = NULL, y = NULL)
#Axis rotation
bar1 <- bar + coord_flip()
#Draw polar coordinates
bar2 <- bar + coord_polar()
#Construct ggplot picture list
plots <- list(bar1, bar2)
#Custom picture layout
gridExtra::grid.arrange(grobs = plots, ncol = 2)
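A common use of coord_polar() is turning a single stacked bar into a pie chart by mapping the bar height to the angle with theta = "y". The sketch below is one way to do this; the constant factor(1) on the x axis and the use of theme_void() are choices made for this illustration.

```r
library(ggplot2)

pie <- ggplot(diamonds, aes(x = factor(1), fill = cut)) +
  # One stacked bar holding all cuts; width = 1 removes the hole in the middle
  geom_bar(width = 1) +
  # Map the stacked counts to the angle instead of the radius
  coord_polar(theta = "y") +
  scale_fill_brewer() +
  labs(x = NULL, y = NULL) +
  # Drop axes and grid lines, which carry little meaning on a pie chart
  theme_void()

print(pie)
```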
6. Tile plots and heat maps
In exploratory analysis for machine learning, corrplot can directly draw the correlation coefficients of all variables, which helps judge the overall correlation structure.
library(corrplot)
#Calculate correlation coefficient matrix of dataset and visualize it
mycor = cor(mtcars)
corrplot(mycor, tl.col = "black")
ggplot2 offers more customizable tile plots:
library(RColorBrewer)
#Generate correlation coefficient matrix
corr <- round(cor(mtcars), 2)
df <- reshape2::melt(corr)
p1 <- ggplot(df, aes(x = Var1, y = Var2, fill = value, label = value)) +
geom_tile() +
theme_bw() +
# size is a fixed setting, so it belongs outside aes(); the label is inherited from ggplot()
geom_text(size = 3, color = "white") +
labs(title = "mtcars - Correlation plot") +
theme(text = element_text(size = 10), legend.position = "none", aspect.ratio = 1)
p2 <- p1 + scale_fill_distiller(palette = "Reds")
p3 <- p1 + scale_fill_gradient2()
gridExtra::grid.arrange(p1, p2, p3, ncol=3)
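The long-format data frame that geom_tile() needs can also be built without reshape2. As a sketch, base R's as.table() followed by as.data.frame() produces the same three columns (Var1, Var2, value), assuming the correlation matrix has row and column names as it does here.

```r
# Correlation matrix of mtcars, rounded to two decimals
corr <- round(cor(mtcars), 2)

# Long format: one row per pair of variables, with the correlation in "value"
df <- as.data.frame(as.table(corr), responseName = "value")

head(df)
```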
More examples
Here are 50 classic ggplot2 plotting examples:
http://r-statistics.co/Top50-…