R for Data Science, Chapter 1 (ggplot2)

Time:2022-2-1

Part I: Exploration

Chapter 1: Data visualization with ggplot2

1.1 INTRODUCTION

First install and load the R package (ggplot2)

if(!require('ggplot2'))install.packages('ggplot2')
library('ggplot2')

Let’s use ggplot2 to explore a question: do large engine cars consume more fuel than small engine cars?

1.2 first step

1.2.1 mpg data frame

The mpg data frame is a data set built into ggplot2 for learning and practice. To see which variables it contains and how it is structured, run ?mpg.

mpg
#> # A tibble: 234 x 11
#>   manufacturer model displ  year   cyl trans      drv     cty   hwy fl    class 
#>   <chr>        <chr> <dbl> <int> <int> <chr>      <chr> <int> <int> <chr> <chr> 
#> 1 audi         a4      1.8  1999     4 auto(l5)   f        18    29 p     compa…
#> 2 audi         a4      1.8  1999     4 manual(m5) f        21    29 p     compa…
#> 3 audi         a4      2    2008     4 manual(m6) f        20    31 p     compa…
#> 4 audi         a4      2    2008     4 auto(av)   f        21    30 p     compa…
#> 5 audi         a4      2.8  1999     6 auto(l5)   f        16    26 p     compa…
#> 6 audi         a4      2.8  1999     6 manual(m5) f        18    26 p     compa…
#> # … with 228 more rows

The variables in the mpg data set are described below:

name          meaning
manufacturer  Manufacturer name (e.g. Audi)
model         Model name (e.g. A6)
displ         Engine displacement, in litres
year          Year of manufacture
cyl           Number of cylinders
trans         Type of transmission
drv           Type of drive train: f = front-wheel drive, r = rear-wheel drive, 4 = 4WD
cty           City miles per gallon
hwy           Highway miles per gallon
fl            Fuel type
class         Type of car (SUV, etc.)

1.2.2 create a ggplot graph

Put displ on the x-axis and hwy on the y-axis:

ggplot(data = mpg)+
    geom_point(mapping = aes(x = displ, y = hwy))

The plot shows that the larger the engine, the fewer miles the car travels on the same volume of fuel. In other words, cars with larger engines consume more fuel (displ is negatively correlated with hwy).

To draw a plot with ggplot2:

1. Start with the ggplot() function, which creates a coordinate system that you can add layers to. The first argument of ggplot() is data, the data set to use in the plot.

2. Add one or more layers to ggplot() to complete the plot. The function geom_point() adds a layer of points, which creates a scatter plot. Functions like this are called geom (geometric object) functions.

3. Each geom function has a mapping argument, which defines how variables in the data set are mapped to graphic attributes. The mapping argument is always paired with aes(), and the x and y arguments of aes() specify the variables mapped to the x and y axes.

The ggplot2 drawing concept is: plot = data + geometry + aesthetics
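
The three steps above can be followed one at a time. This is only a small sketch: the object names base and scatter are made up for illustration, and cty is used on the y-axis so that it is not a repeat of the plot above.

```r
library(ggplot2)

# Step 1: ggplot() alone creates an empty coordinate system with no layers.
base <- ggplot(data = mpg)
length(base$layers)

# Steps 2 and 3: geom_point() adds one layer of points, and aes() maps
# displ to the x-axis and cty to the y-axis.
scatter <- base +
    geom_point(mapping = aes(x = displ, y = cty))
length(scatter$layers)

scatter
```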

Drawing template:

#Replace the contents in < > with data set, geometric object function and mapping set respectively
ggplot(data = <DATA>)+
  <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>))

1.3 graphic attribute mapping

This section first requires an understanding of variable types.

1. aes() gathers all the graphic attribute (aesthetic) mappings used in a layer and passes them to the mapping argument of that layer.

2. To set a graphic attribute manually, set it by name as an argument of the geom function, that is, outside aes().

3. The shape of a point is given as a number. The border color of the hollow shapes (0-14) is determined by color; the solid shapes (15-20) are filled with color; the filled shapes (21-24) have a border color determined by color and a fill color determined by fill.

<img src="https://gitee.com/yahangliang/typoraimage/raw/master/20210510193603.png" style="zoom:67%;" />

practice:

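1. What's wrong with the following code? Why aren't the points blue?

ggplot(data = mpg) +
    geom_point(
        mapping = aes(x = displ, y = hwy, color = "blue")
    )

Replace "blue" with the variable drv from mpg and compare the two plots:

library(patchwork) # needed to place plots side by side with +
p1 <- ggplot(data = mpg) +
    geom_point(
        mapping = aes(x = displ, y = hwy, color = "blue")
    )
p2 <- ggplot(data = mpg) +
    geom_point(
        mapping = aes(x = displ, y = hwy, color = drv))
p1+p2

The error in the first code is the one described in point 2 at the beginning of this section. Inside aes(), "blue" is treated as a variable with the single value "blue", so ggplot2 gives it a color from the default color scale and adds a legend. To make the points blue, set color = "blue" outside aes(), as an argument of geom_point().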
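
A corrected version of the first plot might look like the sketch below. The object name blue_points is only for illustration.

```r
library(ggplot2)

# color is set outside aes(), so it is a fixed value for the whole layer
# rather than a variable mapped through a color scale.
blue_points <- ggplot(data = mpg) +
    geom_point(mapping = aes(x = displ, y = hwy), color = "blue")

blue_points
```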

2. Which variables in mpg are classified (categorical) variables? Which are continuous? (Tip: enter ?mpg to read the documentation of this data set.) How can you see this information when you run mpg?

Classified variables: manufacturer, model, trans, drv, fl and class.

Continuous variables: displ, year, cyl, cty and hwy.

When you run mpg, the type of each column is printed under its name: <chr> marks a character column, while <dbl> and <int> mark numeric columns.

There is a shortcut: map the variable to shape. A continuous variable raises an error, a classified variable does not.

Alternatively, use the command View(mpg) to browse the mpg data set.
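
The same split can be read off the column types programmatically. This is a rough sketch: it simply treats character columns as classified and numeric columns as continuous, which matches the answer above, and the names col_types, categorical and numeric_cols are made up here.

```r
library(ggplot2)

# First class of every column in mpg.
col_types <- vapply(mpg, function(col) class(col)[1], character(1))
print(col_types)

# Character columns are the classified variables;
# numeric and integer columns are stored as numbers.
categorical <- names(col_types)[col_types == "character"]
numeric_cols <- names(col_types)[col_types %in% c("numeric", "integer")]

categorical
numeric_cols
```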

3. Map a continuous variable to color, size, and shape. How do these graphic attributes behave differently for classified and continuous variables?

Here we use year as the continuous variable and model as the classified variable, and map each of them to color, size and shape in turn.

(1) color

p1 <- ggplot(data = mpg) +
    geom_point(
        mapping = aes(x = displ, y = hwy, color = year)
    )
p2 <- ggplot(data = mpg) +
    geom_point(
        mapping = aes(x = displ, y = hwy, color = model))
p1+p2

(2) size

p1 <- ggplot(data = mpg) +
    geom_point(
        mapping = aes(x = displ, y = hwy, size = year)
    )
p2 <- ggplot(data = mpg) +
    geom_point(
        mapping = aes(x = displ, y = hwy, size = model))
p1+p2
## Warning message:
## Using size for a discrete variable is not advised.

This warning says that mapping the discrete variable model to size is not recommended. In ggplot2, "discrete" simply means categorical, as opposed to continuous, so it also covers unordered classified variables such as model. Size implies an order and model has none, which is why the mapping is discouraged.

(3) shape

p1 <- ggplot(data = mpg) +
    geom_point(
        mapping = aes(x = displ, y = hwy, shape = year)
    )
# Printing p1 fails with an error: a continuous variable can not be mapped to shape
p2 <- ggplot(data = mpg) +
    geom_point(
        mapping = aes(x = displ, y = hwy, shape = model))
p2

Mapping a continuous variable to shape raises an error. A classified variable can be mapped to shape, but by default ggplot2 uses at most 6 shapes at a time, so only 6 models are drawn in the plot and the remaining groups are left out.
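
A quick way to see why model runs out of shapes is to count its distinct values and compare with a variable that has only a few. This is a small sketch; the names n_models, n_drv and shape_by_drv are illustrative.

```r
library(ggplot2)

# model has far more than 6 distinct values, so most of them get no shape.
n_models <- length(unique(mpg$model))
n_models

# drv has only a few distinct values, so every group gets its own shape.
n_drv <- length(unique(mpg$drv))
n_drv

shape_by_drv <- ggplot(data = mpg) +
    geom_point(mapping = aes(x = displ, y = hwy, shape = drv))
shape_by_drv
```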

4. What happens if you map the same variable to multiple graphic attributes?

ggplot(data = mpg) +
    geom_point(
        mapping = aes(x = displ, y = hwy, size = hwy, color = hwy))

The information is redundant: both the size and the shade of color of each point reflect the hwy value of the car.

5. What does the graphic attribute stroke do? What shapes does it work with? (Tip: see ?geom_point.)

p1 <- ggplot(data = mpg) +
    geom_point(
        mapping = aes(x = displ, y = hwy), 
        shape = 0 # 0 is a hollow square
    )
p2 <- ggplot(data = mpg) +
    geom_point(
        mapping = aes(x = displ, y = hwy), 
        shape = 0, stroke = 3 # the same hollow square with a wider border
    )

p1 + p2

The stroke attribute sets the width of the border of a point. It is most useful for shapes drawn with a border, such as the hollow shapes (0-14) and the filled shapes (21-24).
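
For a filled shape, stroke can be combined with color and fill. The values below are arbitrary choices for illustration, and stroke_demo is a made-up name.

```r
library(ggplot2)

stroke_demo <- ggplot(data = mpg) +
    geom_point(
        mapping = aes(x = displ, y = hwy),
        shape = 21,       # filled circle with a separate border
        fill = "white",   # inside of the circle
        color = "black",  # border color
        size = 3,
        stroke = 2        # border width
    )

stroke_demo
```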

6. What happens if you map a graphic attribute to something other than a variable name, such as aes(color = displ < 5)?

ggplot(data = mpg) +
    geom_point(
        mapping = aes(x = displ, y = hwy, color = displ < 5))

The expression is evaluated for every row and the resulting logical value is mapped to color, so there are two colors: one for displ < 5 (TRUE) and one for displ >= 5 (FALSE).
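
The logical variable that ggplot2 builds here can be inspected directly. A minimal sketch, with is_small and counts as illustrative names:

```r
library(ggplot2)

# The expression inside aes() is evaluated once per row of mpg.
is_small <- mpg$displ < 5

# Two groups, TRUE and FALSE, which become the two colors in the plot.
counts <- table(is_small)
counts
```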

1.4 frequently asked questions

Search Google / Bing / developeppaper / wechat / YuQue for answers. Most problems you run into have already been met by many other people, and experts have often posted replies.

1.5 facets

If the extra variable in the data set is continuous, we can add it to the plot as a graphic attribute, that is, write it inside aes(). If the variable is classified, we can also use facets: subplots that each display one subset of the data.

The first argument of facet_wrap() is a formula, created with ~ followed by a variable name; alternatively, pass vars(variable, variable, ...).

The first argument of facet_grid() is also a formula: two variable names separated by ~, giving the row variable and the column variable.
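
The two ways of writing each facet specification can be put side by side. This is a sketch using mpg; the object names are only for illustration.

```r
library(ggplot2)

base <- ggplot(data = mpg) +
    geom_point(mapping = aes(x = displ, y = hwy))

# facet_wrap(): one variable, written as a formula or with vars()
wrap_formula <- base + facet_wrap(~ class)
wrap_vars <- base + facet_wrap(vars(class))

# facet_grid(): rows ~ columns, or the rows and cols arguments with vars()
grid_formula <- base + facet_grid(drv ~ cyl)
grid_vars <- base + facet_grid(rows = vars(drv), cols = vars(cyl))

wrap_formula
grid_formula
```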

practice:

1. What happens if you facet on a continuous variable?

ggplot(data = mpg) +
    geom_point(mapping = aes(x = displ, y = hwy))+
    facet_wrap(~cty)

One panel is drawn for every distinct value of the variable. Splitting the data by a continuous variable in this way is not meaningful: there are too many panels to see any pattern in the data.

2. What do the blank cells mean in the plot generated with facet_grid(drv ~ cyl)? How do they relate to the plot generated by the following code?

ggplot(data = mpg) +
    geom_point(mapping = aes(x = drv, y = cyl))

ggplot(data = mpg) +
    geom_point(mapping = aes(x = displ, y = hwy))+
    facet_grid(drv ~ cyl)

A blank cell means there are no observations with that combination of drv and cyl. The two plots correspond one to one: each point in the first plot corresponds to a non-empty panel in the second.
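
The empty combinations can also be found without plotting, by cross-tabulating the two variables. A minimal sketch, with combos and n_empty as illustrative names:

```r
library(ggplot2)

# Number of cars for every combination of drv (rows) and cyl (columns).
combos <- table(drv = mpg$drv, cyl = mpg$cyl)
combos

# Cells with a count of 0 are the blank panels of facet_grid(drv ~ cyl).
n_empty <- sum(combos == 0)
n_empty
```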

3. What plots does the following code make? What does . do?

p1 <- ggplot(data = mpg) +
    geom_point(mapping = aes(x = displ, y = hwy)) +
    facet_grid(drv ~ .)
p2 <- ggplot(data = mpg) +
    geom_point(mapping = aes(x = displ, y = hwy)) +
    facet_grid(. ~ cyl)

p1 + p2 

The . means "do not facet on this dimension". drv ~ . facets the rows by drv and leaves the columns unfaceted; . ~ cyl leaves the rows unfaceted and facets the columns by cyl.

4. Look at the first faceted plot of this section:

ggplot(mpg,aes(x = displ, y = hwy))+
    geom_point()+
    facet_wrap(~class, nrow = 2)

What are the advantages and disadvantages of using facets instead of graphic attributes? How would the balance change with a larger data set?

Advantages: faceting on a classified variable makes it easier to see the relationship between x and y within each category.

Disadvantages: facets are of no use for continuous variables; the extra panels add effort without adding meaning.

5. Read ?facet_wrap. What do nrow and ncol do? What other options control the layout of the panels? Why doesn't facet_grid() have nrow and ncol arguments?

nrow sets the number of rows of panels and ncol sets the number of columns. The other layout options are listed in the help page.

YQ Qian's explanation is very good:

About facet_grid() and facet_wrap(): the grid facet produces a 2D grid of panels whose rows and columns are defined by variables. The wrapped facet produces a 1D strip of panels and then wraps it into 2D (ggplot2 book, p. 141). So facet_grid() is two-dimensional from the start: its numbers of rows and columns are fixed by the variables, which is why it has no nrow and ncol arguments.

p1 <- ggplot(data = mpg) +
    geom_point(mapping = aes(x = displ, y = hwy)) +
    facet_grid(drv ~ cyl)
p2 <- ggplot(data = mpg) +
    geom_point(mapping = aes(x = displ, y = hwy)) +
    facet_wrap(drv ~ cyl)
p1+p2

On the left is facet_grid(drv ~ cyl); on the right is facet_wrap(drv ~ cyl). The difference is clear when you compare them.

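6. When using facet_grid(), you should usually put the variable with more unique values in the columns. Why?

Panels in the same row share one y-axis, which makes the data easier to compare.

1.6 geometric objects

A local mapping takes precedence over a global mapping: a mapping set inside a geom function extends or overrides the global mapping set in ggplot(), for that layer only.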
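
As a small illustration of this rule, the sketch below sets x and y globally and adds color only to the point layer. The object name layered is made up.

```r
library(ggplot2)

layered <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
    geom_point(mapping = aes(color = drv)) +  # local: color applies to this layer only
    geom_smooth(se = FALSE)                   # uses only the global x and y: one curve

layered
```

Because color = drv is local to geom_point(), the smooth layer is not split by drv.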

practice:

1. Which geometric objects would you use to draw a line chart, a boxplot, a histogram and an area chart?

Line chart: geom_line()

Boxplot: geom_boxplot()

Histogram: geom_histogram()

Area chart: geom_area()

2. Run the following code in your head and predict what the output will look like. Then run the code in R and check your prediction.

ggplot(
  data = mpg,
  mapping = aes(x = displ, y = hwy, color = drv)
) +
  geom_point() +           # geometric object: scatter plot
  geom_smooth(se = FALSE)  # geometric object: smoothed fitted curve, without confidence interval

3. What does show.legend = FALSE do? What happens if you remove it? Why do you think it was used in the example earlier in this chapter?

It hides the legend of that layer. If you remove it, the legend is drawn, by default on the right of the plot. In the earlier example the legend would have added redundant clutter to the plot.
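
The effect is easy to try on a single layer. This is a sketch with illustrative object names:

```r
library(ggplot2)

# Default: a legend for color is drawn on the right.
with_legend <- ggplot(data = mpg) +
    geom_smooth(mapping = aes(x = displ, y = hwy, color = drv))

# show.legend = FALSE: the same layer without its legend.
without_legend <- ggplot(data = mpg) +
    geom_smooth(
        mapping = aes(x = displ, y = hwy, color = drv),
        show.legend = FALSE
    )
```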

4. What does the se argument of geom_smooth() do?

It controls whether a confidence interval is drawn around the curve; se = FALSE leaves it out.

5. Is there any difference between the two plots generated by the following code? Why?

p1 <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
    geom_point() +
    geom_smooth()

p2 <- ggplot() +
    geom_point(
        data = mpg,
        mapping = aes(x = displ, y = hwy)
    ) +
    geom_smooth(
        data = mpg,
        mapping = aes(x = displ, y = hwy)
    )
p1 + p2

No difference. Both layers use the same data and the same mapping, so these can be moved into ggplot() as the global data and mapping.

6. Write your own R code to generate the following plots.

p1 <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy))+
    geom_point(size = 5)+
    geom_smooth(se = FALSE,size = 2)

p2 <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy))+
    geom_point(size = 5)+
    geom_smooth(mapping = aes(group = drv),se = FALSE,size = 2)

p3 <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv))+
    geom_point(size = 5)+
    geom_smooth(se = FALSE,size = 2)

p4 <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy))+
    geom_point(mapping = aes(color = drv),size = 5)+
    geom_smooth(se = FALSE,size = 2)

p5 <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy))+
    geom_point(mapping = aes(color = drv),size = 5)+
    geom_smooth(mapping = aes(linetype = drv),se = FALSE,size = 2)

#Stroke parameter means stroke width. Shape is the type of point. Here color follows stroke, so it represents stroke color
p6 <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy))+
    geom_point(shape = 21, stroke = 3, color = 'white')+
    geom_point(mapping = aes(color = drv),size = 3)

(p1+p2)/(p3+p4)/(p5+p6)

1.7 statistical transformation

The diamonds data set:

name     meaning
price    Price in US dollars ($326 – $18,823)
carat    Weight of the diamond in carats (0.2 – 5.01)
cut      Quality of the cut (Fair, Good, Very Good, Premium, Ideal)
color    Diamond colour, from D (best) to J (worst)
clarity  A measurement of how clear the diamond is (I1 (worst), SI2, SI1, VS2, VS1, VVS2, VVS1, IF (best))
x        Length in mm (0 – 10.74)
y        Width in mm (0 – 58.9)
z        Depth in mm (0 – 31.8)
depth    Total depth percentage = z / mean(x, y) = 2 * z / (x + y) (43 – 79)
table    Width of top of diamond relative to widest point (43 – 95)

practice:

1. What is the default geometric object of stat_summary()? How can you redraw the plot above with a geom function instead of the stat function?

# Draw with the stat function stat_summary(); its default geometric object is geom_pointrange
ggplot(diamonds)+
    stat_summary(aes(x = cut, y = depth),
                 fun.max = max,
                 fun.min = min,
                 fun = median)

# Now draw the same plot with geom_pointrange()
ggplot(diamonds)+
    geom_pointrange(aes(x = cut, y = depth),
                    stat = "summary", # switch the stat to summary; the default, identity, leaves the data as is
                    fun.max = max,    # top of the range: maximum
                    fun.min = min,    # bottom of the range: minimum
                    fun = median)     # point: median

2. What does geom_col() do? How is it different from geom_bar()?

# From ?geom_bar:
# If you want the heights of the bars to represent values in the data, use geom_col() instead.
# geom_bar() uses stat_count() by default: it counts the number of cases at each x position.
# geom_col() uses stat_identity(): it leaves the data as is.
#Left picture
p1 = ggplot(diamonds)+
    geom_bar(aes(x = cut))
#Right picture
p2 = ggplot(diamonds)+
    geom_col(aes(x = cut, y = depth))

library(patchwork)
p1|p2

What they have in common: both functions draw bar charts. The difference: the bar height of geom_col() represents the value of a variable in the data set (its default stat is stat_identity()), while the bar height of geom_bar() represents the number of cases at each position on the x-axis (its default stat is stat_count()).
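
One way to see the relationship is to count the rows first and hand the counts to geom_col(). This is a sketch; cut_counts, bar_plot and col_plot are made-up names, and Freq is the column name that as.data.frame() gives to the counts of a table.

```r
library(ggplot2)

# Count the diamonds in each cut by hand: columns cut and Freq.
cut_counts <- as.data.frame(table(cut = diamonds$cut))

# geom_bar() counts the rows itself (stat_count).
bar_plot <- ggplot(diamonds) +
    geom_bar(aes(x = cut))

# geom_col() uses the precomputed counts as they are (stat_identity).
col_plot <- ggplot(cut_counts) +
    geom_col(aes(x = cut, y = Freq))

# Bar heights as computed by ggplot2 for each plot.
bar_heights <- layer_data(bar_plot)$count
col_heights <- layer_data(col_plot)$y
all.equal(as.numeric(bar_heights), as.numeric(col_heights))
```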

3. Most geometric objects and statistical transformations come in pairs and are almost always used together. Read the documentation and list all the pairs. What do they have in common?

Too many to list
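
Although the full list is long, the default stat of any geom can be looked up from the layer it creates. The sketch below checks a handful of geoms; default_stat is a hypothetical helper written for this example, not a ggplot2 function.

```r
library(ggplot2)

# Hypothetical helper: name of the stat object attached to a layer.
default_stat <- function(layer) class(layer$stat)[1]

stat_pairs <- c(
    geom_point     = default_stat(geom_point()),
    geom_bar       = default_stat(geom_bar()),
    geom_histogram = default_stat(geom_histogram()),
    geom_boxplot   = default_stat(geom_boxplot()),
    geom_smooth    = default_stat(geom_smooth())
)
stat_pairs
```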

4. What variables does stat_smooth() compute? What parameters control its behavior?

The help page of stat_smooth() (?stat_smooth) has a section called "Computed variables".

So the variables computed by this function are:

  • y: predicted value
  • ymin: lower pointwise confidence interval around the mean
  • ymax: upper pointwise confidence interval around the mean
  • se: standard error

Parameters that control its behavior:

  • position: position adjustment
  • method: the smoothing method or function, such as "lm", "glm", "gam", "loess" or a user-defined function
  • formula: the formula used by the smoothing function, such as y ~ x, y ~ poly(x, 2) or y ~ log(x). It is NULL by default, in which case method = NULL implies formula = y ~ x when there are fewer than 1,000 observations and formula = y ~ s(x, bs = "cs") otherwise.
  • se: whether to display the confidence interval around the curve
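
The computed variables can be pulled out of a built plot with layer_data(). The method and formula below are arbitrary choices for this sketch.

```r
library(ggplot2)

smooth_plot <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
    stat_smooth(method = "lm", formula = y ~ poly(x, 2), se = TRUE)

# One row per point along the fitted curve, with the computed variables.
computed <- layer_data(smooth_plot)
head(computed[, c("x", "y", "ymin", "ymax", "se")])
```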

5. In the proportion bar chart we need to set group = 1. Why? In other words, what is wrong with the following two plots?

p1 <- ggplot(data = diamonds) + 
    geom_bar(mapping = aes(x = cut, y = after_stat(prop)))

p2 <- ggplot(data = diamonds) + 
    geom_bar(mapping = aes(x = cut, fill = color, y = after_stat(prop)))
library(patchwork)
p1|p2

Without group = 1, every value on the x-axis is treated as its own group, so the proportion within each group is 1 and all bars have the same height.

p1 <- ggplot(data = diamonds) + 
    geom_bar(mapping = aes(x = cut, y = after_stat(prop),group = 1))

p2 <- ggplot(data = diamonds) + 
    geom_bar(mapping = aes(x = cut, fill = color, y = after_stat(prop),group = color))
p1|p2

In p1, group = 1 puts all values of the x-axis into one group, and the proportion of each value (Fair, Good, etc.) is then computed within that group.

Why does p2 use group = color instead of group = 1? Because of the extra mapping fill = color, each bar is split by color, which gives cut × color = 5 × 7 = 35 segments. With group = color, the proportion is computed within each color: the height of a segment is the number of diamonds with that cut and color divided by the total number of diamonds of that color.
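
The proportions that ggplot2 computes in each case can be checked with layer_data(). A sketch with illustrative object names:

```r
library(ggplot2)

no_group <- ggplot(data = diamonds) +
    geom_bar(mapping = aes(x = cut, y = after_stat(prop)))
one_group <- ggplot(data = diamonds) +
    geom_bar(mapping = aes(x = cut, y = after_stat(prop), group = 1))
by_color <- ggplot(data = diamonds) +
    geom_bar(mapping = aes(x = cut, fill = color, y = after_stat(prop), group = color))

# Without group = 1, every cut is its own group, so each proportion is 1.
prop_no_group <- layer_data(no_group)$prop

# With group = 1, the proportions of the five cuts add up to 1.
prop_one_group <- layer_data(one_group)$prop

# With group = color, the proportions add up to 1 within each color.
color_data <- layer_data(by_color)
prop_by_color <- tapply(color_data$prop, color_data$group, sum)

prop_no_group
prop_one_group
prop_by_color
```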

1.8 position adjustment

All layers have positional adjustments to resolve overlapping geometry. Override the default value by using the position parameter of the geom or stat function.

  • position = "identity": draws each object exactly where its value falls, so the bars overlap; not suitable for presenting results
  • position = "fill": stacked bar chart in which every stack has the same height, which makes it easy to compare proportions between groups (a percentage stacked bar chart)
  • position = "dodge": side-by-side bar chart
  • position = "stack": stacked bar chart (the actual values are stacked)
  • position = "jitter": adds random jitter to the data (generally used in scatter plots to avoid overlapping points)

if(!require("ggplot2"))install.packages("ggplot2")
if(!require("patchwork"))install.packages("patchwork")
library("ggplot2")
library("patchwork")
p0 <- ggplot(diamonds,aes(x= cut, fill = clarity))+
    geom_bar()+
    labs(title = "position_stack")

p1 <- ggplot(diamonds,aes(x= cut, fill = clarity))+
    geom_bar(alpha = 0.2, position = "identity")+
    labs(title = "position_identity")

#P1 and P2 have the same effect of modifying position parameters, but the writing method is different.
p2 <- ggplot(diamonds,aes(x= cut, color = clarity))+
    geom_bar(fill = NA, position = position_identity())+
    labs(title = "position_identity")

p3 <- ggplot(diamonds, aes(x = cut, fill = clarity))+
    geom_bar(position = "fill")+
    labs(title = "position_fill")

p4 <- ggplot(diamonds, aes(x=cut, fill = clarity))+
    geom_bar(position = position_dodge())+
    labs(title = "position_dodge")
    
pt <- (p0|p1|p2)/(p3|p4)
ggsave("Rplot.png",pt)

  1. p0: the default position is stack, that is, the counts are stacked directly.
  2. p1: the position is changed to identity, which draws every bar from the bottom of the x-axis, so the bars overlap and are hard to analyse. The alpha parameter is therefore used to make the bars transparent.
  3. p2: the position is also identity, but this time fill is set to NA (no fill, so the background shows through) and the color mapping draws the outlines. This also shows the number of diamonds of each clarity in each group.
  4. p3: the position is changed to fill, which stacks the bars as percentages and makes it easy to compare the proportion of each clarity between groups.
  5. p4: the position is changed to dodge, which places the bars of each group side by side so that individual values are easy to compare.

Recall the scatter plot of displ and hwy from mpg drawn earlier. The plot shows only 126 points, although there are 234 observations in the data set: many points have identical values and are drawn on top of each other.

Change the position adjustment to "jitter", which adds a small amount of random noise to each point and spreads the overlapping points out. Because this operation is so useful for revealing where the data really cluster, ggplot2 provides a shorthand for it: the geom_jitter() function.

ggplot(mpg,aes(displ, hwy))+
    geom_point(position = position_jitter())
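
The overlap described above can be checked by counting the distinct (displ, hwy) pairs and comparing that with the number of rows. A minimal sketch; n_obs and n_distinct_points are illustrative names.

```r
library(ggplot2)

# Number of observations in mpg.
n_obs <- nrow(mpg)

# Number of distinct (displ, hwy) positions, i.e. visible points without jitter.
n_distinct_points <- nrow(unique(mpg[, c("displ", "hwy")]))

n_obs
n_distinct_points
```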

practice:

1. What is the problem with the plot below? How can it be improved?

ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) + 
  geom_point()

Many points overlap. Use the jitter position adjustment:

ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) + 
  geom_jitter()

2. What parameters of geom_jitter() control the amount of jitter?

width and height.

See ?geom_jitter. The help page says: "Amount of vertical and horizontal jitter. The jitter is added in both positive and negative directions, so the total spread is twice the value specified here." The default is 40% of the resolution of the data, so by default the total spread is 0.8 of the resolution. Compare the two plots below:

p1 <- ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) + 
    geom_jitter()

p2 <- ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) + 
    geom_jitter(width = 0.2,
                height = 0.2)
p1+p2
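
The arithmetic behind the default can be written out as follows. This is a sketch: resolution() is the ggplot2 function that returns the resolution of a variable, the variable names are made up, and the seed argument is added here only so that the jitter is reproducible.

```r
library(ggplot2)

# cty is an integer variable, so its resolution is 1.
res_cty <- resolution(mpg$cty)

# Default jitter: 40% of the resolution in each direction.
default_amount <- 0.4 * res_cty
total_spread <- 2 * default_amount
total_spread

# The same kind of plot with explicit amounts and a fixed seed.
jittered <- ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
    geom_point(position = position_jitter(width = 0.2, height = 0.2, seed = 1))
jittered
```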

3. Compare geom_jitter() with geom_count().

p1 <- ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) + 
    geom_jitter()

p2 <- ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) + 
    geom_count()
p1+p2

Compare the two plots: geom_jitter() adds random jitter to the points, while geom_count() counts the overlapping points at each position and maps the count to the size of the point.

4. What is the default position adjustment of geom_boxplot()? Create a visualization of the mpg data set that demonstrates it.

According to ?geom_boxplot, the default position adjustment is dodge2.

ggplot(mpg,aes(x = drv, y = cty))+
    geom_boxplot()
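
The dodging only becomes visible when there is more than one box at the same x position. The sketch below adds fill = factor(year) for that purpose, which is a choice made for this example and not part of the original answer.

```r
library(ggplot2)

# Default position (dodge2): the boxes for each year sit side by side.
dodged <- ggplot(data = mpg, mapping = aes(x = drv, y = cty, fill = factor(year))) +
    geom_boxplot()

# position = "identity": the boxes for each year are drawn on top of each other.
overlapping <- ggplot(data = mpg, mapping = aes(x = drv, y = cty, fill = factor(year))) +
    geom_boxplot(position = "identity", alpha = 0.5)

dodged
overlapping
```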

1.9 coordinate system

The default coordinate system is the Cartesian coordinate system.

1. coord_flip() swaps the x and y axes. This is useful when you want horizontal boxplots, or when the group names on the x-axis are too long to fit side by side.

p1 <- ggplot(mpg,aes(x= class, y = hwy))+
    geom_boxplot()
p2 <- ggplot(mpg,aes(x= class, y = hwy))+
    geom_boxplot()+
    coord_flip()
p1+p2

2. coord_polar() uses a polar coordinate system.

p0 <- ggplot(diamonds,aes(x = cut, fill = cut))+
    geom_bar(width = 1)+       # set the bar width to 1; the default is 90% of the resolution of the data
    theme(aspect.ratio = 1)+   # aspect.ratio is the aspect ratio of the plot panel; set it to 1
    labs(x = NULL, y = NULL)   # hide the x and y axis labels (the words cut and count)

p1 <- p0+coord_flip()  
p2 <- p0+coord_polar()
p1+p2

practice:

1. Use coord_polar() to turn a stacked bar chart into a pie chart.

p0 <- ggplot(diamonds,aes(x= cut, fill = clarity))+
    geom_bar(width = 1)+
    labs(x=NULL, y = NULL)+
    theme(aspect.ratio = 1)
p1 <- p0+coord_polar()
p2 <- p0+coord_polar(theta = "y")
p0+p1+p2

Compare the plots. The argument theta is described in the help page as "variable to map angle to (x or y)", that is, which variable is mapped to the angle. The default is x, so the middle plot p1 is divided into five equal angles (cut is on the x-axis and has five values). In the plot on the right, theta = "y" maps the count of each group to the angle.

For another pie chart, change the position of the bars in p0 to fill, so that the bars are stacked as proportions. A pie chart drawn this way has no gaps, because every group adds up to 1.

p0 <- ggplot(diamonds,aes(x= cut, fill = clarity))+
    geom_bar(width = 1, position = "fill")+
    labs(x=NULL, y = NULL)+
    theme(aspect.ratio = 1)
p1 <- p0+coord_polar()
p2 <- p0+coord_polar(theta = "y")
p0+p1+p2
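
A pie chart with a single ring, one slice per cut, can be drawn by stacking everything into one bar first. This variant is a sketch added for comparison; the constant x = factor(1) and the name pie are choices made for this example.

```r
library(ggplot2)

# One stacked bar (x is a constant), with the counts mapped to the angle.
pie <- ggplot(diamonds, aes(x = factor(1), fill = cut)) +
    geom_bar(width = 1) +
    coord_polar(theta = "y") +
    labs(x = NULL, y = NULL) +
    theme(aspect.ratio = 1)

pie
```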

2. What does labs() do? Read the documentation.

Run ?labs to read the help page, and run its example code.

Its arguments set the labels of a plot: the title, the subtitle, the caption in the lower right corner, a letter tag for the plot, the names of the x and y axes, and so on.
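
A short example of these arguments, with label texts invented for illustration:

```r
library(ggplot2)

labelled <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
    geom_point() +
    labs(
        title = "Fuel efficiency and engine size",
        subtitle = "Larger engines travel fewer miles per gallon",
        caption = "Data: ggplot2::mpg",  # shown in the lower right corner
        tag = "A",                       # letter tag for the plot
        x = "Engine displacement (litres)",
        y = "Highway miles per gallon"
    )

labelled
```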

3. What is the difference between coord_quickmap() and coord_map()?

Both are used for drawing maps. I do not expect to need them for now, so I have not read about them yet; I will come back to this later.

4. What does the plot below tell you about the relationship between city and highway fuel efficiency? Why is coord_fixed() important? What does geom_abline() do?

p1 <- ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
    geom_point() + 
    geom_abline() +
    coord_fixed()
p2 <- ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
    geom_point() + 
    geom_abline()
p1+p2

See ?coord_fixed. Its first argument is ratio, the aspect ratio expressed as y / x, that is, the length of one unit on the y-axis relative to one unit on the x-axis. The default is 1. When drawing your own plots, you can adjust this ratio to present the data better. The other arguments are explained in the help page.

See ?geom_abline and run its example code. It adds a reference line that the data can be compared against; called with no arguments, it draws the line with intercept 0 and slope 1. Note that it does not fit a trend line to the data. The difference between geom_abline() and geom_smooth() is shown below:

p1 <- ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
    geom_point() + 
    geom_abline()+
    coord_fixed()

p2 <- ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
    geom_point() + 
    geom_smooth(method = "lm", se = FALSE)+
    coord_fixed()
p1+p2

1.10 graphical hierarchical syntax

Ggplot2 basic framework for drawing graphics:

ggplot(data = <DATA>) + 
  <GEOM_FUNCTION>(
     mapping = aes(<MAPPINGS>),
     stat = <STAT>, 
     position = <POSITION>
  ) +
  <COORDINATE_FUNCTION> +
  <FACET_FUNCTION>

The template has seven parameters, the parts in < >: data set, geometric object, mappings, statistical transformation, position adjustment, coordinate system and faceting scheme. In most cases you don't need to supply all of them, because ggplot2 provides useful defaults.
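
As an illustration, the sketch below fills in all seven parts of the template with the diamonds data set. The particular choices (a filled bar chart, flipped and faceted by color) are arbitrary.

```r
library(ggplot2)

full_template <- ggplot(data = diamonds) +           # <DATA>
    geom_bar(                                        # <GEOM_FUNCTION>
        mapping = aes(x = cut, fill = clarity),      # <MAPPINGS>
        stat = "count",                              # <STAT>
        position = "fill"                            # <POSITION>
    ) +
    coord_flip() +                                   # <COORDINATE_FUNCTION>
    facet_wrap(~ color)                              # <FACET_FUNCTION>

full_template
```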

With this you understand the principles behind drawing with ggplot2. When you need to draw a plot in the future, check the help pages and search online; a lot of code written by experts is available on the Internet.

Partial reference:

https://www.yuque.com/erhuoqian/mudww7/ugt8l6#U7UV9

https://ggplot2.tidyverse.org/reference/

This is the English edition of the R for Data Science e-book, for those who are interested: https://r4ds.had.co.nz/
