Datawhale pandas punch in – Chapter 4 grouping

Time:2022-1-3

Today I'm studying Chapter 4, grouping, which I think is a very important and useful part of pandas. The chapter mainly introduces agg, transform, apply and other methods that build on the groupby function.
The textbook summarizes the three operations of grouping: aggregation, transformation and filtering.

1. Aggregation:

The groupby object already defines some aggregation functions, such as max, but calling them directly has some inconveniences, so the agg function is introduced. The advantages of agg are:

(1) agg can apply multiple functions at the same time.
(2) Specific aggregation functions can be applied to specific columns.
(3) Custom functions can be used.
(4) The aggregation results can be renamed.
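The four uses of agg listed above can be sketched roughly as follows. The table and its column names (Gender, Height, Weight) are made up for illustration, and the renaming step uses pandas named aggregation, which is one of several ways to rename results and not necessarily the one the textbook shows.

```python
import pandas as pd

# Made-up data for illustration; the column names are assumptions.
df = pd.DataFrame({
    "Gender": ["F", "F", "M", "M", "M"],
    "Height": [158.0, 162.0, 175.0, 180.0, 170.0],
    "Weight": [46.0, 52.0, 70.0, 80.0, 66.0],
})
gb = df.groupby("Gender")[["Height", "Weight"]]

# (1) Several functions at once: the result has two-level columns.
multi = gb.agg(["sum", "max"])

# (2) Different functions for different columns, passed as a dict.
per_column = gb.agg({"Height": ["mean", "max"], "Weight": "count"})

# (3) A custom function: it receives one column of one group as a Series.
value_range = gb.agg(lambda s: s.max() - s.min())

# (4) Renamed results via named aggregation: new_name=(column, function).
renamed = df.groupby("Gender").agg(
    height_range=("Height", lambda s: s.max() - s.min()),
    max_weight=("Weight", "max"),
)

print(multi, per_column, value_range, renamed, sep="\n\n")
```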

Transformation and filtering

A transformation function returns a sequence of the same length as its input. For a custom transformation, use the transform method. The custom function receives each column of each group as a Series, the same kind of input that agg passes. The final result is a DataFrame whose row and column indexes are consistent with the data source.
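A minimal sketch of a custom transformation, assuming a made-up table with Gender, Height and Weight columns. Each column is centred on its group mean, and the result keeps the row index of the source.

```python
import pandas as pd

# Made-up data for illustration; the column names are assumptions.
df = pd.DataFrame({
    "Gender": ["F", "F", "M", "M", "M"],
    "Height": [158.0, 162.0, 175.0, 180.0, 170.0],
    "Weight": [46.0, 52.0, 70.0, 80.0, 66.0],
})
gb = df.groupby("Gender")[["Height", "Weight"]]

# The lambda gets one column of one group at a time (a Series) and
# returns a Series of the same length.
centred = gb.transform(lambda s: s - s.mean())

# Same row index as df, same columns as the selected source columns.
print(centred)
```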

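Cross-column grouping

agg and transform hand the function one column at a time, so a calculation that needs several columns of a group at once has to use the apply function, which receives each group as a DataFrame.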
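As an illustration of a cross-column calculation, the sketch below computes the mean body mass index per group, which needs Height and Weight together. The data, the column names and the helper mean_bmi are all hypothetical.

```python
import pandas as pd

# Made-up data for illustration; the column names are assumptions.
df = pd.DataFrame({
    "Gender": ["F", "F", "M", "M", "M"],
    "Height": [158.0, 162.0, 175.0, 180.0, 170.0],  # centimetres
    "Weight": [46.0, 52.0, 70.0, 80.0, 66.0],       # kilograms
})

def mean_bmi(group: pd.DataFrame) -> float:
    """Hypothetical helper: the group arrives as a DataFrame."""
    height_m = group["Height"] / 100
    return (group["Weight"] / height_m ** 2).mean()

# apply passes both selected columns of each group to the function.
bmi = df.groupby("Gender")[["Height", "Weight"]].apply(mean_bmi)
print(bmi)
```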

Practice

Ex1 questions

1. First keep only the cars whose country appears more than 2 times in the whole data set, that is, drop a car if its country occurs 2 times or fewer. Then group by country and calculate the mean price, the coefficient of variation of price, and the number of cars in that country. The coefficient of variation is the standard deviation divided by the mean; rename it CoV in the result.

Ex1.1
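One possible approach to question 1, sketched on a tiny invented table in place of the real car data set. The column names Country and Price and the result labels mean and count are assumptions; filter does the filtering step and agg the aggregation step.

```python
import pandas as pd

# Tiny stand-in for the car data set; all values are invented.
cars = pd.DataFrame({
    "Country": ["USA", "USA", "USA", "Japan", "Japan", "Japan", "Korea"],
    "Price": [10000, 12000, 14000, 9000, 11000, 13000, 8000],
})

# Filtering: keep only the groups whose Country appears more than 2 times.
kept = cars.groupby("Country").filter(lambda g: g.shape[0] > 2)

# Aggregation: mean, coefficient of variation (std / mean) and count,
# with the result columns named through keyword arguments.
result = kept.groupby("Country")["Price"].agg(
    mean="mean",
    CoV=lambda s: s.std() / s.mean(),
    count="count",
)
print(result)
```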

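2. Group the rows by their position in the table into the first third, the middle third and the last third, and calculate the mean of price for each group.
This question can be solved by following the textbook section 'the essence of the grouping basis': the grouping key does not have to be a column name, it can be any sequence of labels with the same length as the table.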

Ex1.2
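A sketch of question 2 on invented prices. The labels Head, Mid and Tail are arbitrary names chosen here, and rows that do not divide evenly are assigned by comparing their position with n/3 and 2n/3, which is only one of several reasonable conventions.

```python
import numpy as np
import pandas as pd

# Invented prices; only the row positions matter for the grouping.
cars = pd.DataFrame({"Price": [10, 20, 30, 40, 50, 60, 70]})

n = len(cars)
position = np.arange(n)

# Label each row by the third of the table it falls in.
labels = np.select(
    [position < n / 3, position < 2 * n / 3],
    ["Head", "Mid"],
    default="Tail",
)

# groupby accepts an array of labels as long as the DataFrame.
price_by_third = cars.groupby(labels)["Price"].mean()
print(price_by_third)
```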

3. Group by type and calculate the maximum and minimum of price and HP. The result has a multi-level column index; merge it into a single-level index by joining the levels with underscores.
Use the agg function to calculate the maximum and minimum values. Merging a multi-level index into a single-level index is covered in Chapter 3.

Ex1.3
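A sketch of question 3 on invented data, assuming columns named Type, Price and HP. The two column levels are joined with an underscore by mapping over the MultiIndex.

```python
import pandas as pd

# Invented rows standing in for the car data set.
cars = pd.DataFrame({
    "Type": ["Small", "Small", "Large", "Large"],
    "Price": [8000, 9000, 15000, 18000],
    "HP": [60, 75, 150, 170],
})

# agg with a list of functions yields two-level columns such as
# ("Price", "max").
res = cars.groupby("Type")[["Price", "HP"]].agg(["max", "min"])

# Each column label is a tuple; join its parts with an underscore.
res.columns = res.columns.map(lambda pair: "_".join(pair))
print(res)
```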

4. Group by type and apply min-max normalization to HP within each group.
Normalizing within a group has to return a sequence of the same length as the input, so the transform function should be used.

Ex1.4
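A sketch of question 4 on invented data, assuming columns named Type and HP. The helper min_max is hypothetical and rescales each group to the range 0 to 1.

```python
import pandas as pd

# Invented rows standing in for the car data set.
cars = pd.DataFrame({
    "Type": ["Small", "Small", "Small", "Large", "Large"],
    "HP": [60, 75, 90, 150, 170],
})

def min_max(s: pd.Series) -> pd.Series:
    """Hypothetical helper: rescale one group's values to [0, 1].

    A group whose values are all equal yields NaN (0 divided by 0).
    """
    return (s - s.min()) / (s.max() - s.min())

# transform returns a Series aligned with the rows of cars.
hp_scaled = cars.groupby("Type")["HP"].transform(min_max)
print(hp_scaled)
```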

5. Group by type and calculate the correlation coefficient between Disp. and HP.
This is a cross-column calculation, so the apply function should be used.

Ex1.5
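A sketch of question 5 on invented data, assuming columns named Type, Disp. and HP. The values are chosen so that one group is perfectly positively correlated and the other perfectly negatively correlated.

```python
import pandas as pd

# Invented rows standing in for the car data set.
cars = pd.DataFrame({
    "Type": ["Small"] * 3 + ["Large"] * 3,
    "Disp.": [90, 100, 110, 200, 250, 300],
    "HP": [60, 70, 80, 170, 150, 130],
})

# Each group arrives as a DataFrame, so both columns are available.
corr = cars.groupby("Type")[["Disp.", "HP"]].apply(
    lambda g: g["Disp."].corr(g["HP"])
)
print(corr)
```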
