WeChat public address: “Python reads money”

If there are any questions or suggestions, please leave a message for the public.

In order to facilitate maintenance, the data of general companies are stored in separate tables in the database, such as using a table to store basic information of all users, and a table to store user consumption. Therefore, in daily data processing, two tables are often needed**Splice up**Use. This operation corresponds to SQL`join`

In pandas`merge`

To achieve. Let’s talk about this article`merge`

The main principle of.

The introduction above says`merge`

It is used to splice two tables, so the user information is naturally needed during splicing**One-to-one correspondence**Therefore, the two tables to be spliced need to have a common**Key (key)**。 In summary, the whole`merge`

The process is to send information**One to one matching**The process of`merge`

The four types of are`'inner'`

、`'left'`

、`'right'`

and`'outer'`

。

# I. inner

`merge`

Of`'inner'`

The type of is called**Internal connection**, it will take**Key intersection of two tables**Splicing. What do you mean? The next step is to disassemble in a graphic way.

First of all, we have the following data. The data on the left and right represent the user’s**Basic information**and**Consumption information**, the key to connect the two tables is`userid`

。

Now use`'inner'`

By`merge`

```
In [6]: df_1.merge(df_2,how='inner',on='userid')
Out[6]:
userid age payment
0 a 23 2000
1 c 32 3500
```

**Process diagram:**

① Take the key of two tables**intersection**Here`df_1`

and`df_2`

Of`userid`

The intersection is`{a,c}`

② Corresponding match

Results

**Process summary:**

It is believed that the whole process is not difficult to understand. What is shown above is that under the same key, two tables correspond to only one data (one user corresponds to one consumption record). Then, if**One user corresponds to multiple consumption records**Then, how is it spliced?

Let’s assume that the current data is as follows:`df_2`

There are two and`a`

Corresponding data:

Same use`inner`

By`merge`

：

```
In [12]: df_1.merge(df_2,how='inner',on='userid')
Out[12]:
userid age payment
0 a 23 2000
1 a 23 500
2 b 46 1000
3 c 32 3500
```

The whole process except**Corresponding matching stage**, the others are basically the same as above.

**Process diagram:**

① Take the key of two tables**intersection**Here`df_1`

and`df_2`

Of`userid`

The intersection is`{a,b,c}`

② When matching, due to the`a`

There are two corresponding consumption records, so when splicing, the user’s basic information table`a`

Corresponding data**Copy one more line to match the right**。

Results

# 2、 Left and right

`'left'`

and`'right'`

Of`merge`

In fact, the ways are similar. They are called**Left join**and**Right join**。 These two methods can be converted to each other, so they are introduced here.

`'left'`

：`merge`

When**The key of the left table is the benchmark**Pair with the missing value if the key in the left table does not exist on the right`NaN`

Fill.

`'right'`

：`merge`

When**The key of the table on the right is the benchmark**Pair. If the key in the table on the right does not exist on the left, use the missing value`NaN`

Fill.

What do you mean? Let’s use an example to explain. This is the data of the demonstration

Now use`'left'`

By`merge`

```
In [21]: df_1.merge(df_2,how='left',on='userid')
Out[21]:
userid age payment
0 a 23 2000.0
1 b 46 NaN
2 c 32 3500.0
3 d 19 NaN
```

**Process diagram:**

① Pair based on all keys in the table on the left. In the figure, because the`e`

It is not in the left table, so it will not be paired.

② If the`payment`

Columns are merged into the left table with missing values for those without matching values`NaN`

Fill

Process summary:

about`'right'`

Type`merge`

and`'left'`

In fact, it is almost the same. As long as you change the positions of the two tables, the results returned by the two methods are the same (), as follows:

```
In [22]: df_2.merge(df_1,how='right',on='userid')
Out[22]:
userid payment age
0 a 2000.0 23
1 c 3500.0 32
2 b NaN 46
3 d NaN 19
```

As for`'left'`

and`'right'`

As for the following`'outer'`

）The connected key is one to many case, principle and upper`'inner'`

It’s similar, so I won’t repeat it here.

# Three, outer

`'outer'`

yes**External connection**In the process of splicing, it will take**Union of keys of two tables**Splicing. Look at the text is not intuitive, or the example!

Use the demo data used above

This use`'outer'`

Conduct`merge`

```
In [24]: df_1.merge(df_2,how='outer',on='userid')
Out[24]:
userid age payment
0 a 23.0 2000.0
1 b 46.0 NaN
2 c 32.0 3500.0
3 d 19.0 NaN
4 e NaN 600.0
```

**The diagram is as follows:**

① Take the union of two table keys, here is`{a,b,c,d,e}`

② Put the data columns of the two tables together, and use the missing values where there is no matching`NaN`

Fill in

I think the little friends who can read here have basically understood`merge`

The whole process, in summary,`merge`

The difference is that when splicing,**The key sets of the two selected tables are different**。 About pandas`merge`

That’s it!

Originality is not easy. If you think it’s a little bit useful, I hope you can give me some praise. Thank you old fellow.

Scan code is concerned about the public number “Python reading money”, the first time to get dry goods, you can also add Python learning exchange group!