Pandas – Basic Operations on a DataFrame

Time:2022-5-8

In the last section we learned the basic add, delete, modify, and query operations on the Series structure. With those mastered, the same operations on a DataFrame in this section will be very easy.

First, let's construct a DataFrame:

import pandas as pd

data = [[1,2,3], [4,5,6], [7,8,9]]
index = ['a', 'b', 'c']
columns = ['A', 'B', 'C']
df = pd.DataFrame(data=data, index=index, columns=columns)
df
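
As a quick check on the construction above, the sketch below rebuilds the same frame and inspects its shape and labels. The second construction from a dict of columns (`df_from_dict`) is not in the original text; it is shown only as an equivalent alternative.

```python
import pandas as pd

# Same frame as above: 3 rows labelled a..c, 3 columns labelled A..C
data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
df = pd.DataFrame(data=data, index=['a', 'b', 'c'], columns=['A', 'B', 'C'])

print(df.shape)          # (number of rows, number of columns)
print(list(df.index))    # row labels
print(list(df.columns))  # column labels

# Alternative (an illustration, not from the original): build column by column
df_from_dict = pd.DataFrame(
    {'A': [1, 4, 7], 'B': [2, 5, 8], 'C': [3, 6, 9]},
    index=['a', 'b', 'c'],
)
print(df.equals(df_from_dict))
```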

Query

Query specified columns:

>> df['A']
a    1
b    4
c    7
Name: A, dtype: int64
>> df[['A','C']]
    A   C
a   1   3
b   4   6
c   7   9

Use loc and iloc to query specified rows:

>> df.loc['a']
A    1
B    2
C    3
Name: a, dtype: int64
>> df.iloc[0]
A    1
B    2
C    3
Name: a, dtype: int64
>> df.loc['a':'b']
    A   B   C
a   1   2   3
b   4   5   6
>> df.iloc[:2]
    A   B   C
a   1   2   3
b   4   5   6
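
One detail in the slices above is easy to miss: a label slice with loc includes its end label, while a positional slice with iloc excludes its end position. A minimal sketch, assuming the same df as at the start of this section:

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]],
                  index=['a', 'b', 'c'], columns=['A', 'B', 'C'])

by_label = df.loc['a':'b']    # label slice: the end label 'b' is included
by_position = df.iloc[0:2]    # positional slice: position 2 is excluded

print(list(by_label.index))
print(by_label.equals(by_position))
```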

In addition, iloc and loc can also take a pair of coordinates (row, column) to query a specific value or region of the DataFrame:

>> df.loc['b','B']
5
>> df.loc['a':'b',['A','C']]
    A   C
a   1   3
b   4   6
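
The examples above use loc only. The positional counterpart with iloc might look like the following sketch, which assumes the same df and selects the same cell and region by integer position:

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]],
                  index=['a', 'b', 'c'], columns=['A', 'B', 'C'])

# Row position 1, column position 1: the same cell as df.loc['b', 'B']
value = df.iloc[1, 1]

# First two rows, columns at positions 0 and 2: same region as df.loc['a':'b', ['A', 'C']]
region = df.iloc[0:2, [0, 2]]

print(value)
print(region)
```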

Finally, there is the frequently used Boolean indexing:

>> df[[True, False, True]]
    A   B   C
a   1   2   3
c   7   8   9
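
In practice the Boolean list is rarely written by hand; it usually comes from a comparison on a column. A small sketch under that assumption (the thresholds are arbitrary and chosen only for illustration):

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]],
                  index=['a', 'b', 'c'], columns=['A', 'B', 'C'])

# A comparison on a column yields a Boolean Series aligned with the index
mask = df['A'] > 1
filtered = df[mask]

# Conditions combine with & (and) or | (or); each one needs its own parentheses
both = df[(df['A'] > 1) & (df['C'] < 9)]

print(filtered)
print(both)
```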

Modify

Modify the specified value:

>> df.loc['a', 'A']
1
>> df.loc['a', 'A'] = 1000
>> df
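
The assignment above can be reproduced as follows. The second assignment, which combines loc with a Boolean condition, is not from the original text and is included only to illustrate that the same indexer can update a whole region at once.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]],
                  index=['a', 'b', 'c'], columns=['A', 'B', 'C'])

# Overwrite a single cell, addressed by row label and column label
df.loc['a', 'A'] = 1000
print(df)

# Illustration only: set column C to 0 in every row where B is greater than 4
df.loc[df['B'] > 4, 'C'] = 0
print(df)
```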

Modify index and column names:

>> df.index = ['aa','bb','cc']
>> df.columns = ['AA','BB','CC']
>> df
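
Assigning to df.index or df.columns replaces every label at once. When only some labels need to change, the rename method is an alternative; it is not used in the original text, so treat the following as a side note. It takes a mapping from old to new labels and returns a new DataFrame:

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]],
                  index=['a', 'b', 'c'], columns=['A', 'B', 'C'])

# Rename only the labels listed in the mappings; the rest are left alone
renamed = df.rename(index={'a': 'aa'}, columns={'A': 'AA'})

print(list(renamed.index))
print(list(renamed.columns))
print(list(df.index))  # the original frame is not modified
```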

Add

Add a row:

>> df.loc['dd'] = [0,0,0]
>> df

To add multiple rows (that is, to concatenate two DataFrames vertically), first construct a new DataFrame, df2:

>> df2 = pd.DataFrame(data=[[10,10,10], [100, 100, 100]], 
                      index=['dd', 'ee'], 
                      columns=['AA', 'BB', 'CC'])
>> df2

Concatenate the two DataFrames:

>> df3 = pd.concat([df, df2])
>> df3

pd.concat only performs a simple concatenation; rows with duplicate index labels are kept rather than overwritten:

>> df3.loc['dd']
    AA  BB  CC
dd  0   0   0
dd  10  10  10

Usually, we use ignore_index=True to regenerate a numeric index:

>> df3 = pd.concat([df, df2], ignore_index=True)
>> df3
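
The sketch below puts the two behaviours side by side, using frames that mirror df and df2 at this point in the walkthrough. The last part uses the verify_integrity parameter of pd.concat, which is not mentioned in the original text; it is shown as one possible way to make duplicate labels fail loudly instead of passing through.

```python
import pandas as pd

df = pd.DataFrame([[1000, 2, 3], [4, 5, 6], [7, 8, 9], [0, 0, 0]],
                  index=['aa', 'bb', 'cc', 'dd'], columns=['AA', 'BB', 'CC'])
df2 = pd.DataFrame([[10, 10, 10], [100, 100, 100]],
                   index=['dd', 'ee'], columns=['AA', 'BB', 'CC'])

# Plain concatenation keeps both rows labelled 'dd'
stacked = pd.concat([df, df2])
print(stacked.index.is_unique)

# ignore_index=True discards the labels and numbers the rows from 0
renumbered = pd.concat([df, df2], ignore_index=True)
print(list(renumbered.index))

# Optional: ask concat to raise an error when labels overlap
duplicates_rejected = False
try:
    pd.concat([df, df2], verify_integrity=True)
except ValueError:
    duplicates_rejected = True
print(duplicates_rejected)
```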

Add a column DD to df2:

>> df2['DD'] = [1000, 1000]
>> df2
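
Two variations on adding a single column, sketched under the assumption that df2 is the frame built above. Assigning a scalar fills every row with that value, and the assign method returns a new frame instead of modifying df2. The column name TOTAL is hypothetical and used only for this illustration.

```python
import pandas as pd

df2 = pd.DataFrame([[10, 10, 10], [100, 100, 100]],
                   index=['dd', 'ee'], columns=['AA', 'BB', 'CC'])

# A scalar on the right-hand side is repeated for every row
df2['DD'] = 1000

# assign returns a copy with the extra column; df2 itself is unchanged
with_total = df2.assign(TOTAL=df2['AA'] + df2['BB'])

print(df2)
print(with_total)
```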

What about adding multiple columns? We still use pd.concat, but with the parameter axis=1. Let's construct a DataFrame with two rows and two columns, df4:

>> df4 = pd.DataFrame([[1,2],[3,4]], index=['dd','ee'], columns=['E','F'])
>> df4
    E   F
dd  1   2
ee  3   4

Concatenate df2 and df4:

>> df5 = pd.concat([df2,df4], axis=1)
>> df5
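
The concatenation above lines up cleanly because df2 and df4 share the same index labels. With axis=1, pd.concat aligns rows by label, so labels present in only one frame produce missing values. The frames left and right below are hypothetical and built only to show this:

```python
import pandas as pd

left = pd.DataFrame({'AA': [10, 100]}, index=['dd', 'ee'])
right = pd.DataFrame({'E': [1, 3]}, index=['dd', 'ff'])

# Default outer join: the union of row labels, with NaN where a frame has no such row
wide = pd.concat([left, right], axis=1)
print(wide)

# join='inner' keeps only the row labels present in both frames
inner = pd.concat([left, right], axis=1, join='inner')
print(inner)
```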

Delete

Delete columns E and F from df5 above:

>> del df5['E']
>> del df5['F']
>> df5

To delete multiple columns, you can also use the drop method, but you must specify axis=1:

>> df5.drop(['CC','DD'], axis=1, inplace=True)
>> df5

You can also use the drop method to delete multiple rows. When deleting rows, the default axis=0 is sufficient:

>> df5.drop(['dd'], inplace=True)
>> df5
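
The calls above use inplace=True, which modifies df5 directly. Without it, drop returns a new frame and leaves the original untouched. The sketch below also uses the columns= and index= keywords, which are an alternative to the axis argument and are not shown in the original text. The frame is a stand-in for df5 after columns E and F were deleted.

```python
import pandas as pd

df5 = pd.DataFrame([[10, 10, 10, 1000], [100, 100, 100, 1000]],
                   index=['dd', 'ee'], columns=['AA', 'BB', 'CC', 'DD'])

# Equivalent to df5.drop(['CC', 'DD'], axis=1), but returns a new frame
without_cols = df5.drop(columns=['CC', 'DD'])

# Equivalent to df5.drop(['dd']), i.e. axis=0
without_rows = df5.drop(index=['dd'])

print(without_cols)
print(without_rows)
print(df5.shape)  # df5 still has all rows and columns
```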