Some usages of dataframe

Time:2021-4-18

Some usages of dataframe in pandas

Reading Excel files with pandas

  • pd.read_excel: an Excel engine library must be installed first (xlrd for .xls files; recent pandas versions use openpyxl for .xlsx files)
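
As a minimal sketch of the engine dependency, the snippet below wraps pd.read_excel in a small loader. The helper names pick_engine and load_excel and the file names are hypothetical, and the rule for choosing the engine (xlrd for .xls, openpyxl for everything else) is an assumption that matches recent pandas versions.

```python
import pandas as pd


def pick_engine(path):
    """Hypothetical helper: choose a read_excel engine from the file extension."""
    # Assumption: xlrd reads legacy .xls files, openpyxl reads .xlsx files
    return "xlrd" if path.lower().endswith(".xls") else "openpyxl"


def load_excel(path, sheet_name=0):
    """Read one sheet of an Excel file into a dataframe."""
    # The chosen engine library must be installed, otherwise pandas raises ImportError
    return pd.read_excel(path, sheet_name=sheet_name, engine=pick_engine(path))


print(pick_engine("report.xls"))
print(pick_engine("report.xlsx"))
```

Calling load_excel("report.xlsx") would then return the first sheet as a dataframe, provided the file exists and openpyxl is installed.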

Conversion between dataframe, numpy and list

  • Dataframe to numpy: df.values
  • Dataframe to list: df.values.tolist()
  • List to numpy: np.array(list_obj)
  • List to dataframe: pd.DataFrame(list_obj)
  • Numpy to list: arr.tolist()
  • Numpy to dataframe: pd.DataFrame(arr)
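
The six conversions above can be sketched as one round trip. The variable names are only illustrative, and the nested list is an assumed sample input.

```python
import numpy as np
import pandas as pd

demo_list = [[1, 2], [3, 4]]

# List -> dataframe and list -> numpy
df_from_list = pd.DataFrame(demo_list)
np_from_list = np.array(demo_list)

# Dataframe -> numpy and dataframe -> list
np_from_df = df_from_list.values
list_from_df = df_from_list.values.tolist()

# Numpy -> list and numpy -> dataframe
list_from_np = np_from_list.tolist()
df_from_np = pd.DataFrame(np_from_list)

print(type(np_from_df))
print(list_from_df)
print(list_from_np)
print(df_from_np)
```

Note that going through a list or a numpy array drops the row index and the column names, so they need to be passed again when rebuilding a dataframe.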

Traversing a dataframe by row and by column

  • Traverse by row:

    Commonly done with df.iterrows()

    import pandas as pd 
    demo_list = [[1,2],
                 [3,4]]
    # Build a dataframe from a list
    demo_df = pd.DataFrame(demo_list)
    print(demo_df)


    # Continuing from the example above
    for row in demo_df.iterrows():
        print(type(row))  # each item is a tuple
        print(row[0])     # the row index
        print(row[1])     # the row data as a Series


You can see that each row is a tuple of length 2: element 0 is the row index and element 1 is a Series holding that row's values. **Note that to take a particular value from each row during traversal, you can index or slice row[1].**

  • Traverse by column:

    Commonly done by getting the column names from df.columns and accessing each column by name

    # Continuing from the example above
    print(demo_df.columns)
    for column in demo_df.columns:
        print(demo_df[column])
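
The two traversals can be combined into a short sketch that actually collects values rather than only printing them. It rebuilds the same 2x2 dataframe, takes the first value of each row from the Series in the iterrows() tuple, and sums each column by name. Using .iloc for the positional lookup and sum() for the columns are choices made for this example.

```python
import pandas as pd

demo_df = pd.DataFrame([[1, 2], [3, 4]])

# Row traversal: unpack each (index, Series) tuple and take the first value
first_values = []
for index, row in demo_df.iterrows():
    first_values.append(int(row.iloc[0]))
print(first_values)

# Column traversal: look up each column by name and sum it
column_sums = {}
for column in demo_df.columns:
    column_sums[column] = int(demo_df[column].sum())
print(column_sums)
```

Unpacking the tuple as index, row is equivalent to reading row[0] and row[1] from the tuple in the row traversal example.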


Slicing a dataframe with iloc

  • Build the dataframe first

    import numpy as np
    import pandas as pd
    # Build a 5x5 dataframe from a list; a dataframe has no reshape method, so numpy is used
    demo_list = [i for i in range(25)]
    demo_np = np.array(demo_list).reshape(5, 5)
    # Pass the reshaped array (not the flat list) so the result is 5x5
    demo_df = pd.DataFrame(demo_np)
    print(demo_df)


  • iloc[start:end, start:end] selects values from the dataframe by row and column position. The part before the comma applies to rows and the part after it to columns. The left side of each colon is the start and the right side is the end; positions count from 0 and the end position is not included. For example, demo_df.iloc[2:4, 1:3] selects rows 2 and 3 and columns 1 and 2. The slice returned is still a dataframe.


  • iloc[start:end:step, start:end:step] adds a step size to the previous slice: it takes every step-th row or column from start up to, but not including, end.
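
Both forms of slicing can be checked on a 5x5 dataframe holding the values 0 to 24. This is a minimal sketch; np.arange is used here as a shortcut for building the values, and the step of 2 is an arbitrary choice for illustration.

```python
import numpy as np
import pandas as pd

# 5x5 dataframe with the values 0..24, filled row by row
demo_df = pd.DataFrame(np.arange(25).reshape(5, 5))

# Rows 2 and 3, columns 1 and 2 (end positions are excluded)
block = demo_df.iloc[2:4, 1:3]
print(type(block))
print(block.values.tolist())  # [[11, 12], [16, 17]]

# Every second row and every second column
stepped = demo_df.iloc[0:5:2, 0:5:2]
print(stepped.values.tolist())  # [[0, 2, 4], [10, 12, 14], [20, 22, 24]]
```

The sliced dataframes keep the original row and column labels, so block still has the row labels 2 and 3 and the column labels 1 and 2.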

Handling missing values in dataframe

  • Mean filling

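    Use fillna()

    # file_df is assumed to be a dataframe that has already been loaded
    # Get the list of column names that contain missing values
    null_columns = list(file_df.columns[file_df.isnull().sum() > 0])
    for column in null_columns:
        # Calculate the mean value of the column
        mean_val = file_df[column].mean()
        # Fill the missing values with the mean and assign the result back;
        # fillna(..., inplace=True) on a selected column may not update file_df
        file_df[column] = file_df[column].fillna(mean_val)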
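
To try mean filling without a data file, here is a minimal sketch on a small made-up dataframe. The column names and values are invented for illustration. Instead of looping, it passes the column means to fillna in a single call, which gives the same result for numeric columns.

```python
import numpy as np
import pandas as pd

# Small sample frame with missing values in columns "a" and "c" (made-up data)
file_df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [4.0, 5.0, 6.0],
    "c": [np.nan, 8.0, 10.0],
})

# Columns that contain at least one missing value
null_columns = list(file_df.columns[file_df.isnull().sum() > 0])
print(null_columns)  # ['a', 'c']

# file_df.mean() is a Series of column means; fillna matches it to columns by name
filled_df = file_df.fillna(file_df.mean())
print(filled_df)
```

Here the missing value in "a" becomes 2.0 and the one in "c" becomes 9.0, while "b" is left unchanged. If the dataframe also has non-numeric columns, file_df.mean(numeric_only=True) restricts the means to the numeric ones.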