Author: Huan Hao

# Background introduction

In quantitative analysis, we often work with large amounts of data, mining the relationships within it to find the data we need. Doing this analysis in plain Python is cumbersome. Is there a simpler tool that helps us analyze data quickly and efficiently?

Today we introduce pandas, a powerful toolkit for analyzing structured data.

This article is mainly for readers who already have a basic grasp of Python syntax. If you need to learn Python first, you can find tutorials in the community (https://developer.hs.net/cour…).

## Basic concepts

**Pandas** is a free, open-source, third-party Python library and one of the indispensable tools for data analysis in Python. It provides high-performance, easy-to-use data structures for data analysis, namely Series and DataFrame.

**Pandas** is built on NumPy, which provides high-performance array and matrix operations. It is used for data mining and data analysis, and it also provides data-cleaning functions.

Because **pandas** is based on NumPy, it works well together with Python's other scientific computing libraries.

Since its creation, **pandas** has been applied in many fields, such as finance, statistics, the social sciences, and engineering.

From the introduction above, you should now have a basic idea of what pandas does. Pandas is roughly the Excel of Python: it works with tables (that is, DataFrames) and can apply all kinds of transformations to data, and it has many other features besides.

## Data structures

### DataFrame

A DataFrame is a tabular data structure. It contains an ordered set of columns, and each column can hold a different value type (numeric, string, Boolean). A DataFrame has both a row index and a column index, and it can be viewed as a dictionary of Series objects that share a common index.
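To make the "dictionary of Series" view concrete, here is a minimal sketch. The ticker, prices, volumes and dates are hypothetical example data; the point is only that dict keys become column labels, the shared Series index becomes the row labels, and each column keeps its own type.

```python
import pandas as pd

# Hypothetical example data: three trading days used as shared row labels
days = ["2024-01-02", "2024-01-03", "2024-01-04"]

# Each column is a Series, and all of them use the same index
ticker = pd.Series(["AAA", "AAA", "AAA"], index=days)  # string values
price = pd.Series([10.5, 11.2, 9.8], index=days)       # float values
volume = pd.Series([1200, 900, 1500], index=days)      # integer values
up = price > 10                                         # Boolean values, same index

# A dict of Series becomes a DataFrame:
# dict keys -> column labels, shared Series index -> row labels
df = pd.DataFrame({"ticker": ticker, "price": price, "volume": volume, "up": up})

print(df)
print(df.dtypes)  # each column keeps its own type

# Selecting a single column gives back a Series with the same index
col = df["price"]
print(type(col).__name__)
```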

The DataFrame constructor is as follows:

`pandas.DataFrame(data, index, columns, dtype, copy)`

Parameter Description:

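- **data**: a set of data (ndarray, Series, map, list, dict, etc.).
- **index**: the index values, also called row labels. Defaults to a RangeIndex (0, 1, 2, …, n) if not specified.
- **columns**: the column labels. Defaults to a RangeIndex (0, 1, 2, …, n) if not specified.
- **dtype**: the data type.
- **copy**: whether to copy the input data. Older versions default to False; in recent versions the default is None, which copies dict input but not DataFrame or 2-D ndarray input.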
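A short sketch of how `data`, `index`, `columns` and `dtype` fit together, using a small hypothetical NumPy array. It also shows the default RangeIndex labels on both axes when `index` and `columns` are omitted.

```python
import numpy as np
import pandas as pd

# Hypothetical 3x2 integer array used as the data
values = np.array([[1, 2], [3, 4], [5, 6]])

# data + explicit row labels (index), column labels (columns) and a forced dtype
df = pd.DataFrame(values, index=["r1", "r2", "r3"], columns=["A", "B"], dtype=float)
print(df)

# Without index/columns, both axes default to a RangeIndex (0, 1, 2, ...)
df_default = pd.DataFrame(values)
print(df_default.index)
print(df_default.columns)
```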

### Series

A Series is similar to a single column of a table, or to a one-dimensional array, and it can hold any data type.

A Series consists of an index and a column of values. Its constructor is as follows:

`pandas.Series(data, index, dtype, name, copy)`

Parameter Description:

- **data**: a set of data, such as an ndarray, a list, a dict, or a scalar.
- **index**: the index labels. If not specified, they start from 0 by default.
- **dtype**: the data type. If not specified, it is inferred from the data.
- **name**: the name of the Series.
- **copy**: whether to copy the input data. The default value is False.
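A minimal sketch of the Series parameters. The values and the name `"score"` are hypothetical; it contrasts an explicitly labelled, explicitly typed Series with one that relies on the defaults.

```python
import pandas as pd

# data, explicit index labels, a forced dtype and a name
s = pd.Series([10, 20, 30], index=["a", "b", "c"], dtype="float64", name="score")
print(s)

# Without index/dtype: labels start at 0 and the dtype is inferred from the data
s_default = pd.Series([10, 20, 30])
print(s_default.index.tolist())
print(s_default.dtype)

# Index labels can be used to look up values
print(s["b"])
```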

## Get started quickly

#### Importing the library

Import pandas into your code:

`import pandas as pd`

If the import fails, either your environment is misconfigured or pandas is not installed. Install it as follows:

`pip install pandas`

#### Series object operations

Create a Series object with the `Series()` constructor; the resulting object provides the corresponding methods and attributes:

```
import pandas as pd
import numpy as np

# Build a Series from a NumPy array; the index defaults to 0, 1, 2, 3
data = np.array(['a','b','c','d'])
s = pd.Series(data)
print(s)
```
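The text mentions that a Series exposes methods and attributes. The following sketch shows a few common ones on the same four-letter Series; the selection is illustrative, not exhaustive.

```python
import numpy as np
import pandas as pd

data = np.array(['a', 'b', 'c', 'd'])
s = pd.Series(data)

# A few commonly used attributes
print(s.index)   # the row labels (here a RangeIndex from 0 to 3)
print(s.values)  # the underlying values
print(s.size)    # the number of elements

# A few commonly used methods and accessors
print(s.head(2))               # the first two elements
print(s.iloc[-1])              # positional access: the last element
print(s.str.upper().tolist())  # vectorized string method, returned as a list
```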

#### DataFrame object operations

Create a DataFrame object with `DataFrame()` as follows:

```
import pandas as pd

# A list of scalars becomes a single-column DataFrame (column label 0)
data = [1,2,3,4,5]
df = pd.DataFrame(data)
print(df)
```
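In practice a DataFrame is often built from a dict of equal-length lists, one list per column. This sketch uses hypothetical names and ages to show a few everyday operations: checking the shape, reading column labels, applying a Series method to a column, adding a derived column, and filtering rows.

```python
import pandas as pd

# Hypothetical example data: a dict of equal-length lists, one list per column
data = {
    "name": ["Alice", "Bob", "Carol"],
    "age": [25, 32, 41],
    "member": [True, False, True],
}
df = pd.DataFrame(data)

print(df.shape)             # (rows, columns)
print(df.columns.tolist())  # column labels come from the dict keys
print(df["age"].mean())     # a column is a Series, so Series methods apply

# Add a derived column, then filter rows with a Boolean condition
df["age_next_year"] = df["age"] + 1
print(df[df["age"] > 30])
```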

#### Read file data

You can read a local `.csv` file with the `read_csv()` function:

```
import pandas as pd

data = pd.read_csv('file.csv')
# Read at most 1000 rows, skip lines 1 and 5, decode the file as GBK
data = pd.read_csv('file.csv', nrows=1000, skiprows=[1,5], encoding='gbk')
```

Parameter meanings:

- `'file.csv'`: the name of the file to read; it can also include a path to the file's location.
- `nrows`: the number of data rows to read from the start of the file.
- `skiprows`: the lines to skip while reading; a list such as `[1,5]` skips those line numbers (counting from 0), and an integer skips that many lines at the start of the file.
- `encoding`: the encoding of the file being read, for example `'gbk'`.
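To see how `nrows`, `skiprows` and `encoding` interact without needing a real `file.csv`, this sketch feeds hypothetical CSV text through `io.StringIO` and writes a small GBK-encoded file to a temporary directory. The data and file names are assumptions made for illustration.

```python
import io
import os
import tempfile

import pandas as pd

# Hypothetical in-memory CSV standing in for 'file.csv'.
# Line 0 is the header; lines 1-6 are data rows A-F.
csv_text = "code,close\nA,1.0\nB,2.0\nC,3.0\nD,4.0\nE,5.0\nF,6.0\n"

full = pd.read_csv(io.StringIO(csv_text))
print(len(full))  # all six data rows

# skiprows=[1, 5] drops file lines 1 (A) and 5 (E);
# nrows=3 then keeps only the first three remaining data rows
subset = pd.read_csv(io.StringIO(csv_text), nrows=3, skiprows=[1, 5])
print(subset)

# encoding: write a small GBK-encoded file, then read it back with the same encoding
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file.csv")
    with open(path, "w", encoding="gbk") as f:
        f.write("名称,价格\n苹果,5\n")
    gbk_df = pd.read_csv(path, encoding="gbk")

print(gbk_df.columns.tolist())
```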

Similar to `read_csv`, there is a `read_excel` function that reads data from Excel files.

#### Write file data

**Pandas** provides the `to_csv()` method for converting a `DataFrame` to `CSV` data. To write the CSV data to a file, pass a file path or file object to the method; if you pass neither, the CSV data is returned as a string.

`data.to_csv('my_new_file.csv', index=False)`

Parameter meaning:

`index`: whether to write the row index. The default is `True`, so the index is written unless you pass `index=False`.
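A minimal sketch of both `to_csv` behaviors described above, using a small hypothetical DataFrame: with no path the CSV comes back as a string, and with a path (here a temporary directory) it is written to disk. It also shows the extra index column written when `index` is left at its default.

```python
import os
import tempfile

import pandas as pd

# Hypothetical example data
df = pd.DataFrame({"code": ["A", "B"], "close": [1.5, 2.5]})

# No path given: the CSV data is returned as a string
text = df.to_csv(index=False)
print(text)

# Default index=True: the row index is written as an extra first column
with_index = df.to_csv()
print(with_index)

# Path given: the CSV data is written to the file, then read back to check it
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "my_new_file.csv")
    df.to_csv(path, index=False)
    back = pd.read_csv(path)

print(back.equals(df))
```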

Similar to `to_csv`, there is a `to_excel` method that writes data to an Excel file.

# Summary

This article introduced the basics of the pandas toolkit. Learning pandas helps us process and analyze data quickly. Hands-on tutorials will follow in future updates, so stay tuned.