Python has earned its place in data science largely thanks to the three musketeers of data analysis: NumPy, pandas and Matplotlib. Of the three, I think pandas is the core and the most heavily used. Whether you are processing data or entering competitions, you need to be fluent in pandas. With that in mind, I joined the pandas study group organized by the Datawhale open-source community. The goal: grind hard for one month and become proficient in pandas!
Starting with this installment, we will systematically learn and organize our knowledge of pandas, in 10 installments that follow the outline below. I believe that after 10 installments of rich case studies, mastering pandas and using it fluently will come naturally.
 pandas basics
 Indexing
 Grouping
 Reshaping
 Joining
 Missing data
 Text data
 Categorical data
 Time series data
 Comprehensive exercises
As the first installment, this one covers basic knowledge in preparation for what follows. It mainly includes some commonly used Python features and some operations from the NumPy library.
1.1 List comprehensions
List comprehensions are a signature feature of Python; they create lists quickly and concisely.
1.1.1 Basic format:
[* for i in k]: * is an expression, typically a function of the variable i (though it may be independent of i), and k is an iterable such as a list.
Applications:
 1. One line of code that outputs the cubes of 1 to 5
 2. One line of code that creates a list of 10 random integers between 60 and 100
# One line of code that outputs the cubes of 1 to 5
[i**3 for i in range(1,6)]
>>>[1, 8, 27, 64, 125]
# One line of code that creates a list of 10 random integers between 60 and 100 (simulating student grades)
import random
[random.randint(60,100) for _ in range(10)]
>>> [76, 89, 62, 83, 61, 80, 89, 99, 76, 78]
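To make the basic format concrete, here is a minimal sketch that builds the same list of cubes twice, once with an explicit loop and once with a comprehension. The variable names cubes_loop and cubes_comp are only illustrative.

```python
# Explicit loop: build the list of cubes step by step
cubes_loop = []
for i in range(1, 6):
    cubes_loop.append(i ** 3)

# List comprehension: the same result in a single expression
cubes_comp = [i ** 3 for i in range(1, 6)]

print(cubes_loop == cubes_comp)  # True
```

The comprehension reads in the same order as the loop (expression first, then the for clause), which is why the two forms are interchangeable for simple cases.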
1.1.2 Nested for loops
The for clauses in a list comprehension can be nested.
For example, given three lists holding customer names, clothing colors and sizes, output every combination of customer, color and size in one line of code:
names = ['zhangsan', 'lisi', 'wangba']
color = ['red', 'yellow']
size = ['S', 'M', 'L']
[name + '' + c + '' + s for name in names for c in color for s in size]
>>>
['zhangsanredS',
'zhangsanredM',
'zhangsanredL',
'zhangsanyellowS',
'zhangsanyellowM',
'zhangsanyellowL',
'lisiredS',
'lisiredM',
'lisiredL',
'lisiyellowS',
'lisiyellowM',
'lisiyellowL',
'wangbaredS',
'wangbaredM',
'wangbaredL',
'wangbayellowS',
'wangbayellowM',
'wangbayellowL']
The code above is equivalent to the following nested loops, which print each combination instead of collecting them in a list:
for name in names:
    for c in color:
        for s in size:
            print(name + '' + c + '' + s)
>>>
zhangsanredS
zhangsanredM
zhangsanredL
zhangsanyellowS
zhangsanyellowM
zhangsanyellowL
lisiredS
lisiredM
lisiredL
lisiyellowS
lisiyellowM
lisiyellowL
wangbaredS
wangbaredM
wangbaredL
wangbayellowS
wangbayellowM
wangbayellowL
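As an aside that is not part of the original walkthrough, the standard library's itertools.product yields the same combinations in the same order as nested for clauses. The sketch below reuses the three lists from the example; the names nested and via_product are illustrative.

```python
from itertools import product

names = ['zhangsan', 'lisi', 'wangba']
color = ['red', 'yellow']
size = ['S', 'M', 'L']

# Nested for clauses inside one comprehension
nested = [name + c + s for name in names for c in color for s in size]

# itertools.product walks the same combinations, rightmost list fastest
via_product = [name + c + s for name, c, s in product(names, color, size)]

print(nested == via_product)  # True
print(len(nested))            # 18
```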
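1.1.3 Filtering
An if clause can be added after the for loop in a list comprehension to filter items. (An if ... else ... conditional expression is also allowed, but it goes before the for and chooses a value rather than filtering.)
For example, one line of code that outputs the integers from 1 to 100 that are divisible by 7:
# Output the numbers from 1 to 100 that are divisible by 7: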
[i for i in range(1,101) if i%7 == 0]
>>>
[7, 14, 21, 28, 35, 42, 49, 56, 63, 70, 77, 84, 91, 98]
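The difference between filtering with if and choosing a value with if ... else ... is easy to mix up, so here is a small sketch of both. The labels list is a made-up illustration.

```python
# Filter: an `if` after the for clause keeps only the matching items
multiples_of_7 = [i for i in range(1, 101) if i % 7 == 0]
print(len(multiples_of_7))  # 14

# Conditional expression: `x if cond else y` before the for clause
# maps every item to one of two values, so nothing is dropped
labels = ['even' if i % 2 == 0 else 'odd' for i in range(1, 6)]
print(labels)  # ['odd', 'even', 'odd', 'even', 'odd']
```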
From the examples above, the simplicity and elegance of list comprehensions is clear! It also shows off the power of Python.
1.2 lambda anonymous functions
Functions are first-class citizens in Python: they can be passed around and stored like any other object. Code that needs to be reused often is usually best wrapped in a function. But when the function is trivial, or is only called once or twice, naming it and writing a full def block is overkill. This is where lambda anonymous functions come in.
Format: lambda [arg1 [, arg2, ..., argn]]: expression
Here lambda is a reserved keyword, and [arg1 [, arg2, ..., argn]] is the parameter list, with the same structure as a function's parameter list in Python. expression is an expression over those parameters. Any parameter that appears in the expression must be defined in the parameter list, and the body must be a single expression.
Example: define a function that returns a string with all of its letters in uppercase
def str_capital(s):
    return str.upper(s)
str_capital('datawhale')
>>>
'DATAWHALE'
Using an anonymous function instead:
upper = lambda x: str.upper(x)
upper('datawhale')
>>>
'DATAWHALE'
By comparison, anonymous functions have the following advantages:
 They can be defined right where they are used, so a later change can be made on the spot, which makes the code easier to maintain
 The syntax is minimal: instead of def function_name(parameters): ..., you simply write lambda parameters: return_value
Note, however, that a lambda makes the program more concise but not more efficient. That is one reason many programmers object to using lambda.
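A common place where defining the function right where it is used pays off is the key argument of sorted(). The sketch below uses made-up (name, score) pairs purely for illustration.

```python
# Hypothetical (name, score) pairs, for illustration only
scores = [('zhangsan', 76), ('lisi', 89), ('wangba', 62)]

# The lambda tells sorted() to compare items by their second field
ranked = sorted(scores, key=lambda pair: pair[1], reverse=True)
print(ranked)  # [('lisi', 89), ('zhangsan', 76), ('wangba', 62)]
```

Because the key function is used once and fits in one expression, a named def would add nothing here.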
1.3 The map() function
In Python, lambda is often used together with three functions that operate on sequences: map(), reduce() and filter(), which transform, cumulatively reduce, and filter a sequence respectively. (map() and filter() are built in; in Python 3, reduce() lives in the functools module.) The most commonly used is map(). In essence, map() is a mapping: it applies a given function to every element of the iterable (such as a list) passed to it. For example, suppose we write a function that converts a string to uppercase and then use it to convert several strings:
def str_capital(s):
    return str.upper(s)
L1 = ['I', 'like', 'Datawhale']
L2 = []
for s in L1:
    L2.append(str_capital(s))
L2
>>>
['I', 'LIKE', 'DATAWHALE']
If we replace the for loop with map():
L3 = map(str_capital, L1)
list(L3)
>>>
['I', 'LIKE', 'DATAWHALE']
Much more concise! Note that map() returns a map object (a lazy iterator), so list() is needed to see its elements. As mentioned above, map() is often combined with lambda anonymous functions, as follows:
L4 = map(lambda x: str.upper(x), L1)
list(L4)
>>>
['I', 'LIKE', 'DATAWHALE']
Elegant!
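The other two functions mentioned above, filter() and reduce(), combine with lambda in the same way. This is a minimal sketch with a made-up list of numbers; in Python 3, reduce has to be imported from functools.

```python
from functools import reduce

nums = [1, 2, 3, 4, 5, 6]

# filter() keeps the elements for which the function returns True
evens = list(filter(lambda x: x % 2 == 0, nums))
print(evens)  # [2, 4, 6]

# reduce() folds the sequence into one value, left to right
total = reduce(lambda acc, x: acc + x, nums)
print(total)  # 21

# An optional third argument provides the starting value
factorial_6 = reduce(lambda acc, x: acc * x, nums, 1)
print(factorial_6)  # 720
```

Like map(), filter() returns a lazy iterator, which is why it is wrapped in list() here.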
1.4 The zip() function
We all know zip as a file compression tool. Python's zip() function feels a bit like unpacking a package: pass in one or more lists or other iterables, and it takes one element from each in turn to form tuples. Some examples:
a = [3,4,5,6]
b = ['a', 'b', 'c']
s1 = {'zhangsan': 20, 'lisi': 25}
print(zip(a))
print('*' * 10)
print(list(zip(a)))
print(list(zip(b)))
print(list(zip(s1)))
>>>
<zip object at 0x000001A7D4FF7940>
**********
[(3,), (4,), (5,), (6,)]
[('a',), ('b',), ('c',)]
[('zhangsan',), ('lisi',)]
As you can see, zip() also returns a zip object (an iterator); use list() to view its elements.
When zip() is given two arguments, as in zip(a, b), it takes one element from a and one from b at a time to form tuples and yields them from a new iterator. It stops at the end of the shorter input, which is why 6 does not appear in the result. For example:
print(list(zip(a,b)))
>>>
[(3, 'a'), (4, 'b'), (5, 'c')]
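Two everyday uses of zip() follow directly from this behavior: building a dict from two lists, and undoing a zip with the * operator. The sketch reuses a and b from the example; lookup, nums and letters are illustrative names.

```python
a = [3, 4, 5, 6]
b = ['a', 'b', 'c']

# zip() stops at the shorter input, so 6 has no partner and is dropped
pairs = list(zip(a, b))
print(pairs)  # [(3, 'a'), (4, 'b'), (5, 'c')]

# Pair keys with values to build a dict
lookup = dict(zip(b, a))
print(lookup)  # {'a': 3, 'b': 4, 'c': 5}

# zip(*pairs) transposes the pairs back into separate tuples
nums, letters = zip(*pairs)
print(nums)     # (3, 4, 5)
print(letters)  # ('a', 'b', 'c')
```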
This design is handy for element-wise operations on matrices (two-dimensional arrays), such as addition, subtraction and element-wise (point) multiplication. For example:
import numpy as np
m = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
n = [[2, 2, 2], [3, 3, 3], [4, 4, 4]]
# Element-wise multiplication
print('=*' * 10 + "Element-wise multiplication" + '=*' * 10)
print(np.array([x*y for a, b in zip(m, n) for x, y in zip(a, b)]).reshape(3,3))
# Addition works the same way (use x-y for subtraction)
print('=*' * 10 + "Matrix addition" + '=*' * 10)
print(np.array([x+y for a, b in zip(m, n) for x, y in zip(a, b)]).reshape(3,3))
>>>
=*=*=*=*=*=*=*=*=*=*Element-wise multiplication=*=*=*=*=*=*=*=*=*=*
[[ 2  4  6]
 [12 15 18]
 [28 32 36]]
=*=*=*=*=*=*=*=*=*=*Matrix addition=*=*=*=*=*=*=*=*=*=*
[[ 3  4  5]
 [ 7  8  9]
 [11 12 13]]
Side note: element-wise multiplication
Element-wise (point) multiplication multiplies corresponding elements, so the two matrices must have the same shape. It should be distinguished from the ordinary matrix product, where rows of the first matrix are multiplied with columns of the second.
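To see the distinction in NumPy itself, the sketch below applies both operators to the same two matrices used above: * is element-wise, while @ is the matrix product. The names elementwise and matmul are illustrative.

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
n = np.array([[2, 2, 2], [3, 3, 3], [4, 4, 4]])

# `*` multiplies corresponding elements (shapes must be compatible)
elementwise = m * n
print(elementwise)

# `@` is the matrix product: rows of m times columns of n
matmul = m @ n
print(matmul)
```

With these inputs the two results differ, which is a quick way to confirm which operation a piece of code is actually performing.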
2. NumPy review
pandas builds on NumPy for efficient computation, so it is worth reviewing NumPy before learning pandas. Here are some commonly used pieces of NumPy.
2.1 np.array
The most basic data structure in NumPy is the array. Constructing one is simple: just call np.array. Several special arrays are summarized below.
 Arithmetic sequences
 np.linspace(start, stop (included), number of samples): for when you know in advance how many samples to create
 np.arange(start, stop (excluded), step): for when you know the spacing between adjacent values in advance
Be careful: don't confuse np.arange with Python's built-in range. range can only generate integer sequences, while np.arange can also generate sequences of decimals.
import numpy as np
a = np.linspace(1,100,10)
b = np.arange(1,10,1.5)
print(a)
print(b)
>>>
[ 1. 12. 23. 34. 45. 56. 67. 78. 89. 100.]
[1. 2.5 4. 5.5 7. 8.5]
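For completeness, here is a small sketch of calling np.array directly on a nested list, plus a check of the range versus np.arange caveat. The names arr and float_step_rejected are illustrative.

```python
import numpy as np

# Build a 2-D array from a nested Python list
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr.shape)  # (2, 3)
print(arr.ndim)   # 2

# Python's built-in range refuses a non-integer step ...
float_step_rejected = False
try:
    range(1, 10, 1.5)
except TypeError:
    float_step_rejected = True
print(float_step_rejected)  # True

# ... while np.arange accepts it
print(np.arange(1, 10, 1.5))
```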

 Special matrices, including zeros / ones / eye / full, etc
The code shows each one directly:
print('All-zeros matrix with 3 rows and 4 columns')
print(np.zeros((3,4)))
print('*' * 10)
print('All-ones matrix with 3 rows and 3 columns')
print(np.ones((3, 3)))
print('*' * 10)
print('Identity matrix with 3 rows and 3 columns')
print(np.eye(3))
print('*' * 10)
print('Matrix of a given shape filled with a given value')
print(np.full((2,3), 6))
>>>
All-zeros matrix with 3 rows and 4 columns
[[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]
**********
All-ones matrix with 3 rows and 3 columns
[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]
**********
Identity matrix with 3 rows and 3 columns
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
**********
Matrix of a given shape filled with a given value
[[6 6 6]
 [6 6 6]]

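 Random matrices
 np.random.rand(): uniform random values in [0, 1). Do not pass a tuple here; give the size of each dimension directly as separate arguments
 np.random.randn(): standard normal distribution (mean 0, standard deviation 1)
 np.random.randint(low, high, size): random integers with the given minimum, maximum (exclusive) and shape
 np.random.choice(): draws results from a given list with given probabilities and sampling mode; when no probabilities are given the sampling is uniform, and by default it samples with replacement
 np.random.seed(0): sets the random seed, which fixes the random sequence so that every rerun produces the same values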
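The functions listed above can be tried out as follows. This is only a sketch: the shapes, bounds and probabilities are arbitrary choices for illustration.

```python
import numpy as np

np.random.seed(0)  # fix the seed so reruns give the same values

u = np.random.rand(2, 3)    # uniform on [0, 1); dimensions passed directly
z = np.random.randn(2, 3)   # standard normal samples
k = np.random.randint(5, 10, size=(2, 2))  # integers in [5, 10)

# Draw 5 items with replacement, using explicit probabilities
c = np.random.choice(['a', 'b', 'c'], size=5, p=[0.1, 0.7, 0.2])

# Resetting the seed reproduces the same uniform draws
np.random.seed(0)
u_again = np.random.rand(2, 3)
print(np.array_equal(u, u_again))  # True
```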
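3. Exercises:

Matrix multiplication with list comprehensions:
Definition of matrix multiplication: for an m×p matrix A and a p×n matrix B, the product C = AB is the m×n matrix with $C_{ij} = \sum_{k=1}^{p} A_{ik} B_{kj}$. Following this formula, general matrix multiplication can be written as a triple loop over i, j and k.
Use a list comprehension in place of the for loops:
# First define two matrices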
M1 = np.random.randint(1,10,10).reshape(2,5)
M2 = np.random.randint(1,10,10).reshape(5,2)
print(M1)
print('' * 5)
print(M2)
M1 @ M2  # matrix multiplication
>>>
[[6 1 2 8 5]
[6 1 7 9 4]]

[[6 2]
[7 7]
[1 4]
[7 1]
[8 3]]
array([[141, 50],
[145, 68]])
# Do the same with a list comprehension
[[sum([M1[i][k] * M2[k][j] for k in range(M1.shape[1])]) for j in range(M2.shape[1])] for i in range(M1.shape[0])]
>>>
[[141, 50], [145, 68]]
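For comparison, the triple-loop form that the comprehension replaces might look like the sketch below. It hard-codes the two matrices printed above so that the result can be checked against the same output; the name res is illustrative.

```python
import numpy as np

# The same matrices as in the printed output above
M1 = np.array([[6, 1, 2, 8, 5],
               [6, 1, 7, 9, 4]])
M2 = np.array([[6, 2], [7, 7], [1, 4], [7, 1], [8, 3]])

# res[i][j] accumulates the sum over k of M1[i][k] * M2[k][j]
res = np.zeros((M1.shape[0], M2.shape[1]), dtype=int)
for i in range(M1.shape[0]):
    for j in range(M2.shape[1]):
        for k in range(M1.shape[1]):
            res[i][j] += M1[i][k] * M2[k][j]

print(res.tolist())                  # [[141, 50], [145, 68]]
print(np.array_equal(res, M1 @ M2))  # True
```

Each for in the loop corresponds to one for clause (or the inner sum) in the comprehension, with the innermost loop over k becoming the sum.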

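Updating a matrix
Let A be an m×n matrix. Update every element of A to produce a matrix B, where the update rule is $B_{ij} = A_{ij} \sum_{k=1}^{n} \frac{1}{A_{ik}}$.
For example, if A is the 3×3 matrix holding 1 through 9 row by row, then $B_{2,2} = 5 \times (1/4 + 1/5 + 1/6) = 37/12$. Implement this efficiently with NumPy.
Solution:
A = np.arange(1,10).reshape(3,3)
# Sum the reciprocals along each row, then reshape to a column so it broadcasts across that row
B = A*(1/A).sum(1).reshape(-1,1)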
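One way to sanity-check the vectorized solution is to compare it with a plain double loop and with the worked value 37/12. This sketch assumes the row-sum column is formed with reshape(-1, 1); B_loop is an illustrative name.

```python
import numpy as np

A = np.arange(1, 10).reshape(3, 3)

# Vectorized: each element times the sum of reciprocals of its row
B = A * (1 / A).sum(axis=1).reshape(-1, 1)

# Reference implementation with explicit loops
B_loop = np.empty_like(B)
for i in range(A.shape[0]):
    row_sum = sum(1 / A[i, k] for k in range(A.shape[1]))
    for j in range(A.shape[1]):
        B_loop[i, j] = A[i, j] * row_sum

print(np.allclose(B, B_loop))         # True
print(np.isclose(B[1, 1], 37 / 12))   # True
```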
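Using built-in functions:
# B[i][j] = (sum of column j) * (sum of row i) / (sum of all elements)
B = A.sum(0) * A.sum(1).reshape(-1,1) / A.sum()
print(B)
# Sum over all elements of the squared difference between A and B, divided by B
res = ((A - B) ** 2 / B).sum()
print(res)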
Reference: the open-source Joyful Pandas, by Geng Yuanhao (DataWhale)