Selecting rows or columns from a DataFrame in pandas

Time:2021-9-12

Original link:http://bbs.fishc.com/thread-7…

    import numpy as np
    import pandas as pd
    from pandas import Series, DataFrame

    ser = Series(np.arange(3.))
    data = DataFrame(np.arange(16).reshape(4, 4), index=list('abcd'), columns=list('wxyz'))

    data['w']         # Select column 'w' with dictionary-style access; returns a Series
    data.w            # Select column 'w' with attribute (dot) access; returns a Series
    data[['w']]       # Select column 'w'; returns a DataFrame
    data[['w', 'z']]  # Select columns 'w' and 'z'
    data[0:2]         # Rows 1 and 2; a positional slice includes the start and excludes the end
    data[1:2]         # Row 2 only (positions count from 0), returned as a one-row DataFrame.
                      # data[1] raises a KeyError, because a single key is looked up as a column label
    data.iloc[1:2]    # A third way to get row 2; returns a DataFrame, the same as data[1:2]
                      # (the original used data.ix[1:2]; .ix has since been removed from pandas)
    data['a':'b']     # Slice by index label; returns a DataFrame and is closed on both ends,
                      # that is, the end label is included
    data.iloc[0]      # First row (the original used data.irow(0), deprecated in favor of .iloc[i])
    data.iloc[:, 0]   # First column (the original used data.icol(0), deprecated in favor of .iloc[:, i])
    data.head()       # First few rows; the default is five. For the first ten rows, use data.head(10)
    data.tail()       # Last few rows; the default is five. For the last ten rows, use data.tail(10)
    ser.iloc[0]       # First element of ser (the original used the deprecated ser.iget_value(0))
    ser.iloc[-1]      # Last element of ser (the original used ser.iget_value(-1)).
                      # ser has an integer index, so ser[-1] is ambiguous between label and position
                      # and cannot be used to get the last element
    data.iloc[-1]     # Last row of the DataFrame, returned as a Series
    data.iloc[-1:]    # Last row of the DataFrame, returned as a DataFrame
    data.loc['a', ['w', 'x']]  # Row 'a', columns 'w' and 'x'; .loc selects by row and column labels
    data.iat[1, 1]    # Value at row 2, column 2; .iat selects by known row and column positions
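
The return types and slice rules in the list above can be checked directly. The following is a minimal sketch, assuming a recent pandas release in which `.loc` and `.iloc` are the label-based and position-based indexers; the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape(4, 4),
                    index=list("abcd"), columns=list("wxyz"))

col_series = data["w"]           # single label -> Series
col_frame = data[["w"]]          # list of labels -> DataFrame
pos_rows = data.iloc[0:2]        # positional slice, end excluded -> rows 'a' and 'b'
label_rows = data.loc["a":"b"]   # label slice, end included -> rows 'a' and 'b'
last_series = data.iloc[-1]      # single position -> Series
last_frame = data.iloc[-1:]      # positional slice -> one-row DataFrame
cell = data.iat[1, 1]            # scalar at row 2, column 2

print(type(col_series).__name__, type(col_frame).__name__)
print(list(pos_rows.index), list(label_rows.index))
print(type(last_series).__name__, last_frame.shape, cell)
```

Note that the positional slice `0:2` and the label slice `'a':'b'` select the same two rows here, even though one excludes its end and the other includes it.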

Example:

    import pandas as pd
    from pandas import Series, DataFrame
    import numpy as np

    data = DataFrame(np.arange(15).reshape(3, 5), index=['one', 'two', 'three'], columns=['a', 'b', 'c', 'd', 'e'])
    data
    Out[7]: 
            a   b   c   d   e
    one     0   1   2   3   4
    two     5   6   7   8   9
    three  10  11  12  13  14

There are several ways to select columns:

    data.iloc[:, 0]    # Select the first column by position (the original used data.icol(0), which
                       # emitted "FutureWarning: icol(i) is deprecated. Please use .iloc[:,i]")
    Out[35]: 
    one       0
    two       5
    three    10
    Name: a, dtype: int32
    data['a']
    Out[8]: 
    one       0
    two       5
    three    10
    Name: a, dtype: int32
    data.a
    Out[9]: 
    one       0
    two       5
    three    10
    Name: a, dtype: int32
    data[['a']]
    Out[10]: 
            a
    one     0
    two     5
    three  10
    # The calls below used data.ix in the original; .ix has since been removed from pandas,
    # so they are written here with .iloc (positions) and .loc (labels and boolean masks)
    data.iloc[:, [0, 1, 2]]   # When the column names are unknown and only the positions are known
    Out[13]: 
            a   b   c
    one     0   1   2
    two     5   6   7
    three  10  11  12
    data.iloc[1, [0]]   # Select the value in row 2, column 1
    Out[14]: 
    a    5
    Name: two, dtype: int32
    data.iloc[[1, 2], [0]]    # Select the values in rows 2 and 3, column 1
    Out[15]: 
            a
    two     5
    three  10
    data.iloc[1:3, [0, 2]]   # Select the values in rows 2-3 (position 3 is excluded), columns 1 and 3
    Out[17]: 
            a   c
    two     5   7
    three  10  12
    data.iloc[1:2, 2:4]   # Select the values in row 2 and columns 3-4 (the end positions are excluded)
    Out[29]: 
         c  d
    two  7  8
    data.loc[data.a > 5, data.columns[3]]   # Column 4 for the rows where column 'a' is greater than 5
    Out[30]: 
    three    13
    Name: d, dtype: int32
    data.loc[data.b > 6, data.columns[3:4]]   # Column 4, as a DataFrame, for the rows where column 'b' is greater than 6; the one-column slice is a bit awkward
    Out[31]: 
            d
    three  13
    data.loc[data.a > 5, data.columns[2:4]]   # Columns 3-4 (position 4 is excluded) for the rows where column 'a' is greater than 5
    Out[32]: 
            c   d
    three  12  13
    data.loc[data.a > 5, data.columns[[2, 2, 2]]]   # The third column, repeated 3 times, for the rows where column 'a' is greater than 5
    Out[33]: 
            c   c   c
    three  12  12  12
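
Combining a boolean row condition with column positions, as in the last few calls above, can be written in more than one way without `.ix`. The following is a minimal sketch, assuming a recent pandas release; it either translates the column positions to labels for `.loc`, or stays positional and hands `.iloc` the mask as a plain NumPy array. The names `mask`, `by_label` and `by_position` are only illustrative.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(15).reshape(3, 5),
                    index=["one", "two", "three"], columns=list("abcde"))

mask = data["a"] > 5   # boolean Series, True only for row 'three'

# Option 1: convert column positions 2 and 3 to labels, then select with .loc
by_label = data.loc[mask, data.columns[2:4]]

# Option 2: stay positional; .iloc is given a plain boolean array here
# rather than the boolean Series itself
by_position = data.iloc[mask.to_numpy(), 2:4]

print(by_label)
print(by_label.equals(by_position))
```

Both options select columns 'c' and 'd' of row 'three'; which one reads better depends on whether the rest of the code works with labels or with positions.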
You can also mix row or column positions with row or column labels. The original did this with data.ix, which has since been removed from pandas; the calls below produce the same results by converting positions to labels with data.columns[...] or by chaining .iloc with column selection.

    data.iloc[1:3][['a', 'e']]
    Out[24]: 
            a   e
    two     5   9
    three  10  14
    data.loc['one':'two', data.columns[[2, 1]]]
    Out[25]: 
         c  b
    one  2  1
    two  7  6
    data.loc[['one', 'three'], data.columns[[2, 2]]]
    Out[26]: 
            c   c
    one     2   2
    three  12  12
    data.loc['one':'three', ['a', 'c']]
    Out[27]: 
            a   c
    one     0   2
    two     5   7
    three  10  12
    data.loc[['one', 'one'], ['a', 'e', 'd', 'd', 'd']]
    Out[28]: 
         a  e  d  d  d
    one  0  4  3  3  3
    one  0  4  3  3  3
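
Mixing positions on one axis with labels on the other comes down to translating one kind of key into the other before indexing. The following is a minimal sketch of both directions, assuming a recent pandas release; `Index.get_indexer` turns labels into positions, and indexing `data.columns` turns positions into labels. The variable names are only illustrative.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(15).reshape(3, 5),
                    index=["one", "two", "three"], columns=list("abcde"))

# Rows by position, columns by label: translate the labels to positions for .iloc
col_positions = data.columns.get_indexer(["a", "e"])
rows_by_position = data.iloc[1:3, col_positions]

# Rows by label, columns by position: translate the positions to labels for .loc
col_labels = data.columns[[2, 1]]
rows_by_label = data.loc["one":"two", col_labels]

print(rows_by_position)
print(rows_by_label)
```

Keeping each indexing call purely positional or purely label-based avoids the guessing that a mixed indexer has to do when the index itself contains integers.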
The following operations select rows:

    data[1:2]   # Select row 2 when only its position is known. data[1] cannot be used, but data.iloc[1] can
    Out[18]: 
         a  b  c  d  e
    two  5  6  7  8  9
    data.iloc[1]    # Select row 2 as a Series (the original used data.irow(1), which emitted
                    # "FutureWarning: irow(i) is deprecated. Please use .iloc[i]", and data.ix[1],
                    # which returned the same result)
    Out[36]: 
    a    5
    b    6
    c    7
    d    8
    e    9
    Name: two, dtype: int32
    data['one':'two']   # Slicing by known row labels is closed on both ends, unlike a positional slice
    Out[22]: 
         a  b  c  d  e
    one  0  1  2  3  4
    two  5  6  7  8  9
    data.iloc[1:3]   # Select rows 2 to 3; position 3 (row 4) is excluded, i.e. the slice is half-open
    Out[23]: 
            a   b   c   d   e
    two     5   6   7   8   9
    three  10  11  12  13  14
    data.iloc[-1:]   # Take the last row and return a DataFrame. Note: the original used data.ix[-1:], which
                     # worked by position only when the row index was not numeric; .iloc is always positional.
                     # data[-1:] also returns a DataFrame, and data.iloc[-1] returns a Series
    Out[11]: 
            a   b   c   d   e
    three  10  11  12  13  14
    data[-1:]   # As above, take the last row and return a DataFrame
    Out[12]: 
            a   b   c   d   e
    three  10  11  12  13  14
    data.iloc[-1]   # Take the last row and return a Series (the original data.ix[-1] likewise worked
                    # only when the row index was not numeric)
    Out[13]: 
    a    10
    b    11
    c    12
    d    13
    e    14
    Name: three, dtype: int32
    data.tail(1)    # Returns the last row of the DataFrame
    data.head(1)    # Returns the first row of the DataFrame
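
The caveat about numeric row indexes is easiest to see on a frame whose integer labels are not in positional order. The following is a minimal sketch, assuming a recent pandas release; the frame and its values are made up for illustration.

```python
import pandas as pd

# Integer labels deliberately out of order, so label 0 is not at position 0
frame = pd.DataFrame({"value": [10, 20, 30]}, index=[2, 0, 1])

first_by_position = frame.iloc[0]    # first row, whatever its label is
last_by_position = frame.iloc[-1]    # last row, whatever its label is
row_labelled_0 = frame.loc[0]        # the row whose label is 0 (the second row here)

print(first_by_position["value"], last_by_position["value"], row_labelled_0["value"])
```

With `.iloc` a negative number always counts from the end, and with `.loc` an integer is always treated as a label, so neither call depends on the type of the index.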

While processing data recently, I found that pd.read_csv() sometimes reads in an unnamed column that is never used. It is usually caused by the index column having been replaced. It is an eyesore if you like tidy data, and I could not get dataframe.drop([columns,]) to handle it. What to do?
The crudest fix is to rename the columns directly:

    data6

                Unnamed: 0    high  symbol        time
    date
    2016-11-01           0  3317.4  IF1611  18:10:44.8
    2016-11-01           1  3317.4  IF1611  06:01:04.5
    2016-11-01           2  3317.4  IF1611  07:46:25.5
    2016-11-01           3  3318.4  IF1611  09:30:04.0
    2016-11-01           4  3321.8  IF1611  09:31:04.0

    data6.columns = list('abcd')

    data6

                a       b       c           d
    date
    2016-11-01  0  3317.4  IF1611  18:10:44.8
    2016-11-01  1  3317.4  IF1611  06:01:04.5
    2016-11-01  2  3317.4  IF1611  07:46:25.5
    2016-11-01  3  3318.4  IF1611  09:30:04.0
    2016-11-01  4  3321.8  IF1611  09:31:04.0
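
As an alternative to renaming every column, the unnamed column can sometimes be removed or avoided altogether. The following is a minimal sketch, assuming the column comes from an index that was written to the CSV file and a pandas version whose `DataFrame.drop` accepts the `columns=` keyword. The two-row frame is a made-up stand-in for the data above, and an in-memory `io.StringIO` buffer stands in for the file.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real data
original = pd.DataFrame({"high": [3317.4, 3318.4],
                         "symbol": ["IF1611", "IF1611"]})

# to_csv() writes the default index as a first column with an empty header
csv_text = original.to_csv()

# Reading it back naively turns that column into 'Unnamed: 0'
raw = pd.read_csv(io.StringIO(csv_text))

# Option 1: drop the column by name after reading
dropped = raw.drop(columns=["Unnamed: 0"])

# Option 2: tell read_csv that the first column is the index
reread = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(raw.columns))
print(list(dropped.columns), list(reread.columns))
```

If the file is under your control, writing it with `to_csv(index=False)` should prevent the extra column from appearing in the first place.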