• Example code for selecting rows in pandas


    Contents: 1. Custom row index 2. Selecting data by label index 2.1 Selecting a single row by label 2.2 Selecting multiple rows by label 3. Selecting data by positional index 3.2 Selecting multiple rows by position 4. Selecting a contiguous range of rows 5. Selecting a row […]
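
As a minimal sketch of the selection patterns in this outline (assuming a small hypothetical DataFrame with a user-defined string index), `loc` selects rows by label and `iloc` selects them by integer position:

```python
import pandas as pd

# Hypothetical sample data; the row labels form a user-defined index
df = pd.DataFrame(
    {"name": ["Ann", "Bob", "Cid", "Dee"], "score": [88, 92, 75, 60]},
    index=["a", "b", "c", "d"],
)

row_b = df.loc["b"]            # single row by label -> Series
rows_ac = df.loc[["a", "c"]]   # multiple rows by label -> DataFrame
row_0 = df.iloc[0]             # single row by position
rows_13 = df.iloc[[1, 3]]      # multiple rows by position

# Contiguous rows: a label slice includes its end label,
# while a positional slice excludes its end position
by_label = df.loc["b":"d"]
by_pos = df.iloc[1:3]

print(row_b["score"])          # 92
print(list(by_label.index))    # ['b', 'c', 'd']
print(list(by_pos.index))      # ['b', 'c']
```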

  • Implementing sparse data structures in pandas


    Contents: Introduction; Sparse data example; SparseArray; SparseDtype; Sparse attributes; Sparse calculations; SparseSeries and SparseDataFrame. Introduction: If the data contains many NaN values, storing all of them wastes space. To solve this problem, pandas provides sparse data structures that store these NaN values efficiently. Sparse data example […]
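
A minimal sketch of the sparse structures named above, assuming pandas 1.0 or later, where `SparseSeries` and `SparseDataFrame` were removed in favor of `pd.arrays.SparseArray` and the `SparseDtype` dtype. The data is hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical mostly-missing data: only 2 of 10 values are present
dense = np.full(10, np.nan)
dense[2] = 1.0
dense[7] = 3.0

# SparseArray keeps only the non-NaN values and their positions
sparse = pd.arrays.SparseArray(dense)
print(sparse.dtype)       # a SparseDtype (float64 values, NaN fill value)
print(sparse.sp_values)   # the stored values
print(sparse.density)     # fraction of positions actually stored

# A Series holds sparse data when given a SparseDtype
s = pd.Series(dense, dtype=pd.SparseDtype("float64", np.nan))
print(s.sparse.density)   # the .sparse accessor exposes sparse-specific attributes
print(s.sum())            # NaN is skipped, as with dense data

# Memory: the sparse form stores 2 values plus 2 positions instead of 10 floats
print(sparse.nbytes, dense.nbytes)
```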

  • Customizing pandas options


    Contents: Introduction; Common options; Getting and setting options; Frequently used options: maximum number of displayed rows, display of data beyond the limit, maximum column width, display precision, threshold for displaying values as zero, column header alignment. Introduction: pandas has an options system that controls how pandas displays data. Generally speaking, we don’t need to modify it, […]
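
The calls below sketch how the options listed above can be read and changed with `pd.get_option`, `pd.set_option`, `pd.reset_option`, and `pd.option_context`; the specific values are arbitrary choices for illustration:

```python
import pandas as pd

# Read the current value of an option
default_rows = pd.get_option("display.max_rows")

# Change options globally
pd.set_option("display.max_rows", 20)               # maximum number of rows displayed
pd.set_option("display.max_colwidth", 30)           # maximum column width in characters
pd.set_option("display.precision", 2)               # digits shown after the decimal point
pd.set_option("display.chop_threshold", 1e-6)       # display values below this as 0
pd.set_option("display.colheader_justify", "left")  # column header alignment

# Restore one option to its default
pd.reset_option("display.max_rows")

# Change an option only inside a block; it reverts when the block exits
with pd.option_context("display.precision", 4):
    inside = pd.get_option("display.precision")
after = pd.get_option("display.precision")
```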

  • Example of sorting by value in pandas


    Contents: 1. Sorting by the values of one column 1.1 Sorting by a column without missing values 1.1.1 Ascending order 1.1.2 Descending order 1.2 Sorting by a column with missing values 1.2.1 Missing values shown at the end 1.2.2 Missing values shown at the front 2. Sorting by the values of multiple columns The […]
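
A minimal sketch of the sorting cases in this outline, using a hypothetical DataFrame with one missing score. `sort_values` places NaN last by default in both directions, and `na_position="first"` moves it to the front:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in "score"
df = pd.DataFrame({"name": ["a", "b", "c", "d"],
                   "score": [3.0, np.nan, 1.0, 2.0],
                   "age": [20, 20, 25, 22]})

asc = df.sort_values("score")                             # ascending, NaN last
desc = df.sort_values("score", ascending=False)           # descending, NaN still last
nan_first = df.sort_values("score", na_position="first")  # NaN at the front

# Multiple columns: age ascending, then name descending within equal ages
multi = df.sort_values(["age", "name"], ascending=[True, False])
```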

  • Processing Excel files with pandas


    This article introduces how to process Excel files with pandas, as follows. Reading the file: import pandas. Numerical processing: df["dog"] = df["dog"].replace(-1, 0)  # value replacement. Getting data: data = df.head()  # reads the first 5 rows by default. loc and iloc explained: loc[row, column] first […]
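
The snippet below sketches the steps named in this excerpt. To stay self-contained it builds the DataFrame in memory instead of calling `pd.read_excel`, which typically needs an Excel engine such as openpyxl. The column `dog` comes from the excerpt; the `cat` column, the row labels, and the file name are hypothetical:

```python
import pandas as pd

# Stand-in for: df = pd.read_excel("data.xlsx")   (hypothetical file name)
df = pd.DataFrame({"dog": [1, -1, 3, -1, 5, 6],
                   "cat": [10, 20, 30, 40, 50, 60]},
                  index=["r1", "r2", "r3", "r4", "r5", "r6"])

df["dog"] = df["dog"].replace(-1, 0)   # value replacement: -1 becomes 0
data = df.head()                       # first 5 rows by default

# loc[row, column] selects by label; iloc[row, column] by integer position
v_loc = df.loc["r2", "cat"]
v_iloc = df.iloc[1, 1]                 # second row, second column: the same cell
sub = df.loc["r1":"r3", ["dog"]]       # a label slice includes the end label "r3"
```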

  • Filtering data in pandas


    Author: Amanda Iglesias Moreno | Compiled by: VK | Source: Towards Data Science. Filtering data from DataFrames is one of the most common operations when cleaning data. pandas provides a range of methods for selecting data by row and column position and label. In addition, pandas allows you to get a subset of data according to the […]
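
A minimal sketch of common filtering idioms on a hypothetical DataFrame: boolean masks, combined conditions, `isin`, and `query`. These are general pandas patterns, not necessarily the exact methods the article covers:

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({"item": ["a", "b", "c", "d"],
                   "price": [3.3, 2.1, 2.8, 3.7],
                   "group": ["x", "y", "x", "z"]})

# Boolean mask: keep rows where the condition is True
big = df[df["price"] > 3]

# Combine conditions with & (and) or | (or); each condition needs parentheses
mid = df[(df["price"] > 2) & (df["price"] < 3)]

# Membership test against a list of values
some = df[df["group"].isin(["y", "z"])]

# query() expresses the same filter as a string
big_q = df.query("price > 3")
```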

  • Example code for aggregation with pandas agg()


    Contents: Preface 1. Creating a DataFrame object 2. Single-column aggregation 3. Multi-column aggregation 4. Multiple aggregation operations 5. Multiple aggregation operations with renamed columns 6. Different aggregation functions for different columns 7. Using a custom aggregation function 8. The convenient describe() Preface: In data analysis, grouping and aggregation are indispensable. Aggregating data (summation, […]
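
A minimal sketch that walks through the numbered items with a hypothetical `team`/`points`/`assists` DataFrame. Item 5 (renaming while aggregating) uses named aggregation, which is available from pandas 0.25:

```python
import pandas as pd

# 1. Hypothetical DataFrame
df = pd.DataFrame({"team": ["A", "A", "B", "B", "B"],
                   "points": [10, 20, 5, 15, 25],
                   "assists": [1, 3, 2, 2, 5]})
g = df.groupby("team")

single = g["points"].agg("sum")                        # 2. one column
multi_col = g[["points", "assists"]].agg("mean")       # 3. several columns
multi_fn = g["points"].agg(["min", "max"])             # 4. several functions
renamed = g["points"].agg(total="sum", best="max")     # 5. named output columns
per_col = g.agg({"points": "sum", "assists": "max"})   # 6. a function per column
spread = g["points"].agg(lambda s: s.max() - s.min())  # 7. custom function
summary = df["points"].describe()                      # 8. count, mean, std, quartiles
```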

  • Processing Excel data in Python with pandas


    Recently I have become fascinated by pandas, which processes data efficiently. It is mainly used for data analysis, and it is very useful if you do big data analysis and testing! In automated testing, too, pandas is very efficient whenever the tests involve reading and storing data. Basically, three […]

  • Several ways to handle Cartesian products in the PyODPS DataFrame


    PyODPS provides a DataFrame API with a pandas-like interface for large-scale data analysis and preprocessing. This article introduces how to perform Cartesian product operations with PyODPS. The most common use of a Cartesian product is pairwise comparison or computation. Taking the calculation of distances between geographic locations as an example, suppose that […]
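
The excerpt does not show the PyODPS code, so this sketch swaps in plain pandas: `merge(how="cross")` (pandas 1.2 or later) builds the Cartesian product locally, and the haversine formula computes the pairwise distances. The tables, coordinates, and the `haversine_km` helper are hypothetical, and the PyODPS DataFrame API may express the join differently:

```python
import numpy as np
import pandas as pd

# Hypothetical locations in degrees
shops = pd.DataFrame({"shop": ["s1", "s2"],
                      "lat1": [0.0, 10.0], "lon1": [0.0, 0.0]})
users = pd.DataFrame({"user": ["u1", "u2", "u3"],
                      "lat2": [0.0, 0.0, 10.0], "lon2": [0.0, 10.0, 0.0]})

# Cartesian product: every shop paired with every user (2 x 3 = 6 rows)
pairs = shops.merge(users, how="cross")

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Hypothetical helper: great-circle distance between points in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * np.arcsin(np.sqrt(a))

pairs["dist_km"] = haversine_km(pairs["lat1"], pairs["lon1"],
                                pairs["lat2"], pairs["lon2"])

# For each user, keep the row of the nearest shop
nearest = pairs.loc[pairs.groupby("user")["dist_km"].idxmin(), ["user", "shop"]]
```

A cross join grows as the product of the two table sizes, which is why a distributed engine such as the one behind PyODPS becomes attractive for large inputs.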

  • Getting Started with Python Data Analysis: A Summary of pandas Basics (I)


    I. Series. Series: pandas' long spear (a column or row in a data table, an observation vector, a one-dimensional array…) Series1 = pd.Series(np.random.randn(4)) print(Series1, type(Series1)) print(Series1.index) print(Series1.values) Output: 0 -0.676256 1 0.533014 2 -0.935212 3 -0.940822 dtype: float64 <class 'pandas.core.series.Series'> Int64Index([0, 1, 2, 3], dtype='int64') [-0.67625578 0.53301431 -0.93521212 -0.94082195] np.random.randn() draws samples from the standard normal distribution […]
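
A Python 3 sketch of the same Series basics, plus a Series with explicit labels (a hypothetical example). Recent pandas versions print a `RangeIndex` for the default index rather than the `Int64Index` shown in the output above:

```python
import numpy as np
import pandas as pd

s1 = pd.Series(np.random.randn(4))  # 4 draws from the standard normal distribution
print(s1, type(s1))
print(s1.index)                     # default integer index 0..3
print(s1.values)                    # the underlying NumPy array

# A Series with explicit (hypothetical) labels
s2 = pd.Series([1.5, 2.5, 3.5], index=["a", "b", "c"])
print(s2["b"])                      # 2.5
```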

  • Getting Started with Python Data Analysis: A Summary of pandas Basics (II)


    1. Coming and going freely in the pandas world: pandas I/O. This is a well-worn topic; what we mainly care about is how pandas interacts with external data. 1.1 Structured data input and output: read_csv and to_csv are the tools for input and output. read_csv returns a pandas DataFrame directly, and to_csv writes a file as soon as it is called. read_table: […]
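
A minimal sketch of the round trip described in this excerpt. It reads from in-memory text via `io.StringIO` so no files are needed; the CSV content and the `scores.csv` file name are hypothetical:

```python
import io
import pandas as pd

csv_text = "id,name,score\n1,Ann,88\n2,Bob,92\n"   # hypothetical CSV content

# read_csv accepts a path or any file-like object and returns a DataFrame
df = pd.read_csv(io.StringIO(csv_text))

# With no path, to_csv returns the CSV text; with a path, it writes the file
out = df.to_csv(index=False)
# df.to_csv("scores.csv", index=False)   # hypothetical file name

# read_table is the same reader with a tab as the default separator
tsv = pd.read_table(io.StringIO("a\tb\n1\t2\n"))
```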

  • ApacheCN Learning Resources Summary 2019.1


    [Home page] apachecn.org 【GitHub】@ApacheCN Temporarily offline: community. Temporarily offline: cwiki knowledge base. Self-media platforms: Weibo @ApacheCN, Zhihu @ApacheCN, CSDN, Jianshu, OSChina, cnblogs. We are not an official Apache organization or group, just fans of the Apache technology stack (and AI)! For collaboration or copyright infringement issues, please contact [fonttian] <[email protected]> and send a copy to […]