# Implementation of pandas sparse data structure

Date: 2021-10-15
##### Contents
• Introduction
• A sparse data example
• SparseArray
• SparseDtype
• The sparse accessor
• Computing with sparse data
• SparseSeries and SparseDataFrame

## Introduction

If the data contains many NaN values, storing them all wastes memory. To address this, pandas provides sparse data structures, which store only the non-NaN values efficiently.
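To make the saving concrete, here is a minimal sketch comparing the memory usage of a dense Series and a sparse one built from the same mostly-NaN data (sizes shown by `memory_usage()` will vary by platform):

```python
import numpy as np
import pandas as pd

# 10,000 floats, of which all but the first 10 are NaN
arr = np.random.randn(10_000)
arr[10:] = np.nan

dense = pd.Series(arr)
sparse = pd.Series(pd.arrays.SparseArray(arr))

# The dense series stores every element; the sparse one stores only
# the non-NaN values plus an integer index of their positions.
print(dense.memory_usage())
print(sparse.memory_usage())
```

The sparse Series should report a fraction of the dense Series' footprint, since only ten values are physically stored.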

## A sparse data example

We create an array, set most of its values to NaN, and then use it to create a SparseArray (the examples assume `import numpy as np` and `import pandas as pd`):

``````
In [1]: arr = np.random.randn(10)

In [2]: arr[2:-2] = np.nan

In [3]: ts = pd.Series(pd.arrays.SparseArray(arr))

In [4]: ts
Out[4]:
0    0.469112
1   -0.282863
2         NaN
3         NaN
4         NaN
5         NaN
6         NaN
7         NaN
8   -0.861849
9   -2.104569
dtype: Sparse[float64, nan]

``````

The dtype here is Sparse[float64, nan], which means that the NaN values in the array are not actually stored; only the non-NaN data is stored, and its type is float64.
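You can verify this directly on a SparseArray: the `sp_values` attribute exposes only the physically stored data, while `fill_value` is the value that is left out. A small sketch:

```python
import numpy as np
import pandas as pd

arr = np.array([1.0, np.nan, np.nan, 2.0])
sparr = pd.arrays.SparseArray(arr)

# Only the non-NaN data is physically stored:
print(sparr.sp_values)   # the two stored values, 1.0 and 2.0
print(sparr.fill_value)  # nan, the value that is *not* stored
print(sparr.dtype)       # Sparse[float64, nan]
```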

## SparseArray

arrays.SparseArray is an ExtensionArray used to store sparse array data.

``````
In [13]: arr = np.random.randn(10)

In [14]: arr[2:5] = np.nan

In [15]: arr[7:8] = np.nan

In [16]: sparr = pd.arrays.SparseArray(arr)

In [17]: sparr
Out[17]:
[-1.9556635297215477, -1.6588664275960427, nan, nan, nan, 1.1589328886422277, 0.14529711373305043, nan, 0.6060271905134522, 1.3342113401317768]
Fill: nan
IntIndex
Indices: array([0, 1, 5, 6, 8, 9], dtype=int32)

``````

Use numpy.asarray() to convert it back to an ordinary NumPy array:

``````
In [18]: np.asarray(sparr)
Out[18]:
array([-1.9557, -1.6589,     nan,     nan,     nan,  1.1589,  0.1453,
           nan,  0.606 ,  1.3342])
``````

## SparseDtype

SparseDtype represents the sparse type. It carries two pieces of information: the data type of the non-NaN values, and the scalar fill value, such as NaN:

``````
In [19]: sparr.dtype
Out[19]: Sparse[float64, nan]
``````

A SparseDtype can be constructed as follows:

``````
In [20]: pd.SparseDtype(np.dtype('datetime64[ns]'))
Out[20]: Sparse[datetime64[ns], NaT]
``````

You can also specify the fill value explicitly:

``````
In [21]: pd.SparseDtype(np.dtype('datetime64[ns]'),
....:                fill_value=pd.Timestamp('2017-01-01'))
....:
Out[21]: Sparse[datetime64[ns], Timestamp('2017-01-01 00:00:00')]
``````
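The fill value does not have to be a missing value. For integer data the default fill value is 0, and `astype()` can be used to convert a dense Series to a sparse dtype; a quick sketch:

```python
import numpy as np
import pandas as pd

# For integer data, the default fill value is 0 rather than NaN
print(pd.SparseDtype(np.int64))  # Sparse[int64, 0]

# astype() converts a dense Series to the sparse dtype;
# the zeros become the (unstored) fill value
s = pd.Series([0, 0, 1, 0, 2])
sp = s.astype(pd.SparseDtype(np.int64))
print(sp.dtype)
```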

## The sparse accessor

Sparse-specific attributes and methods are available through the .sparse accessor:

``````
In [23]: s = pd.Series([0, 0, 1, 2], dtype="Sparse[int]")

In [24]: s.sparse.density
Out[24]: 0.5

In [25]: s.sparse.fill_value
Out[25]: 0

``````
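Beyond `density` and `fill_value`, the accessor also exposes the stored values and a way back to a dense Series; a small sketch using the same example:

```python
import pandas as pd

s = pd.Series([0, 0, 1, 2], dtype="Sparse[int]")

# .sparse.sp_values exposes the stored (non-fill) values
print(s.sparse.sp_values)

# .sparse.to_dense() converts back to an ordinary dense Series
print(s.sparse.to_dense())
```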

## Computing with sparse data

NumPy ufuncs can be applied directly to a SparseArray and will return a SparseArray:

``````
In [26]: arr = pd.arrays.SparseArray([1., np.nan, np.nan, -2., np.nan])

In [27]: np.abs(arr)
Out[27]:
[1.0, nan, nan, 2.0, nan]
Fill: nan
IntIndex
Indices: array([0, 3], dtype=int32)

``````
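Ordinary arithmetic keeps the result sparse as well, with the operation applied to the fill value too; a quick sketch:

```python
import numpy as np
import pandas as pd

arr = pd.arrays.SparseArray([1.0, np.nan, np.nan, -2.0, np.nan])

# Arithmetic returns a SparseArray, and the fill value is
# transformed along with the data (nan + 1 is still nan)
result = arr + 1
print(type(result))
print(np.asarray(result))  # [ 2. nan nan -1. nan]
```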

## SparseSeries and SparseDataFrame

SparseSeries and SparseDataFrame were removed in pandas 1.0.0, replaced by ordinary Series and DataFrame objects backed by the more flexible SparseArray.
Here is how the old and new usage differ:

``````
# Previous way
>>> pd.SparseDataFrame({"A": [0, 1]})
``````
``````
# New way
In [31]: pd.DataFrame({"A": pd.arrays.SparseArray([0, 1])})
Out[31]:
   A
0  0
1  1

``````
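A DataFrame holding sparse columns also has a `.sparse` accessor; for example, its `density` reports the fraction of values that are actually stored. A sketch:

```python
import pandas as pd

# Three of the four values equal the fill value (0) and are not stored
df = pd.DataFrame({"A": pd.arrays.SparseArray([0, 1, 0, 0])})

print(df.dtypes)          # A has dtype Sparse[int64, 0]
print(df.sparse.density)  # fraction of values actually stored: 0.25
```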

If you have a sparse matrix from SciPy, you can use DataFrame.sparse.from_spmatrix():

``````
# Previous way
>>> from scipy import sparse
>>> mat = sparse.eye(3)
>>> df = pd.SparseDataFrame(mat, columns=['A', 'B', 'C'])
``````
``````
# New way
In [32]: from scipy import sparse

In [33]: mat = sparse.eye(3)

In [34]: df = pd.DataFrame.sparse.from_spmatrix(mat, columns=['A', 'B', 'C'])

In [35]: df.dtypes
Out[35]:
A    Sparse[float64, 0]
B    Sparse[float64, 0]
C    Sparse[float64, 0]
dtype: object

``````
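The conversion also works in the other direction: `DataFrame.sparse.to_coo()` turns the sparse DataFrame back into a SciPy COO matrix. A brief sketch (assuming SciPy is installed):

```python
import pandas as pd
from scipy import sparse

mat = sparse.eye(3)
df = pd.DataFrame.sparse.from_spmatrix(mat, columns=['A', 'B', 'C'])

# Convert back to a SciPy COO matrix
coo = df.sparse.to_coo()
print(coo.shape)  # (3, 3)
print(coo.nnz)    # 3 stored values (the diagonal)
```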

This concludes this article on the implementation of pandas sparse data structures.

The brain map is constantly updated. Check the address onlineSubsequent articles and contents will be updated toGitHub projectWelcome to pay attention. Directory (Ctrl + F) Basic introduction Novice 50 mistakes that golang novices often make data type I don’t even know that nil slice is different from empty slice? Then the bat interviewer has to […]