8k Star! Data visualization tool based on Matplotlib

Time: 2021-9-26

[Introduction]: Seaborn is a Python library for data visualization. It is built on top of Matplotlib and is compatible with pandas data structures. With Seaborn, attractive data charts take only a few lines of code.

Tip: Seaborn supports Python 3.7+; Python 2 is no longer supported.

Brief introduction

1. Introduction to data visualization tools

Data visualization is the practice of turning raw data into charts that surface useful information. Charts reduce the complexity of raw data and make it easier to understand.

There are many tools for data visualization, including no-code tools such as Tableau, Power BI and ChartBlocks. These tools are powerful and each has its own user base. However, when we also need a platform for processing the raw data, Python is a good choice.

Although this approach is more complex and requires more programming knowledge, Python lets us apply many operations and transformations to the data before visualizing it, which makes it an ideal choice for data scientists. One of Python’s biggest advantages is its powerful third-party libraries for working with data, such as NumPy, pandas, Matplotlib and TensorFlow.

Matplotlib is probably the most widely recognized plotting library at present. It is a Python library, although it can also be called from other languages such as R. Its customizability and flexibility have put it at the top. However, some customizations and operations are difficult to achieve with Matplotlib alone.

Developers created Seaborn on top of Matplotlib. Seaborn keeps the power of Matplotlib while adding new features and simplifying plotting.

In this article, we focus on how to use Seaborn to draw advanced charts. You can create your own chart based on these examples.

2. What is Seaborn?


Seaborn is a Python library for drawing statistical charts. It is a high-level wrapper around Matplotlib and is compatible with pandas data structures.

Seaborn lets you explore and understand data quickly. Its plotting functions take an entire data structure or array containing all the data, and internally perform the statistical calculations and drawing steps needed to turn that data into an informative chart.

This reduces the amount of work needed to design a chart that fits your needs.

Seaborn’s GitHub home page:
https://github.com/mwaskom/se…


Install

pip install seaborn

Installing Seaborn also installs the libraries it depends on, such as Matplotlib, pandas and NumPy (and, depending on the Seaborn version, SciPy). Before writing any plotting code, we also need to import a few modules:

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns

Simple use

1. Draw your first chart

Because of network problems, loading Seaborn’s built-in datasets from within China may fail; enable a proxy to avoid this.

Before we start drawing, we need data. Seaborn is convenient here because it is compatible with pandas data structures. In addition, the library comes with some built-in datasets that can be loaded directly from code, without downloading files manually. Let’s see how to load a dataset containing flight information:

flights_data = sns.load_dataset("flights")
res = flights_data.head()
print(res)

The output results are as follows:

   year month  passengers
0  1949   Jan         112
1  1949   Feb         118
2  1949   Mar         132
3  1949   Apr         129
4  1949   May         121

Calling the load_dataset function with the name of a dataset returns a pandas DataFrame, which we can print to the console.
All available datasets are listed here:

GitHub link:

https://github.com/mwaskom/se…
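
If the built-in datasets cannot be downloaded (for example because of the network problem mentioned above), any pandas DataFrame with the same columns can stand in for them. The following is a minimal sketch that rebuilds the five rows printed above by hand; the variable name `flights_sample` is made up for this illustration.

```python
import pandas as pd

# The five rows shown in the output above, typed in by hand.
flights_sample = pd.DataFrame(
    {
        "year": [1949, 1949, 1949, 1949, 1949],
        "month": ["Jan", "Feb", "Mar", "Apr", "May"],
        "passengers": [112, 118, 132, 129, 121],
    }
)

# Inspect the first rows and the column types, as with a loaded dataset.
print(flights_sample.head())
print(flights_sample.dtypes)
```

A DataFrame built this way can be passed as the `data` argument of the Seaborn plotting functions used in the rest of this article.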

2. Scatter plot

A scatter plot displays points based on two-dimensional data. Drawing one with Seaborn takes only a few lines of code. The scatterplot function needs the dataset to draw and the columns that the x and y axes represent.

flights_data = sns.load_dataset("flights")
sns.scatterplot(data=flights_data, x="year", y="passengers")
plt.show()

The chart drawn is as follows:

[Figure: scatter plot of passengers against year]

3. Line plot

A line plot shows how a value changes over continuous or categorical data. It is a popular, well-known chart type and is easy to draw. As before, we call the lineplot function and specify the dataset and the columns for the x and y axes. Seaborn does the rest:

flights_data = sns.load_dataset("flights")
sns.lineplot(data=flights_data, x="year", y="passengers")
plt.show()

The chart drawn is as follows:

[Figure: line plot of passengers against year]

4. Bar chart

The bar chart is probably the best-known chart type. As with scatter plots and line plots, we can draw one with the barplot function:

flights_data = sns.load_dataset("flights")
sns.barplot(data=flights_data, x="year", y="passengers")
plt.show()

The chart drawn is as follows:

[Figure: bar chart of passengers against year]

5. Extend with Matplotlib

Seaborn is built on Matplotlib: it extends Matplotlib’s features and hides much of its complexity. Matplotlib itself remains fully available, and any Seaborn chart can be customized further with Matplotlib functions. This lets us use the power of Matplotlib where Seaborn has no dedicated function, without rewriting anything. For example, to draw several Seaborn charts in one figure, we can use the subplot function in Matplotlib:

diamonds_data = sns.load_dataset('diamonds')
plt.subplot(1, 2, 1)
sns.countplot(x='carat', data=diamonds_data)
plt.subplot(1, 2, 2)
sns.countplot(x='depth', data=diamonds_data)
plt.show()

The chart drawn is as follows:

[Figure: two count plots side by side, carat on the left and depth on the right]

Using the subplot function, we can draw several charts in one figure. The function takes three parameters: the number of rows, the number of columns, and the index of the subplot to activate (starting at 1). Here we combine functions from Matplotlib and Seaborn: Matplotlib selects the subplot, and Seaborn draws a chart in it.
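
An alternative to calling plt.subplot before each chart is to create all the axes at once with plt.subplots and hand each one to Seaborn through the `ax` parameter, which the axes-level plotting functions accept. This is a minimal sketch with a small made-up DataFrame; the column names `cut` and `color` mimic the diamonds dataset, but the rows are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Illustrative rows only, not the real diamonds data.
df = pd.DataFrame(
    {
        "cut": ["Ideal", "Good", "Ideal", "Premium", "Good", "Ideal"],
        "color": ["E", "E", "G", "G", "E", "H"],
    }
)

# One row, two columns: axes[0] is the left chart, axes[1] the right one.
fig, axes = plt.subplots(1, 2, figsize=(8, 3))
sns.countplot(data=df, x="cut", ax=axes[0])
sns.countplot(data=df, x="color", ax=axes[1])
axes[0].set_title("cut")
axes[1].set_title("color")
fig.tight_layout()
```

Passing `ax` explicitly avoids relying on which subplot is currently active, which can be easier to follow when a figure has many charts.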

6. Draw charts in different styles

Seaborn lets us change the look of a chart. It provides five built-in styles: darkgrid, whitegrid, dark, white, and ticks.

The first example is darkgrid:

flights_data = sns.load_dataset("flights")
sns.set_style("darkgrid")
sns.lineplot(data=flights_data, x="year", y="passengers")
plt.show()

The chart drawn is as follows:

[Figure: line plot drawn with the darkgrid style]

Another example is whitegrid:

flights_data = sns.load_dataset("flights")
sns.set_style("whitegrid")
sns.lineplot(data=flights_data, x="year", y="passengers")
plt.show()

The chart drawn is as follows:

[Figure: line plot drawn with the whitegrid style]

Cool usage

1. Download the tips dataset

Now that we know the basics of Seaborn, let’s practice by building several charts from the same dataset. In this example we use the “tips” dataset, which can be loaded directly with Seaborn. First, load the dataset:

tips_df = sns.load_dataset('tips')
res = tips_df.head()
print(res)

The output results are as follows:

   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4       24.59  3.61    Male     No  Sun  Dinner     4

I like to print the first few rows of a dataset to get a feel for the columns and the data itself. I usually use a few pandas functions to fix data problems such as null values and to add useful information to the dataset. You can read more at the following link:

Pandas usage guide:

https://livecodestream.dev/po…

Let’s create a new column in the dataset that holds the tip as a fraction of the total bill:

tips_df = sns.load_dataset('tips')
# Tip as a fraction of the total bill (0.15 means 15%)
tips_df["tip_percentage"] = tips_df["tip"] / tips_df["total_bill"]
res = tips_df.head()
print(res)

The new data structure is as follows:

   total_bill   tip     sex smoker  day    time  size  tip_percentage
0       16.99  1.01  Female     No  Sun  Dinner     2        0.059447
1       10.34  1.66    Male     No  Sun  Dinner     3        0.160542
2       21.01  3.50    Male     No  Sun  Dinner     3        0.166587
3       23.68  3.31    Male     No  Sun  Dinner     2        0.139780
4       24.59  3.61    Male     No  Sun  Dinner     4        0.146808
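
As a quick sanity check of the new column, the sketch below recomputes tip_percentage for the five rows printed above, typed in by hand. Note that the result is a fraction of the bill (0.059447 means about 5.9%), not a number between 0 and 100.

```python
import pandas as pd

# total_bill and tip for the five rows shown above.
sample = pd.DataFrame(
    {
        "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
        "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
    }
)

# Element-wise division: one result per row.
sample["tip_percentage"] = sample["tip"] / sample["total_bill"]
print(sample.round(6))
```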

2. Understand tip_percentage

Let’s first look at the distribution of tip_percentage. To do this, use histplot to generate a histogram:

tips_df = sns.load_dataset('tips')
tips_df["tip_percentage"] = tips_df["tip"] / tips_df["total_bill"]
sns.histplot(tips_df["tip_percentage"], binwidth=0.05)
plt.show()

The chart drawn is as follows:

[Figure: histogram of tip_percentage]

We set the binwidth parameter to make the histogram more readable. Now we can understand the data at a glance. Most customers tip between 15% and 20% of the bill, and in a few cases the tip exceeds 70%. These values are outliers and should be checked to determine whether they are errors.
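
To check unusually high values, one option is to filter the rows above a threshold and look at them directly. The sketch below uses a small made-up DataFrame and a hypothetical cut-off of 0.5; both are assumptions for illustration, and a suitable threshold depends on the data.

```python
import pandas as pd

# Illustrative rows only; the last one mimics an unusually generous tip.
sample = pd.DataFrame(
    {
        "total_bill": [16.99, 10.34, 21.01, 8.00],
        "tip": [1.01, 1.66, 3.50, 6.00],
    }
)
sample["tip_percentage"] = sample["tip"] / sample["total_bill"]

THRESHOLD = 0.5  # hypothetical cut-off, chosen only for this example

# Boolean indexing keeps the rows whose tip fraction exceeds the threshold.
outliers = sample[sample["tip_percentage"] > THRESHOLD]
print(outliers)
```

The same filter can be applied to tips_df once the tip_percentage column has been added.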

3. Observe tip_percentage by time of day

It is also interesting to see whether tip_percentage is related to the time of day:

tips_df = sns.load_dataset('tips')
tips_df["tip_percentage"] = tips_df["tip"] / tips_df["total_bill"]
sns.histplot(data=tips_df, x="tip_percentage", binwidth=0.05, hue="time")
plt.show()

The chart drawn is as follows:

[Figure: histogram of tip_percentage colored by time]

This time we pass the whole dataset to the function rather than a single column, and set the hue parameter to the time column. The chart then uses a different color for each value of time and adds a legend.

4. Total tips by day of the week

Another interesting measure is the total amount of tips received on each day of the week:

tips_df = sns.load_dataset('tips')
tips_df["tip_percentage"] = tips_df["tip"] / tips_df["total_bill"]
sns.barplot(data=tips_df, x="day", y="tip", estimator=np.sum)
plt.show()

The chart drawn is as follows:

[Figure: bar chart of total tips for each day of the week]

It seems that Friday is a good time to stay at home.
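
By default, barplot shows the mean of y for each category; passing estimator=np.sum switches the bar height to the total. The same numbers can be computed with a pandas groupby, which is a handy cross-check of what the bars represent. This sketch uses made-up rows rather than the real tips data.

```python
import pandas as pd

# Illustrative rows only.
sample = pd.DataFrame(
    {
        "day": ["Thur", "Thur", "Fri", "Sat", "Sat", "Sat"],
        "tip": [2.0, 3.0, 1.5, 4.0, 2.5, 3.5],
    }
)

# Total per day: what the bars show with estimator=np.sum.
total_by_day = sample.groupby("day")["tip"].sum()
# Mean per day: what the bars show with the default estimator.
mean_by_day = sample.groupby("day")["tip"].mean()

print(total_by_day)
print(mean_by_day)
```

A day with many small tips can have a high total but a low mean, so the two views answer different questions.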

5. Effect of table size and day of the week on the tip

Sometimes we want to know how several variables affect the outcome together. For example, how do the day of the week and the table size together affect the tip percentage? To draw this final chart, we first reshape the data with the pivot_table function in pandas and then draw a heatmap:

tips_df = sns.load_dataset('tips')
tips_df["tip_percentage"] = tips_df["tip"] / tips_df["total_bill"]
pivot = tips_df.pivot_table(
    index=["day"],
    columns=["size"],
    values="tip_percentage",
    aggfunc=np.average)
sns.heatmap(pivot)
plt.show()

The chart drawn is as follows:

[Figure: heatmap of average tip_percentage by day and table size]
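
To see what the heatmap is coloring, it helps to look at the pivoted table itself. The sketch below builds a small made-up DataFrame, pivots it the same way, and passes annot=True so that each cell also shows its value. The rows are illustrative, not the real tips data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Illustrative rows only.
sample = pd.DataFrame(
    {
        "day": ["Sat", "Sat", "Sat", "Sun", "Sun", "Sun"],
        "size": [2, 2, 4, 2, 4, 4],
        "tip_percentage": [0.10, 0.20, 0.15, 0.18, 0.12, 0.16],
    }
)

# One row per day, one column per table size, each cell the mean tip fraction.
pivot = sample.pivot_table(
    index="day", columns="size", values="tip_percentage", aggfunc="mean"
)
print(pivot)

# annot=True writes the value into each cell; fmt controls the number format.
ax = sns.heatmap(pivot, annot=True, fmt=".2f")
```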

Conclusion

Seaborn can do much more than what is covered here. You can find more examples in the official documentation. Thank you for reading!

Official documentation: http://seaborn.pydata.org/
