Plotting Charts with Seaborn

Time: 2021-7-2

By Jenny Dcruz
Compiled by VK
Source: Towards Data Science

Seaborn is a powerful Python library for data visualization. It provides a high-level interface built on top of Matplotlib. Unlike Matplotlib, Seaborn works well with pandas DataFrames, which lets you draw compelling charts in a simpler way.

To follow this article, you should know the basics of pandas and Matplotlib. If you don't, you can refer to the following articles:

  1. Pandas for data analysis: https://towardsdatascience.com/pandas-for-data-analysis-142be71f63dc

  2. Visualizations with Matplotlib: https://towardsdatascience.com/visualizations-with-matplotlib-4809394ea223

Ensure that the necessary libraries are installed on the system:

Using conda:

conda install pandas
conda install matplotlib
conda install seaborn

Using pip:

pip install pandas
pip install matplotlib
pip install seaborn

Let’s first import the required Python libraries and datasets.

You can find the CSV file for this tutorial here: https://github.com/jendcruz22/Medium-articles/tree/master/Plotting%20charts%20with%20Seaborn

import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns

# index_col=0 uses the first column of the CSV as the row index
df = pd.read_csv('Pokemon.csv', index_col=0, encoding='unicode-escape')
df.head()

In the code above, we set index_col to 0, which means that the first column is treated as the index.
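You can see what index_col=0 does without the real file. The minimal sketch below reads a few hand-typed rows from an in-memory string using io.StringIO. The leading # column is an assumed layout for illustration only and is not necessarily the exact header of Pokemon.csv.

```python
import io

import pandas as pd

# A few hand-typed rows (hypothetical layout) with a leading '#' column
csv_text = """#,Name,Attack,Defense
1,Bulbasaur,49,49
4,Charmander,52,43
7,Squirtle,48,65
"""

# index_col=0 turns the first column ('#') into the row index
df_demo = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(df_demo.index.name)      # the index takes the first column's name
print(list(df_demo.columns))   # only the remaining columns are data columns
print(df_demo.loc[4, "Name"])  # rows are looked up by the new index
```

Because '#' became the index, it no longer appears among the data columns.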

Using Seaborn and the attributes in the Pokemon dataset, we'll create some interesting visualizations. The first one we'll look at is the scatter plot.

Scatter plot

Scatter plots use dots to represent the values of two numerical variables. The position of each dot on the horizontal and vertical axes indicates the values of a single data point. Scatter plots are used to observe the relationship between variables.

In Seaborn, we only need the lmplot function to make a scatter plot. We pass the DataFrame to the data parameter, and then pass the column names for the x and y axes.

By default, the scatter plot also shows a regression line, which is the line that best fits the data.

sns.lmplot(x='Attack', y='Defense', data=df)
plt.show()

Here you can see our scatter plot, which compares the Attack and Defense scores.

The regression line shows the correlation between the two variables. Here it slopes upward, which means that as the Attack score increases, the Defense score tends to increase too. To remove the regression line, set the fit_reg parameter to False.
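For a linear fit, "the line that best fits the data" means the least-squares line. The NumPy sketch below computes one for made-up points with np.polyfit. It illustrates the idea and is not Seaborn's internal code. The points are hypothetical.

```python
import numpy as np

# Made-up points that rise roughly linearly
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.0])

# A degree-1 fit minimizes the sum of squared vertical distances to the line,
# i.e. it returns the ordinary least-squares slope and intercept
slope, intercept = np.polyfit(x, y, deg=1)
print(f"y = {slope:.2f} * x + {intercept:.2f}")

# A positive slope corresponds to a line that tilts upward
print("slope is positive:", slope > 0)
```

For these points the fit is y = 1.97 * x + 1.09. The positive slope matches an upward-tilting regression line.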

We can also set the hue parameter to color each Pokemon by its evolution stage. The hue parameter is very useful because it lets you use color to show a third dimension of information.

sns.lmplot(x='Attack', y='Defense', data=df, fit_reg=False, hue='Stage')
plt.show()

The scatter plot looks the same as before, except that the regression line is gone and the points are colored differently. The colors show the evolution stage of each Pokemon; Stage is just another attribute in the data we saw earlier.

From this plot, we can conclude that first-stage Pokemon usually score lower than Pokemon at higher stages.
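You can check this visual impression numerically by averaging the scores per stage. The sketch below runs on a small hypothetical stand-in table called toy. On the real data you would call the same groupby on df.

```python
import pandas as pd

# Hypothetical stand-in rows; on the real data you would use df instead of toy
toy = pd.DataFrame({
    "Stage":   [1, 1, 2, 2, 3],
    "Attack":  [49, 52, 62, 64, 82],
    "Defense": [49, 43, 63, 58, 83],
})

# Mean Attack and Defense for each evolution stage
stage_means = toy.groupby("Stage")[["Attack", "Defense"]].mean()
print(stage_means)

# True when the average Attack rises from stage to stage
print(stage_means["Attack"].is_monotonic_increasing)
```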

Box plot

The box plot is one of the most important charts for showing how data is distributed. In Seaborn, the boxplot function draws one in a single line of code. Here we'll use the entire DataFrame except for the Total, Stage, and Legendary attributes.

# Keep the individual stats; drop the Total, Stage and Legendary columns
df_copy = df.drop(['Total', 'Stage', 'Legendary'], axis=1)
sns.boxplot(data=df_copy)
plt.show()

Here we can see that each attribute has its own box plot.

The box plot is based on a five-number summary, and each part is drawn as a different line. The line in the middle of the box is the median, the center of the data. The bottom and top edges of the box are the first quartile (Q1, the 25th percentile) and the third quartile (Q3, the 75th percentile). The box therefore spans the middle 50% of the values, and its height is the interquartile range (IQR). The whiskers reach the most extreme values that lie within 1.5 × IQR of the box edges (Seaborn's default), roughly showing the minimum and maximum of the distribution. Individual points beyond the whiskers represent outliers in the data.
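The NumPy sketch below computes the quantities a box plot draws: Q1, median, Q3, the 1.5 × IQR whisker limits, and the outliers. The sample values are hypothetical. np.percentile's default linear interpolation is one of several quartile conventions, so other tools may differ slightly on small samples.

```python
import numpy as np

# Hypothetical sample with one obvious outlier
values = np.array([20, 35, 40, 45, 50, 55, 60, 65, 70, 190])

# Box edges and middle line
q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# Whisker limits: 1.5 * IQR beyond the box edges
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers end at the most extreme points inside the limits;
# anything outside is drawn as an individual outlier point
inside = values[(values >= lower_fence) & (values <= upper_fence)]
outliers = values[(values < lower_fence) | (values > upper_fence)]

print("Q1, median, Q3:", q1, median, q3)
print("whiskers end at:", inside.min(), inside.max())
print("outliers:", outliers)
```

Here the box runs from 41.25 to 63.75. The whiskers stop at 20 and 70, and 190 is flagged as an outlier.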

Violin plot

A violin plot is similar to a box plot and is a very useful alternative to it. Violin plots show the distribution through the thickness of the violin, not just through summary statistics. This makes them convenient for analyzing and visualizing the distributions of different attributes in a dataset.

Here we'll use the same DataFrame copy as in the previous example.

sns.violinplot(data=df_copy)
plt.show()

We can observe the distribution of values for each Pokemon attribute. Thicker parts of a violin mean a higher density of values. The middle of each violin is usually the thickest, which means values are most densely concentrated there. Next, let's compare the Attack scores of the different primary Pokemon types, using the same violin plot function.

plt.figure(figsize=(10,6))
sns.violinplot(x='Type 1', y='Attack', data=df)
plt.show()

This plot shows the distribution of Attack scores for each primary Pokemon type. Dragon-type Pokemon have the highest Attack scores, but they also have a high variance, so some Dragon types have very low Attack scores. Ghost types have a very low variance, which means that most of their values are concentrated around the center.
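You can put numbers on the spread visible in the violins with a groupby. The sketch below uses two made-up types, "A" and "B", with hypothetical scores. On the real data the equivalent call would be df.groupby('Type 1')['Attack'].agg(['median', 'var']).

```python
import pandas as pd

# Hypothetical Attack scores for two made-up types: "A" (wide spread), "B" (tight)
toy = pd.DataFrame({
    "Type 1": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "Attack": [40, 80, 120, 160, 95, 100, 100, 105],
})

# Median and sample variance (ddof=1) of Attack per type, largest spread first
spread = toy.groupby("Type 1")["Attack"].agg(["median", "var"])
print(spread.sort_values("var", ascending=False))
```

Both made-up types share a median of 100. Type A's much larger variance is what a wide, long violin shows, and Type B's small variance corresponds to a short, concentrated one.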

Heat map

Heat maps help you visualize matrix-like data. For example, we can visualize all the correlations between the different attributes of the Pokemon.

Let's calculate the correlations in the DataFrame by calling its corr method, and then draw the heat map with Seaborn's heatmap function.

# numeric_only=True (pandas >= 1.5) skips text columns such as Type 1
corr = df_copy.corr(numeric_only=True)
sns.heatmap(corr)

The heat map above shows the correlations between the attributes in our DataFrame.

The lighter the color of a square, the more strongly the two attributes are correlated. For example, the correlation between HP and Speed is very low, so that square is dark. The correlation between HP and defense speed is very high, so that square in the heat map is red. A high positive correlation like this means that when one attribute increases, the other tends to increase as well.
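The sketch below shows what each square in the heat map holds. It uses a small hypothetical table that mixes a text column with numeric ones. corr uses the Pearson method by default, and numeric_only=True (available in pandas 1.5 and later) leaves the text column out.

```python
import pandas as pd

# Hypothetical toy table with one text column and three numeric columns
toy = pd.DataFrame({
    "Name": ["a", "b", "c", "d"],
    "HP": [10, 20, 30, 40],
    "Defense": [15, 25, 35, 45],  # rises in lockstep with HP
    "Speed": [30, 20, 20, 30],    # no linear trend against HP
})

# Pearson correlation (the default method); numeric_only=True skips "Name"
corr = toy.corr(numeric_only=True)
print(corr.round(2))
```

HP and Defense come out at 1.0, which would be the lightest square. HP and Speed come out at 0, which would be a much darker one.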

Histogram

Histograms let you plot the distribution of a variable's values. Creating a histogram with Matplotlib takes more work than with Seaborn, where a single line of code is enough.

For example, we can create a histogram to plot the distribution of the Attack attribute.

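# distplot is deprecated since Seaborn 0.11; histplot with kde=True draws a similar chart
sns.histplot(df['Attack'], kde=True, color='blue')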

We can see that most Pokemon have an Attack value between 50 and 100, and far fewer have an Attack value above 100 or below 50.
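A histogram's bars are just counts of values per bin. The NumPy sketch below shows that counting step with hand-picked bin edges on hypothetical Attack-like values. Seaborn's histplot picks bin edges automatically unless you pass bins yourself.

```python
import numpy as np

# Hypothetical Attack-like values
values = np.array([45, 52, 55, 60, 64, 70, 72, 85, 98, 110, 130])

# Explicit bin edges 25, 50, ..., 150. Each bin includes its left edge;
# the last bin also includes its right edge.
counts, edges = np.histogram(values, bins=[25, 50, 75, 100, 125, 150])

# One line per bin: its edges and how many values fell into it
for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left}-{right}: {int(n)}")
```

Most of these made-up values land in the 50 to 75 bin, which would be the tallest bar.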

Count plot

Count plots are similar to bar charts: they show how many observations fall into each category. We can use the countplot function to see how many Pokemon there are of each primary type.

sns.countplot(x='Type 1', data=df)
plt.xticks(rotation=-45)  # rotate the type labels so they don't overlap

We can see that Water has the most Pokemon, while Fairy and Ice have the fewest.
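The bar heights in a count plot are per-category tallies, and pandas' value_counts gives the same numbers as a table. The sketch uses a hypothetical handful of types. On the real data it would be df['Type 1'].value_counts().

```python
import pandas as pd

# Hypothetical primary types for a handful of Pokemon
types = pd.Series(["Water", "Fire", "Water", "Grass", "Water", "Fire"], name="Type 1")

# Per-category tallies, sorted from most to least frequent
counts = types.value_counts()
print(counts)
```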

Density plot

A density plot shows the joint distribution of two variables. For example, we can use a density plot to compare two Pokemon attributes: Attack and Defense. We'll use the jointplot function to do this.

# Seaborn 0.12+ requires x and y as keyword arguments
sns.jointplot(x='Attack', y='Defense', data=df, kind='kde', color='lightblue')

Setting kind='kde' tells Seaborn to draw a kernel density estimate (KDE), that is, a density plot.

As you can see, the shading of each region depends on how many values fall within it: darker areas are where the data points are most concentrated. From this figure, we can see that when the Attack value is between 50 and 75, the Defense value is usually around 50.
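A kernel density estimate places a small smooth bump (here a Gaussian) on every data point and averages them. The sketch below does this in one dimension with NumPy. The jointplot above does the same in two dimensions. The sample values and the bandwidth of 8.0 are hypothetical choices; Seaborn picks a bandwidth automatically, and this is not its internal code.

```python
import numpy as np


def gaussian_kde_1d(samples, grid, bandwidth):
    """Average of Gaussian bumps centred on each sample, evaluated on grid."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Scaled distance from every grid point to every sample: (len(grid), len(samples))
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    # Averaging the kernels and dividing by the bandwidth keeps total area at 1
    return kernels.mean(axis=1) / bandwidth


# Hypothetical Attack-like values and an evaluation grid
samples = [45, 50, 52, 55, 60, 80, 85]
grid = np.linspace(0, 150, 1501)
density = gaussian_kde_1d(samples, grid, bandwidth=8.0)

# The estimate is a probability density, so its area is close to 1
area = np.sum(density) * (grid[1] - grid[0])
print("area:", round(area, 3))
print("peak near:", grid[np.argmax(density)])
```

The peak lands in the cluster around 50, where most of the made-up samples sit. This is the 1-D equivalent of a dark region in the jointplot.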

That's all for this article. I hope you enjoy visualizing data with Seaborn.

You can find the code and dataset for this article here: https://github.com/jendcruz22/Medium-articles/tree/master/Plotting%20charts%20with%20Seaborn

Thank you for reading!

Link to the original article: https://towardsdatascience.com/plotting-charts-with-seaborn-e843c7de2287