Data visualization [from programming to drawing]: 2. Line chart

Time: 2020-2-17

Reference source: vitu.ai

In the previous article, you became familiar with the programming environment. Now it’s time to draw a real line chart.

As before, we start by setting up the notebook.

Set up your notebook

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
print("Setup complete")

Select a dataset

The dataset used in this article contains the daily global stream counts on Spotify, the well-known music streaming service. We focus on the following five popular songs from 2017 to 2018:

1. “Shape of You”, by Ed Sheeran
2. “Despacito”, by Luis Fonsi
3. “Something Just Like This”, by The Chainsmokers and Coldplay
4. “HUMBLE.”, by Kendrick Lamar
5. “Unforgettable”, by French Montana


Opening the file in Excel shows one row per day and one column per song.


Notice that the first row is January 6, 2017, the day “Shape of You” was released; on that first day it was streamed 12,287,078 times worldwide. The other songs have missing values on this date because they had not been released yet.
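Because a song’s column stays empty until it appears, the first non-missing date in each column tells you when that song enters the data. A minimal sketch, assuming the data is already in a DataFrame indexed by date; the frame `toy` and the helper `first_stream_dates` are hypothetical, and all counts except the 12,287,078 quoted above are invented:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for spotify.csv: invented counts, except the
# first "Shape of You" value, which is quoted in the article
toy = pd.DataFrame(
    {
        "Shape of You": [12287078, 13000000, 12900000],
        "Despacito": [np.nan, np.nan, 1000000],
    },
    index=pd.date_range("2017-01-06", periods=3, freq="D", name="Date"),
)

def first_stream_dates(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: first index label in each column whose value is not NaN."""
    return df.apply(lambda col: col.first_valid_index())

print(first_stream_dates(toy))
```

On the real data, the same call on `spotify_data` would list one date per song.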

Let’s upload the CSV file to VITU’s dataset space


Next, we use pandas to load this file:

# Path of the file to read
spotify_filepath = "spotify.csv"

# Read the file into a variable spotify_data
spotify_data = pd.read_csv(spotify_filepath, index_col="Date", parse_dates=True)
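The two keyword arguments do the important work here: `index_col="Date"` turns the Date column into the row index, and `parse_dates=True` parses that index into timestamps. A minimal sketch that reads a tiny in-memory CSV with the same layout and checks the result; `csv_text` is hypothetical and its numbers are invented, apart from the first-day figure quoted above:

```python
import io

import pandas as pd

# Hypothetical stand-in for spotify.csv: same layout, invented counts
# (an empty field becomes NaN, like a song that is not out yet)
csv_text = """Date,Shape of You,Despacito
2017-01-06,12287078,
2017-01-07,13000000,
2017-01-08,12900000,1000000
"""

# index_col="Date" uses the Date column as the row index;
# parse_dates=True parses that index into timestamps
data = pd.read_csv(io.StringIO(csv_text), index_col="Date", parse_dates=True)

print(type(data.index).__name__)  # DatetimeIndex
print(data["Despacito"].isna().sum())  # 2
```

Having real dates on the index is what later lets seaborn draw a proper time axis.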

It’s time to check the data

As in the previous article, let’s print the first five rows of the dataset:

# Print the first 5 rows of the data
spotify_data.head()

The first rows contain missing values, shown as NaN (“not a number”), just as we saw when opening the file in Excel.

We can also look at the last five rows. This time, just replace head with tail:

# Print the last five rows of the data
spotify_data.tail()

These rows have no missing values, because by this point all five songs had been released. Now we can draw a chart.
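Instead of scanning rows by eye, you can count the missing values in each column. A minimal sketch on a hypothetical stand-in frame with invented values; on the real data you would call the same methods on `spotify_data`:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for spotify_data with invented values:
# early rows have gaps, the last row is complete
toy = pd.DataFrame(
    {
        "Shape of You": [12287078, 13000000, 12900000, 11000000],
        "Despacito": [np.nan, np.nan, 1000000, 1200000],
        "HUMBLE.": [np.nan, np.nan, np.nan, 900000],
    },
    index=pd.date_range("2017-01-06", periods=4, freq="D", name="Date"),
)

# isna() marks each missing cell as True; sum() counts them per column
missing_per_song = toy.isna().sum()
print(missing_per_song)

# True only if some cell in the last row is missing
print(toy.tail(1).isna().any().any())  # False
```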

Plot the data

Since the data was loaded into the notebook in an earlier cell, drawing the chart takes just one line of code:

# Line chart showing daily global streams of each song 
sns.lineplot(data=spotify_data)

In this line of code:

sns.lineplot tells the notebook that we want to draw a line chart. Other chart types use other functions; for example, sns.barplot draws a bar chart and sns.heatmap draws a heatmap.

data=spotify_data selects the data to plot, in this case the spotify_data DataFrame.

Sometimes we want to adjust details of the chart, such as the figure size or the title. Both can be set like this:

# Set the width and height of the figure
plt.figure(figsize=(14,6))

# Add title
plt.title("Daily Global Streams of Popular Songs in 2017-2018")

# Line chart showing daily global streams of each song 
sns.lineplot(data=spotify_data)


Drawing data from a subset

So far, we have plotted every column in the dataset. In this section, we learn how to plot only some of the columns.

First let’s look at all the columns:

list(spotify_data.columns)

The following code plots only the first two columns:

# Set the width and height of the figure
plt.figure(figsize=(14,6))

# Add title
plt.title("Daily Global Streams of Popular Songs in 2017-2018")

# Line chart showing daily global streams of 'Shape of You'
sns.lineplot(data=spotify_data['Shape of You'], label="Shape of You")

# Line chart showing daily global streams of 'Despacito'
sns.lineplot(data=spotify_data['Despacito'], label="Despacito")

# Add label for horizontal axis
plt.xlabel("Date")

