I have to tell you the truth: when I studied data science, I totally underestimated the importance of plotting. Yes, it was a mess: I learned Python from scratch, got familiar with all sorts of algorithms and understood the math behind everything, but my plotting skills were terrible.
Why? Because we were always doing the same thing. You know: pairplots, distplots, QQ plots. Charts are the only way to understand data when you visualize it, and these are very useful, generic, default charts. So copying and pasting a bunch of code became the most common thing I did.
For my projects, the deliverable was always a model, and thanks to hours of data cleaning and feature engineering it was likely to get a good score. I was the only participant in my projects, and my professors knew everything about the data when they gave it to me. So who was I plotting for? Myself? Well, no need, right? I knew better than anyone what each step was achieving. I didn’t need to explain it to anyone.
I believe this may be my biggest failure in data science: not fully considering the importance of explainability and interpretability. You may be a genius, but if you can’t explain to a third party how and why you reached those wonderful conclusions, then you may be nothing.
At Ravelin Technology, for example, we offer fraud prevention solutions based on machine learning. Imagine telling a client that you’ve blocked x% of their transactions just because the machine learning model says so, but you don’t know why. What would happen? Of course, that’s not very attractive for any e-commerce business trying to maximize its conversion rate and sales, right? Imagine the same situation in other sensitive areas such as health care. That’s a recipe for disaster.
Now, business-related issues aside, and even from a legal point of view, or even if your business only cares about predictions no matter how you get them, it helps to understand how an algorithm actually works. Not only can you better explain the reasons for an output to your customers, you can also better coordinate the work of data scientists and analysts.
Being able to explain your thinking process to people is a key part of any data-related work. In this case, copying and pasting charts is not enough, and customizing the chart becomes very important.
In the rest of this article, I’d like to share with you 10 basic, intermediate and advanced plotting tools. I find them very useful in real life when it comes to plotting and explaining your data.
The libraries I will refer to in the following lines:
Seaborn: import seaborn as sns
Matplotlib: import matplotlib.pyplot as plt
In addition, if you like, you can set the style and the figure format you prefer, for example (the two lines starting with % are Jupyter magics and only work in a notebook):
plt.style.use('fivethirtyeight')
%config InlineBackend.figure_format = 'retina'
%matplotlib inline
With that in mind, let’s jump straight to these tools:
Plot on different axes
Sometimes you want to plot different things in the same chart. But sometimes you may want to place different charts in the same row or column so that they complement each other and/or show different pieces of information.
For this, here is a very basic but essential tool: subplots. How do you use it? It’s very simple. A chart in Matplotlib is structured as follows:
- Figure: the background or canvas on which the charts are drawn
- Axes: our chart
Usually these objects are set up automatically behind the scenes, but if we want to draw several charts, we only need to create the figure and axes objects as follows:
fig, ax = plt.subplots(ncols=number_of_cols, nrows=number_of_rows, figsize=(x, y))
For example, if you set ncols=1 and nrows=2, you create a figure of width x and height y containing only two charts, distributed over two rows. The only thing left is to tell each plot where to go through its ‘ax’ parameter, indexing the axes from 0. For example:
sns.scatterplot(x=horizontal_data_1, y=vertical_data_1, ax=ax[0])
sns.scatterplot(x=horizontal_data_2, y=vertical_data_2, ax=ax[1])
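To make the figure and axes idea concrete, here is a minimal sketch of two scatter plots stacked in two rows. The data values and the figure size are made up for illustration, and the Agg backend is only there so the snippet also runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import seaborn as sns

# Made-up data, purely for illustration
horizontal_data_1 = [1, 2, 3, 4]
vertical_data_1 = [2, 4, 1, 3]
horizontal_data_2 = [1, 2, 3, 4]
vertical_data_2 = [10, 8, 12, 9]

# One figure (the canvas) holding two axes (the charts), one per row
fig, ax = plt.subplots(ncols=1, nrows=2, figsize=(6, 8))

# With a single column, ax is a 1-D array: ax[0] is the top chart, ax[1] the bottom one
sns.scatterplot(x=horizontal_data_1, y=vertical_data_1, ax=ax[0])
sns.scatterplot(x=horizontal_data_2, y=vertical_data_2, ax=ax[1])
```

Keep in mind that with more than one row and more than one column, the axes come back as a 2-D array, so you would index them as ax[row, col] instead.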
Set axis labels
This may not seem necessary or helpful, but you can’t imagine how many times you’ll be asked what the X or Y axis represents if your chart is a bit confusing, or if the people looking at it are not familiar with the data. Following the two previous plots, if you want to set specific names for the axes, use the following lines of code:
ax[0].set(xlabel='My X Label', ylabel='My Y Label')
ax[1].set(xlabel='My Second X Label', ylabel='My Second and Very Creative Y Label')
Set a title
If we’re going to present the data to a third party, another basic but critical point is to use titles, which work much like the axis labels above:
ax[0].title.set_text('This title has to be very clear and explicative')
ax[1].title.set_text("And this title has to explain what's different in this chart")
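Putting labels and titles together, a small self-contained sketch could look like this. The data, label texts and titles are invented for the example, and set_title() is shown only as an equivalent alternative to title.set_text().

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import seaborn as sns

fig, ax = plt.subplots(ncols=1, nrows=2, figsize=(6, 8))

# Made-up data for the two charts
sns.scatterplot(x=[1, 2, 3], y=[3, 1, 2], ax=ax[0])
sns.scatterplot(x=[1, 2, 3], y=[30, 10, 20], ax=ax[1])

# Axis labels: one call per axes object
ax[0].set(xlabel="Month", ylabel="Weight (kg)")
ax[1].set(xlabel="Month", ylabel="Height (cm)")

# Titles: title.set_text() and set_title() end up doing the same job
ax[0].title.set_text("Weight per month")
ax[1].set_title("Height per month")

fig.tight_layout()  # avoid overlap between the top labels and the bottom title
```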
Annotate the key elements of the chart
In general, the scale on the sides of the chart is not clear enough on its own. Writing the values on the chart itself is very useful for interpreting it.
Suppose we are using subplots and have several charts, one of which is a Seaborn barplot on one of the axes, say ax[0]. In this case, the code for annotating each bar is a little more complex, but it is easy to implement:
for p in ax[0].patches:
    ax[0].annotate('%.2f' % p.get_height(),
                   (p.get_x() + p.get_width() / 2., p.get_height()),
                   ha='center', va='center', fontsize=12, color='white',
                   xytext=(0, -10), textcoords='offset points')
For each ‘patch’, or bar, in the chart, the arguments before the ‘ha’ parameter take the position, height and width of the bar, so that the annotation is placed in the right spot. Similarly, we can specify the alignment, font size and color of the annotation, while the ‘xytext’ parameter says how far to shift the annotation in the X or Y direction. In the example above, we move the annotation text down along the Y axis.
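Here is a runnable sketch of the same annotation loop on a small bar chart. The categories and values are invented, and I am assuming the bar chart lives in ax[0] of a two-row figure.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import seaborn as sns

# Made-up data, purely for illustration
categories = ["a", "b", "c"]
values = [3.5, 1.25, 2.0]

fig, ax = plt.subplots(ncols=1, nrows=2, figsize=(6, 8))
sns.barplot(x=categories, y=values, ax=ax[0])

# Each bar is a "patch": read its geometry and write its height just below its top edge
for p in ax[0].patches:
    ax[0].annotate(
        "%.2f" % p.get_height(),                           # text: bar height, two decimals
        (p.get_x() + p.get_width() / 2.0, p.get_height()),  # anchor: top center of the bar
        ha="center", va="center", fontsize=12, color="white",
        xytext=(0, -10), textcoords="offset points",        # shift 10 points down, inside the bar
    )

annotation_texts = [t.get_text() for t in ax[0].texts]
```

White text works here because the labels sit inside the bars. If you shift them above the bars instead (a positive Y offset), a dark color will probably read better.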
Use different colors to distinguish labels
In some cases, over a period of time or a series of values, we may have measured different kinds of objects. For example, suppose we measure the weight of dogs and cats for six months. At the end of the experiment, we want to plot the weight of each animal and distinguish the dogs from the cats with blue and red. For this, in most standard plots, we can use the ‘hue’ parameter to provide a list that assigns each element to a color group. Note that Seaborn treats these values as group labels, so the colors actually drawn come from the current palette unless you set the ‘palette’ parameter as well.
weight = [5, 4, 8, 2, 6, 2]
month = ['febrero', 'enero', 'abril', 'junio', 'marzo', 'mayo']
animal_type = ['dog', 'cat', 'cat', 'dog', 'dog', 'dog']
hue = ['blue', 'red', 'red', 'blue', 'blue', 'blue']
sns.scatterplot(x=month, y=weight, hue=hue);
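If you want the colors to be exactly blue and red, one option is to pass the animal type as ‘hue’ and map each type to a color through ‘palette’. The dictionary palette and the English month abbreviations below are my own additions for this sketch, not part of the original example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import seaborn as sns

weight = [5, 4, 8, 2, 6, 2]
month = ["Feb", "Jan", "Apr", "Jun", "Mar", "May"]
animal_type = ["dog", "cat", "cat", "dog", "dog", "dog"]

# Assumption for this sketch: map every hue level to the color we want for it
palette = {"dog": "blue", "cat": "red"}

fig, ax = plt.subplots(figsize=(6, 4))
sns.scatterplot(x=month, y=weight, hue=animal_type, palette=palette, ax=ax)

legend_labels = [t.get_text() for t in ax.get_legend().get_texts()]
```

A nice side effect is that the legend now shows ‘dog’ and ‘cat’ rather than color names, which is usually what the reader wants to know.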
Change the size of points in a scatter plot
Using the same example, we can also indicate the size of each animal on a scale from 1 to 5. A good way to add this extra indicator to the plot is to vary the size of the points: pass the new vector through the ‘size’ parameter, and adjust the range of point sizes with the ‘sizes’ parameter:
size = [2, 3, 5, 1, 4, 1]
sns.scatterplot(x=month, y=weight, hue=hue, size=size, sizes=(50, 300));
By the way, if the legend makes the plot harder to read, as in the figure above, you can set the ‘legend’ parameter to False.
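As a quick check of how ‘size’, ‘sizes’ and ‘legend’ work together, here is a small sketch. The data repeats the dogs and cats example, with English month abbreviations as my own substitution.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import seaborn as sns

weight = [5, 4, 8, 2, 6, 2]
month = ["Feb", "Jan", "Apr", "Jun", "Mar", "May"]
animal_type = ["dog", "cat", "cat", "dog", "dog", "dog"]
size = [2, 3, 5, 1, 4, 1]  # size of each animal on a 1 to 5 scale

fig, ax = plt.subplots(figsize=(6, 4))
sns.scatterplot(
    x=month, y=weight,
    hue=animal_type,
    size=size,          # which value drives the point size
    sizes=(50, 300),    # smallest and largest point area to draw
    legend=False,       # drop the legend if it clutters the plot
    ax=ax,
)

# The drawn point sizes can be read back from the scatter collection
point_sizes = ax.collections[0].get_sizes()
```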
Include a line in the plot to show a threshold
In many real-life cases, data above or below a certain threshold may signal a problem or an error. If you want to show this clearly in the plot, you can add a line using the following command:
ax[0].axvline(32, c='r')
Where:
- ax[0] is the chart in which we want to insert the line (use whichever axes you need)
- 32 is the value at which the line is drawn
- c='r' means the line will be red
If we use subplots, it’s easy to add axvline to the corresponding axes, as shown in the example above. However, if you are not using subplots, you should do the following:
g = sns.scatterplot(x=month, y=weight, hue=hue, legend=False)
g.axvline(2, c='r')
plt.show()
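For a numeric axis, a complete sketch of the threshold line could be the following. The readings and the threshold of 32 are illustrative values, and axhline() is mentioned only as the horizontal counterpart in Matplotlib.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import seaborn as sns

# Made-up readings; anything above the threshold would be a warning sign
readings = [12, 40, 25, 31, 36]
sample_id = [1, 2, 3, 4, 5]
threshold = 32

fig, ax = plt.subplots(ncols=1, nrows=2, figsize=(6, 8))

# Readings on the X axis: mark the threshold with a vertical red line
sns.scatterplot(x=readings, y=sample_id, ax=ax[0])
vertical_line = ax[0].axvline(threshold, c="r")

# Readings on the Y axis: the same threshold becomes a horizontal line
sns.scatterplot(x=sample_id, y=readings, ax=ax[1])
horizontal_line = ax[1].axhline(threshold, c="r")
```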
Plot with multiple Y axes
This may be the simplest, but it is also one of the most useful techniques.
Sometimes we just need to add more information to the chart, and there is no way around it other than adding a new measure on a right-hand Y axis of the plot:
ax2 = ax[0].twinx()
You can now add any chart you want by pointing its ‘ax’ parameter to ‘ax2’:
sns.lineplot(x=month, y=average_animal_weight, ax=ax2)
Note that this example again assumes that you are using subplots. If not, you should follow the same logic as the previous point:
g = sns.scatterplot(x=month, y=weight, hue=hue, legend=False)
g.axvline(2, c='r')
ax2 = g.twinx()
sns.lineplot(x=month, y=average_animal_weight, ax=ax2, c='y')
plt.show()
Note that for this to work, you should always use the same data for the x-axis in both charts. Otherwise, they won’t line up.
Overlapping plots and changing labels and colors
It’s easy to overlay charts on the same axes: we just need to write the code for all the plots we want, and then simply call plt.show() to draw them all together:
a = [1, 2, 3, 4, 5]
b = [4, 5, 6, 2, 2]
c = [2, 5, 6, 2, 1]
sns.lineplot(x=a, y=b, c='r')
sns.lineplot(x=a, y=c, c='b')
plt.show()
However, sometimes overlap can lead to confusion, so we may need to make some improvements to make it easier for people to understand.
For example, suppose you want to overlay the height distributions of two different samples in the same chart: one from your workmates and the other from your local basketball team. It’s best to add some customization, such as different colors and a legend showing which is which. Well, simple:
- By setting the ‘color’ parameter, we can give each one a specific color. Note that in some functions this parameter can be shortened to a simple ‘c’
- With the ‘label’ parameter, we can specify any text to display, and then simply call legend() on the axes to show it
# Note: distplot is deprecated in recent Seaborn releases; histplot or displot are the suggested replacements
g = sns.distplot(workmates_height, color='b', label='Workmates')
sns.distplot(basketball_team, color='r', ax=g, label='Basket team')
g.legend()
plt.show()
Set the order of the bars in a bar chart
Finally, a very specific tool: if you like to use bar charts, you may run into the problem that your bars are not arranged in the order you want. In this case, a simple fix is to pass a list with the specific order you want to the ‘order’ parameter:
a = ['second', 'first', 'third']
b = [15, 10, 20]
sns.barplot(x=a, y=b, order=['first', 'second', 'third']);
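To check that ‘order’ really rearranges the bars, the following sketch reads the bar heights back from the chart. It reuses the same small data set, and the variable name bar_heights is just for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import seaborn as sns

a = ["second", "first", "third"]
b = [15, 10, 20]

fig, ax = plt.subplots(figsize=(6, 4))

# Without order the bars follow the order of appearance in the data;
# with order they follow the list we pass
sns.barplot(x=a, y=b, order=["first", "second", "third"], ax=ax)

# Heights of the bars, read from left to right
bar_heights = [p.get_height() for p in ax.patches]
```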
Plotting is a world of its own, and in my experience the best way to improve your skills is to practice. But I hope these tools and techniques help you in your real-world data science work, just as they did for me.