Using Python to imitate a data visualization diagram of R language

Time: 2021-7-20


The following article comes from “Python Big Data Analysis” and was written by fevrey.


Brief introduction

To get straight to the point: the data visualization we are going to imitate today comes from #TidyTuesday. Among the many entries for the “San Francisco street trees dataset” released on January 28, 2020, one of the most popular is “Street trees of San Francisco” by Philippe Massicotte (shown in Figure 1):


Figure 1

The original author used R. In this article, I will show you how to imitate the style of Figure 1 in Python to display similar information. (The original work actually has some confusing flaws, so in a few places below I use a different analysis method from the original author's; as a result, the final product differs somewhat from the original.)

Imitation process

The picture we are going to imitate looks a little complicated at first sight, but if you have read my series of articles on “spatial data analysis based on geopandas”, you can immediately break it down into its components in your mind.

Process decomposition

Looking closely at the original work, we can see that its main visual element maps a statistic for each community to the fill color of that community's polygon. The outline around the edge is clearly an outward buffer of the whole area, and the road network is layered on top, which gives the whole picture a very “precise” look.

As for the data, we already have the San Francisco community polygon (“face”) data and the registered street tree point data. The road network line data can be obtained with the third-party library osmnx (installing it with conda install -c conda-forge osmnx is recommended).

We split the process into the following steps:

“Data preparation”

First, we read in the existing data and convert it into vector (geometry) data.
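The original code for this step was a screenshot and is not recoverable. A minimal sketch of the idea, assuming the tree table is a plain DataFrame with hypothetical longitude/latitude columns (the real file and column names may differ): rows without coordinates are dropped, and the remaining rows are turned into point geometries with geopandas.points_from_xy. The helper name vectorize_points is also hypothetical.

```python
import pandas as pd
import geopandas as gpd

# Hypothetical sample rows standing in for the street tree table;
# the column names 'species', 'longitude', 'latitude' are assumptions.
raw_trees = pd.DataFrame({
    'species': ['tree_a', 'tree_b', 'tree_c'],
    'longitude': [-122.4194, -122.4330, None],
    'latitude': [37.7749, 37.7600, None],
})


def vectorize_points(df, lon_col='longitude', lat_col='latitude'):
    """Drop rows without coordinates and build a point GeoDataFrame in WGS84."""
    df = df.dropna(subset=[lon_col, lat_col])
    return gpd.GeoDataFrame(
        df,
        geometry=gpd.points_from_xy(df[lon_col], df[lat_col]),
        crs='EPSG:4326',
    )


trees = vectorize_points(raw_trees)
print(len(trees), trees.geometry.iloc[0].x)
```

The community polygons, in contrast, are usually already stored as vector data and can be loaded directly with gpd.read_file.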


We can use osmnx to fetch the road network data online; all we need to do is pass it the bounding box (bbox) of our San Francisco data.
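The exact osmnx call was also in a lost screenshot. A minimal sketch, assuming the community layer is a GeoDataFrame named sf in EPSG:4326: its total_bounds attribute returns (minx, miny, maxx, maxy), which must be reordered into the north/south/east/west edges osmnx works with. The helper bounds_to_nsew and the sample bounds are hypothetical, and the osmnx call is only shown as a comment because its signature changed between releases (1.x takes north, south, east, west; 2.x takes a single bbox tuple ordered left, bottom, right, top), so check the documentation of your installed version.

```python
def bounds_to_nsew(bounds):
    """Reorder (minx, miny, maxx, maxy) into north/south/east/west edges.

    For data in EPSG:4326, x is longitude and y is latitude.
    """
    minx, miny, maxx, maxy = bounds
    return {'north': maxy, 'south': miny, 'east': maxx, 'west': minx}


# Hypothetical bounds roughly covering San Francisco (lon/lat degrees);
# in the article's workflow this would be sf.total_bounds.
edges = bounds_to_nsew((-122.52, 37.70, -122.35, 37.84))
print(edges)

# With osmnx 1.x the download could then look like this
# (network_type='drive' is an assumption):
#   G = ox.graph_from_bbox(edges['north'], edges['south'],
#                          edges['east'], edges['west'], network_type='drive')
#   roads = ox.graph_to_gdfs(G, nodes=False)
```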


Then, based on the data above, we count the street trees in each community and bin the counts into preset intervals, each with its own color value:

import geopandas as gpd
import numpy as np
import pandas as pd

#Count the number of trees in each community
sf_trees = \
(
    gpd
    #Spatial join: attach every tree point to the community polygon that contains it
    .sjoin(left_df=sf,
           right_df=trees,
           predicate='contains',  # this argument was called op= before geopandas 0.10
           how='left')
    #Count by community name. 'index_right' comes from the trees table and is NaN
    #for communities without any tree, so count() gives 0 there instead of 1
    .groupby('name')
    .agg(quantity=('index_right', 'count'),
         geometry=('geometry', 'first'))
    .reset_index(drop=False)
    #Convert directly to geodataframe
    .pipe(gpd.GeoDataFrame, crs='EPSG:4326')
)

#Bin the counts into preset intervals, one color per interval
sf_trees['color'] = (
    pd
    .cut(sf_trees['quantity'],
         bins=[0, 2500, 5000, 7500, 10000, np.inf],
         labels=['#e4f1e1', '#c0dfd1', '#67a9a2', '#3b8383', '#145e64'],
         include_lowest=True)  # keep a count of exactly 0 in the first bin
    .astype(str)  # plain color strings for matplotlib
)
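Counting after a left join has a subtle pitfall: a community with no matching tree still produces one row, so counting a column that is always filled reports 1 instead of 0. A minimal pandas-only sketch of the issue and of binning with pd.cut, using made-up tables and column names (hoods, trees, tree_id and hood are hypothetical; in a geopandas sjoin the equivalent right-side column is index_right). The bins here are scaled down to fit the toy data.

```python
import numpy as np
import pandas as pd

# Made-up community and tree tables standing in for sf and trees.
hoods = pd.DataFrame({'name': ['A', 'B', 'C']})
trees = pd.DataFrame({'tree_id': [1, 2, 3, 4, 5],
                      'hood': ['A', 'A', 'B', 'A', 'B']})

# The left join keeps community C even though no tree matches it.
joined = hoods.merge(trees, left_on='name', right_on='hood', how='left')

# count() skips NaN, so counting a right-side column gives 0 for C;
# counting 'name' itself would give 1.
counts = (
    joined.groupby('name')
    .agg(quantity=('tree_id', 'count'))
    .reset_index()
)

# Bin the counts and map each bin to a color. include_lowest=True keeps
# a count of exactly 0 in the first bin; np.inf makes the last bin open-ended.
counts['color'] = pd.cut(
    counts['quantity'],
    bins=[0, 1, 2, np.inf],
    labels=['#e4f1e1', '#67a9a2', '#145e64'],
    include_lowest=True,
).astype(str)
print(counts)
```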

 

Finally, don’t forget to generate the buffer that serves as the outer contour:

#Generate contour buffer
sf_bounds = gpd.GeoSeries([sf.buffer(0.001).unary_union], crs='EPSG:4326')
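To see what this outline step does in isolation, here is a minimal sketch with two made-up adjacent squares (the geometries are hypothetical): buffering each shape outward and merging the results yields one slightly larger outline, which the map then draws with facecolor='none'. The sketch uses shapely.ops.unary_union, which performs the same merge as the GeoSeries.unary_union attribute above (newer geopandas releases also offer union_all()). Note that on EPSG:4326 data, as in the article, the buffer distance is measured in degrees.

```python
import geopandas as gpd
from shapely.geometry import box
from shapely.ops import unary_union

# Two made-up neighboring "communities" sharing an edge.
districts = gpd.GeoSeries([box(0, 0, 1, 1), box(1, 0, 2, 1)])

# Buffer each polygon outward by 0.1 units, then dissolve them into one shape.
outline = gpd.GeoSeries([unary_union(list(districts.buffer(0.1)))])

print(outline.geom_type.iloc[0], round(outline.area.iloc[0], 3))
```

The merged outline covers the 2 x 1 rectangle plus a 0.1-wide band with rounded corners, which is why it reads as a contour around the whole area rather than around each polygon.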

 

“Main visual elements drawing”

After making these preparations, we can directly draw the main elements of the image:

import matplotlib.pyplot as plt
from matplotlib import font_manager as fm

#Set global default font
plt.rcParams['font.sans-serif'] = ['Times New Roman']

fig, ax = plt.subplots(figsize=(6, 6))

#Set background color
ax.set_facecolor('#333333')
fig.set_facecolor('#333333')

#Layer 1: buffer outline
ax = (
    sf_bounds
    .plot(ax=ax, facecolor='none', edgecolor='#cccccc', linewidth=1)
)

#Layer 2: community faces with tree statistics
ax = (
    sf_trees
    .plot(color=sf_trees['color'], edgecolor='#333333',
          linewidth=0.5, ax=ax)
)

#Layer 3: OSM road network
ax = (
    roads
    .plot(linewidth=0.05, edgecolor='#3c3d3d',
          ax=ax)
)

#Set X axis
ax.set_xticks([-122.5, -122.45, -122.4, -122.35])
ax.set_xticklabels(['122.5°W', '122.45°W', '122.4°W', '122.35°W'])

#Set the Y axis
ax.set_yticks([37.72, 37.74, 37.76, 37.78, 37.8, 37.82])
ax.set_yticklabels(['37.72°N', '37.74°N', '37.76°N', '37.78°N', '37.8°N', '37.82°N'])

#Set axis style
ax.tick_params(axis='both', labelcolor='#737373', color='none', labelsize=8)

#Hide spines lines around
ax.spines['left'].set_color('none')
ax.spines['right'].set_color('none')
ax.spines['top'].set_color('none')
ax.spines['bottom'].set_color('none')

#Export image
fig.savefig('fig4.png', dpi=600, bbox_inches='tight')
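The tick positions and degree labels above are typed out by hand. A minimal alternative sketch, assuming you would rather let matplotlib choose the tick positions: two small functions that turn a coordinate value into a degree label with a hemisphere letter. The names lon_label and lat_label are hypothetical; they follow the (value, position) signature that matplotlib.ticker.FuncFormatter expects.

```python
def lon_label(x, pos=None):
    """Format a longitude in degrees, e.g. -122.45 -> '122.45°W'."""
    if x == 0:
        return '0°'
    return f"{abs(x):g}°{'E' if x > 0 else 'W'}"


def lat_label(y, pos=None):
    """Format a latitude in degrees, e.g. 37.8 -> '37.8°N'."""
    if y == 0:
        return '0°'
    return f"{abs(y):g}°{'N' if y > 0 else 'S'}"


print([lon_label(v) for v in [-122.5, -122.45, -122.4, -122.35]])
print([lat_label(v) for v in [37.72, 37.8]])
```

To use them on the map, import FuncFormatter from matplotlib.ticker and call ax.xaxis.set_major_formatter(FuncFormatter(lon_label)) and ax.yaxis.set_major_formatter(FuncFormatter(lat_label)) instead of set_xticklabels/set_yticklabels.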

 


“Addition of auxiliary visual elements”

Next, we just need to add the small finishing-touch elements. One worth mentioning is the legend at the bottom: we draw it with inset_axes(), which lets us insert a sub-axes (a small subplot) flexibly anywhere relative to the main axes.
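A minimal, self-contained sketch of the inset-legend technique on its own, using the non-interactive Agg backend so it runs without a display. The colors and break labels mirror the article's; the bounds passed to inset_axes are [x0, y0, width, height] in the parent axes' coordinates, so a negative y0 places the legend below the map.

```python
import matplotlib
matplotlib.use('Agg')  # render off-screen
import matplotlib.pyplot as plt

colors = ['#e4f1e1', '#c0dfd1', '#67a9a2', '#3b8383', '#145e64']
breaks = ['2500', '5000', '7500', '10000']

fig, ax = plt.subplots(figsize=(6, 6))

# Inset axes below the main axes: [x0, y0, width, height] in axes coordinates.
ax_bar = ax.inset_axes([0.25, -0.12, 0.5, 0.015])

# One equal-height bar per color class; width < 1 leaves thin gaps between bars.
ax_bar.bar(range(len(colors)), [1] * len(colors), width=0.975, color=colors)

# Put each break value on the boundary between two neighboring bars.
ax_bar.set_xticks([i + 0.5 for i in range(len(breaks))])
ax_bar.set_xticklabels(breaks)
ax_bar.set_yticks([])

# Hide the frame so only the colored blocks remain.
for spine in ax_bar.spines.values():
    spine.set_visible(False)
```

Because the inset is positioned in axes coordinates, it stays centered under the map if the figure size changes.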

In addition, external font files add a lot of character. Here we use two special fonts, one for the title and one for the legend's tick labels (you can find all the font files I use in the GitHub repository mentioned at the beginning).
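The article's font files live in the author's repository and are not reproduced here. To show the mechanism without them, this sketch points FontProperties(fname=...) at DejaVuSans.ttf, a font that ships inside matplotlib's own data directory, as a stand-in; replace font_path with the path to any .ttf file you have, such as Amaranth-Bold.ttf. The font size is set on the FontProperties object itself so that the file and size travel together.

```python
import os

import matplotlib
matplotlib.use('Agg')  # render off-screen
import matplotlib.pyplot as plt
from matplotlib import font_manager as fm

# Stand-in for an external font file: a .ttf bundled with matplotlib.
font_path = os.path.join(matplotlib.get_data_path(), 'fonts', 'ttf', 'DejaVuSans.ttf')

# Load the font straight from the file, without installing it system-wide.
title_font = fm.FontProperties(fname=font_path, size=16)

fig, ax = plt.subplots(figsize=(4, 3))
title = ax.set_title('Street trees of San Francisco', fontproperties=title_font)
```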

fig, ax = plt.subplots(figsize=(6, 6))

#Set background color
ax.set_facecolor('#333333')
fig.set_facecolor('#333333')

#Layer 1: buffer outline
ax = (
    sf_bounds
    .plot(ax=ax, facecolor='none', edgecolor='#cccccc', linewidth=1)
)

#Layer 2: community faces with tree statistics
ax = (
    sf_trees
    .plot(color=sf_trees['color'], edgecolor='#333333',
          linewidth=0.5, ax=ax)
)

#Layer 3: OSM road network
ax = (
    roads
    .plot(linewidth=0.05, edgecolor='#3c3d3d',
          ax=ax)
)

#Set X axis
ax.set_xticks([-122.5, -122.45, -122.4, -122.35])
ax.set_xticklabels(['122.5°W', '122.45°W', '122.4°W', '122.35°W'])

#Set the Y axis
ax.set_yticks([37.72, 37.74, 37.76, 37.78, 37.8, 37.82])
ax.set_yticklabels(['37.72°N', '37.74°N', '37.76°N', '37.78°N', '37.8°N', '37.82°N'])

#Set axis style
ax.tick_params(axis='both', labelcolor='#737373', color='none', labelsize=8)

#Hide spines lines around
ax.spines['left'].set_color('none')
ax.spines['right'].set_color('none')
ax.spines['top'].set_color('none')
ax.spines['bottom'].set_color('none')

#Add the legend below by inserting a subgraph
ax_bar = ax.inset_axes((0.25, -0.12, 0.5, 0.015))
ax_bar.set_facecolor('#333333')
ax_bar.spines['left'].set_color('none')
ax_bar.spines['right'].set_color('none')
ax_bar.spines['top'].set_color('none')
ax_bar.spines['bottom'].set_color('none')

ax_bar.bar(range(5), [1]*5, width=0.975, color=['#e4f1e1', '#c0dfd1', '#67a9a2', '#3b8383', '#145e64'])
ax_bar.set_yticks([])
ax_bar.set_xticks([i+0.5 for i in range(4)])
ax_bar.set_xticklabels(['2500', '5000', '7500', '10000'], 
                       fontdict={'fontproperties': fm.FontProperties(fname="RobotoCondensed-Regular.ttf")})
ax_bar.tick_params(color='none', labelcolor='#ffffff', labelsize=8, pad=0)

ax.set_title('Street trees of San Francisco', 
             fontsize=24,
             color='#ffffff',
             pad=40,
             fontproperties=fm.FontProperties(fname="Amaranth-Bold.ttf"))

ax.text(0.5, 1.08, '''There are a total of 192987 trees in San Francisco regrouped into 571 species.
The district with the most number of trees is Mission whereas the one with
the least number of trees is Lincoln Park / Ft. Miley.''', transform=ax.transAxes, ma='center',
        ha='center', va='top', color='#ffffff')

ax.text(0.5, -0.22, 'Visualization by CNFeffery', fontsize=8,
        color='#737373', ha='center', transform=ax.transAxes)

#Export image
fig.savefig('fig5.png', dpi=600, bbox_inches='tight')
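The subtitle above hard-codes its figures (total trees, number of species, and the districts with the most and fewest trees). A minimal sketch of how such numbers could be computed rather than typed, using made-up data: the species column, the district names and the per-district counts are all hypothetical, and a real species column may contain placeholder values that should be cleaned before counting.

```python
import pandas as pd

# Made-up stand-ins: one row per tree, and per-district tree counts.
trees = pd.DataFrame({'species': ['oak', 'pine', 'oak', 'elm', 'pine']})
district_counts = pd.Series({'District A': 120, 'District B': 45, 'District C': 300})

n_trees = len(trees)
n_species = trees['species'].nunique()
most = district_counts.idxmax()
least = district_counts.idxmin()

subtitle = (
    f'There are a total of {n_trees} trees in San Francisco regrouped into {n_species} species.\n'
    f'The district with the most number of trees is {most} whereas the one with\n'
    f'the least number of trees is {least}.'
)
print(subtitle)
```

The resulting string can be passed to ax.text in place of the hard-coded one, so the subtitle stays in sync with the data.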

 
