This Year’s Olympic Games Are Postponed to the Summer of 2021: Taking Stock of the Data from Previous Games



The following article is from CDA data analyst, author: CDA data analyst




At the beginning of 2020, the sudden outbreak of COVID-19 changed our lives dramatically, and many international sporting events were suspended. On March 24, the Tokyo Olympic Organizing Committee announced that the 2020 Tokyo Summer Olympic Games would be postponed to the summer of 2021.



With the domestic epidemic gradually brought under control, life at home is getting back on track, but the global situation remains serious. Since August, a second wave has swept Japan, and the number of new cases confirmed in a single day has repeatedly hit record highs, reaching 2201 on November 18.


This has once again cast uncertainty over the 2020 Tokyo Olympic Games, which have already been delayed by one year. People can’t help wondering: can the postponed Games go ahead smoothly?



So what interesting data is there about the Olympic Games, the most influential sporting event in the world?


How have the numbers of participating countries and events changed over time?

How do countries rank by cumulative medal count?

What characterizes athletes’ age and height?

Today we take stock of the Games with data.


The Olympic Games originated in ancient Greece more than 2000 years ago and take their name from Olympia, where they were held. The first modern Olympic Games took place in 1896 and the first Winter Olympic Games in 1924. Today the Games are the most influential sporting event in the world.


Data understanding


We selected a historical dataset on the modern Olympic Games that covers every Games from Athens 1896 to Rio 2016.


The dataset is taken from the website:


Note that since 1994 the Winter and Summer Olympic Games have been staggered, so that one or the other is held every two years. The 1992 Winter Olympic Games were the last to be held in the same year as the Summer Games. The Winter Games were first held in 1924 and, up to 2018, have been held 23 times, normally once every four years.


The athlete_events.csv file contains 271116 rows and 15 columns. Each row corresponds to one athlete competing in one Olympic event (an athlete-event). The columns are:


  • ID – athlete’s ID number
  • Name – athlete’s name
  • Sex – sex (M or F)
  • Age – age
  • Height – height (cm)
  • Weight – weight (kg)
  • Team – team name
  • NOC – National Olympic Committee code
  • Games – year and season
  • Year – year
  • Season – season
  • City – host city
  • Sport – sport
  • Event – event
  • Medal – medal (gold, silver, bronze, or missing)

Read in the data

First, import the libraries and read in the data.

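# Import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

import plotly as py
import plotly.graph_objs as go
import plotly.express as px
import plotly.figure_factory as ff
from plotly.subplots import make_subplots

# Shortcut for rendering plotly figures to standalone HTML files
pyplot = py.offline.plot

# Read in the athlete-event table and the NOC-to-region lookup table
df_athlete = pd.read_csv('./archive/athlete_events.csv')
df_regions = pd.read_csv('./archive/noc_regions.csv')

# Summarize columns, dtypes, and non-null counts
df_athlete.info()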




<class 'pandas.core.frame.DataFrame'>
RangeIndex: 271116 entries, 0 to 271115
Data columns (total 15 columns):
 #   Column  Non-Null Count   Dtype  
---  ------  --------------   -----  
 0   ID      271116 non-null  int64  
 1   Name    271116 non-null  object 
 2   Sex     271116 non-null  object 
 3   Age     261642 non-null  float64
 4   Height  210945 non-null  float64
 5   Weight  208241 non-null  float64
 6   Team    271116 non-null  object 
 7   NOC     271116 non-null  object 
 8   Games   271116 non-null  object 
 9   Year    271116 non-null  int64  
 10  Season  271116 non-null  object 
 11  City    271116 non-null  object 
 12  Sport   271116 non-null  object 
 13  Event   271116 non-null  object 
 14  Medal   39783 non-null   object 
dtypes: float64(3), int64(2), object(10)
memory usage: 31.0+ MB
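
The non-null counts above already show where data is missing: only 39783 of the 271116 rows carry a medal, and Height and Weight are absent for between a fifth and a quarter of the rows. As a minimal sketch, the same information can be expressed as a missing share per column. The `missing_share` helper and the toy rows below are illustrative assumptions, not part of the original analysis.

```python
import pandas as pd


def missing_share(df: pd.DataFrame) -> pd.Series:
    """Return the fraction of missing values per column, largest first."""
    return df.isna().mean().sort_values(ascending=False)


# Made-up rows standing in for athlete_events.csv (hypothetical data)
toy_info = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Age": [23.0, None, 31.0, 27.0],
    "Medal": [None, None, "Gold", None],
})

share = missing_share(toy_info)
print(share)
```

On the real table one would call `missing_share(df_athlete)` before deciding how to fill or drop the gaps.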





Data preprocessing


The data is processed as follows to make the later analysis easier:


  • Merge the two datasets into one with a left join on the NOC column
  • Sex: replace the codes M and F with Male and Female
  • Medal: fill missing values with No Medal
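# Merge the athlete table with the region lookup (a left join keeps every athlete row)
df_all = pd.merge(df_athlete, df_regions, how='left', on='NOC')

# Replace the sex codes with readable labels
df_all['Sex'] = df_all['Sex'].map({'M': 'Male', 'F': 'Female'})

# Fill missing medals; assign the result back instead of using inplace on a column selection
df_all['Medal'] = df_all['Medal'].fillna('No Medal')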
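
A left join silently leaves the region empty for any NOC code that is missing from the lookup table, so it can be worth checking the result. The sketch below repeats the three steps on made-up tables and lists the unmatched codes. The `region` column name is an assumption about the layout of noc_regions.csv, and `preprocess` is a hypothetical helper.

```python
import pandas as pd

# Made-up tables; "region" is an assumed column name for the lookup file
toy_athletes = pd.DataFrame({
    "ID": [1, 2, 3],
    "Sex": ["M", "F", "M"],
    "NOC": ["USA", "CHN", "XXX"],  # "XXX" has no match in the lookup
    "Medal": ["Gold", None, None],
})
toy_regions = pd.DataFrame({"NOC": ["USA", "CHN"], "region": ["USA", "China"]})


def preprocess(athletes: pd.DataFrame, regions: pd.DataFrame) -> pd.DataFrame:
    """Merge on NOC, relabel Sex, and fill missing medals."""
    # validate raises if a NOC code appears more than once in the lookup,
    # which would otherwise duplicate athlete rows
    merged = pd.merge(athletes, regions, how="left", on="NOC",
                      validate="many_to_one")
    merged["Sex"] = merged["Sex"].map({"M": "Male", "F": "Female"})
    merged["Medal"] = merged["Medal"].fillna("No Medal")
    return merged


df_toy = preprocess(toy_athletes, toy_regions)
unmatched = df_toy.loc[df_toy["region"].isna(), "NOC"].unique().tolist()
print(unmatched)
```

The row count of the merged table should equal that of the athlete table; a larger count would point to duplicate keys in the lookup.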





Data visualization


We visualize the processed data, and the results are as follows:


Overall Olympic data


Changes in the number of participating countries



Overall, the number of participating countries has trended upward, but it dropped abnormally at two Games:


1976 Montreal Olympic Games: amid an anti-racial-discrimination campaign of unprecedented scale, African countries boycotted the Games, which ended up far smaller than the previous edition.


1980 Moscow Olympic Games: to condemn the Soviet invasion of Afghanistan, the United States took the lead in refusing to take part and called on other countries to boycott. In response, a total of 65 countries eventually boycotted the Moscow Olympics, about two fifths of the countries that could have taken part at the time. In the end, only 80 countries and about 5000 athletes competed. There were fewer athletes than journalists covering the Games, which set a record.
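
The original figure is not reproduced here, so the following is only a sketch of how such a trend might be computed: count the distinct NOC codes per Games and flag the Games where the count fell compared with the previous Games of the same season. The helper name and the toy rows are made up for illustration.

```python
import pandas as pd


def countries_per_games(df: pd.DataFrame) -> pd.DataFrame:
    """Distinct NOC codes per Games, plus the change within each season."""
    out = (
        df.groupby(["Year", "Season"])["NOC"]
        .nunique()
        .rename("n_countries")
        .reset_index()
    )
    # Difference to the previous Games of the same season
    out["change"] = out.groupby("Season")["n_countries"].diff()
    return out


# Made-up rows (hypothetical data, not the real participation figures)
toy_noc = pd.DataFrame({
    "Year":   [1972, 1972, 1972, 1976, 1976, 1976, 1976, 1980],
    "Season": ["Summer"] * 5 + ["Winter"] * 2 + ["Summer"],
    "NOC":    ["AAA", "BBB", "CCC", "AAA", "BBB", "AAA", "BBB", "AAA"],
})

counts = countries_per_games(toy_noc)
declines = counts[counts["change"] < 0]
print(declines)
```

Applied to `df_all`, the rows in `declines` would be the candidates for the kind of boycott-driven drops described above.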


Changes in the number of events



The number of Olympic events has risen in waves. Growth was fastest in the 20 years from 1980 to 2000, especially at the Summer Olympic Games. After 2000, the number of events leveled off.
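
One possible way to get a table that feeds such a line chart, with one column per season, is to count distinct events per Games and unstack the season. This is a sketch on made-up rows; `events_per_games` is a hypothetical helper.

```python
import pandas as pd


def events_per_games(df: pd.DataFrame) -> pd.DataFrame:
    """Distinct events per year, one column per season (NaN if no Games)."""
    return df.groupby(["Year", "Season"])["Event"].nunique().unstack("Season")


# Made-up rows; repeated events stand for several athletes in the same event
toy_events = pd.DataFrame({
    "Year":   [1980, 1980, 1980, 1998, 2000, 2000, 2000],
    "Season": ["Summer", "Summer", "Summer", "Winter",
               "Summer", "Summer", "Summer"],
    "Event":  ["100m", "100m", "200m", "Slalom",
               "100m", "200m", "Triathlon"],
})

events = events_per_games(toy_events)
print(events)
```

Years without Games in a given season stay empty, which keeps the Summer and Winter lines separate when plotted.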


Top 20 countries by cumulative medal count



We selected the 20 countries with the most cumulative medals and found that the United States leads by a wide margin in gold, silver, and bronze medals, followed by Russia and Germany. Because China was absent from many Olympic Games, its cumulative medal count lags behind.
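
Because each row is an athlete-event, every member of a medal-winning team has a medal row of their own. The article does not say whether team medals were collapsed, so the sketch below is one possible approach rather than the author's method: it counts one medal per Games, event, medal type, and region. The `region` column name is an assumption about the merged table, and the toy rows are made up. A limitation is that two identical medals won by the same region in one event would be counted once.

```python
import pandas as pd


def medal_table(df: pd.DataFrame, top: int = 20) -> pd.DataFrame:
    """Medals per region, counting a team medal once."""
    won = df[df["Medal"] != "No Medal"]
    # Team members share one medal: keep one row per awarded medal
    won = won.drop_duplicates(subset=["Games", "Event", "Medal", "region"])
    table = (
        won.groupby(["region", "Medal"]).size()
        .unstack(fill_value=0)
        .reindex(columns=["Gold", "Silver", "Bronze"], fill_value=0)
    )
    table["Total"] = table.sum(axis=1)
    return table.sort_values(["Total", "Gold"], ascending=False).head(top)


# Made-up rows: a four-person relay gold, one silver, one gold, one non-medal
toy_medals = pd.DataFrame({
    "Games":  ["2008 Summer"] * 7,
    "Event":  ["Relay"] * 4 + ["100m", "Diving", "100m"],
    "Medal":  ["Gold"] * 4 + ["Silver", "Gold", "No Medal"],
    "region": ["USA"] * 5 + ["China", "China"],
})

table = medal_table(toy_medals)
print(table)
```

Counting raw rows instead would credit the relay gold four times, which inflates countries that are strong in team events.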


Olympic athlete data


Number of participants per session



The figure shows that the Summer Olympic Games with the most participants were the 2000 Sydney Games, with 13821, and the Winter Olympic Games with the most participants were the 2014 Games, with 4891. Since each row of the dataset is an athlete-event, counts like these include the same athlete more than once if they are taken over rows rather than over distinct athlete IDs.


Far more people take part in the Summer Olympic Games than in the Winter Games, probably because the Winter Games have fewer events. No Olympic Games were held during the First and Second World Wars.
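
Whether a participant count means rows or distinct athletes matters in this dataset. A minimal sketch that computes both side by side, using made-up rows and a hypothetical helper name:

```python
import pandas as pd


def participants_per_games(df: pd.DataFrame) -> pd.DataFrame:
    """Athlete-event entries and distinct athletes per Games."""
    return (
        df.groupby(["Year", "Season"])
        .agg(entries=("ID", "size"), athletes=("ID", "nunique"))
        .reset_index()
    )


# Made-up rows: athlete 1 competes in two events at the same Games
toy_entries = pd.DataFrame({
    "ID":     [1, 1, 2, 3],
    "Year":   [2000, 2000, 2000, 2014],
    "Season": ["Summer", "Summer", "Summer", "Winter"],
    "Event":  ["100m", "200m", "100m", "Slalom"],
})

per_games = participants_per_games(toy_entries)
print(per_games)
```

The `entries` column grows with the number of events each athlete enters, while `athletes` counts each person once per Games.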


Changes in the number and proportion of male and female athletes



(changes in the number of men and women)



(change in sex ratio)


In the early Olympic Games there was a great disparity between the numbers of male and female athletes, and men have always made up the larger share. As the Games have developed, however, the proportion of female athletes has kept rising, and the ratio of men to women is now close to 1:1.
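
A sketch of how the share of female athletes per Games might be derived, assuming the Sex column has already been relabeled to Male and Female as in the preprocessing step. Each athlete is counted once per Games; the helper name, the `season` parameter, and the toy rows are illustrative.

```python
import pandas as pd


def female_share(df: pd.DataFrame, season: str = "Summer") -> pd.DataFrame:
    """Male and female athlete counts per year and the female share."""
    games = df[df["Season"] == season]
    # Count each athlete once per Games, however many events they entered
    athletes = games.drop_duplicates(subset=["Year", "ID"])
    counts = athletes.groupby(["Year", "Sex"]).size().unstack(fill_value=0)
    counts = counts.reindex(columns=["Female", "Male"], fill_value=0)
    counts["female_share"] = counts["Female"] / (counts["Female"] + counts["Male"])
    return counts


# Made-up rows (hypothetical data); athlete 1 appears in two events
toy_sex = pd.DataFrame({
    "ID":     [9, 1, 1, 2, 3, 4, 5, 6, 7, 8],
    "Year":   [1896, 1900, 1900, 1900, 1900, 1900, 2016, 2016, 2016, 2016],
    "Season": ["Summer"] * 10,
    "Sex":    ["Male", "Male", "Male", "Male", "Male", "Female",
               "Male", "Male", "Female", "Female"],
})

ratio = female_share(toy_sex)
print(ratio)
```

A share of 0.5 corresponds to the 1:1 ratio mentioned above.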


Age and number of medals



As the figure shows, the age distribution is right-skewed: 80% of athletes are between 19 and 33 years old, and around 25 is the golden age for athletes.
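
Statements like these can be checked numerically. The sketch below reads the central 80% as the range between the 10th and 90th percentiles (an assumption; the article does not say how the range was chosen), uses the sample skewness as a sign check for right skew, and takes the most common age among medal rows as the peak. All names and rows are made up.

```python
import pandas as pd


def age_summary(df: pd.DataFrame) -> dict:
    """Central 80% age range, skewness, and most common medal-winning age."""
    ages = df["Age"].dropna()
    p10, p90 = ages.quantile([0.10, 0.90])
    medal_ages = df.loc[df["Medal"] != "No Medal", "Age"].dropna()
    return {
        "p10": p10,
        "p90": p90,
        "skew": ages.skew(),  # positive for a right-skewed distribution
        "peak_medal_age": medal_ages.value_counts().idxmax(),
    }


# Made-up rows (hypothetical data), including one missing age
toy_age = pd.DataFrame({
    "Age":   [18, 20, 22, 24, 25, 25, 26, 30, 35, 60, None],
    "Medal": ["No Medal"] * 3 + ["Gold", "Gold", "Silver"] + ["No Medal"] * 5,
})

summary = age_summary(toy_age)
print(summary)
```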


The youngest athlete in Olympic history was only 10 years old. At the first modern Olympic Games, held in Greece in 1896, Dimitrios Loundras competed at the age of 10 years and 218 days and won a bronze medal.


At the 1928 Amsterdam Summer Olympic Games, a 97-year-old American “athlete” took part in a sculpture competition but did not place. This record is unlikely to be broken.
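
Records like these can be looked up directly from the Age column. A minimal sketch with made-up names and ages (the helper and the rows are hypothetical):

```python
import pandas as pd


def age_extremes(df: pd.DataFrame):
    """Return the rows of the youngest and the oldest athlete with a known age."""
    known = df.dropna(subset=["Age"])
    youngest = known.loc[known["Age"].idxmin()]
    oldest = known.loc[known["Age"].idxmax()]
    return youngest, oldest


# Made-up rows (hypothetical data)
toy_extremes = pd.DataFrame({
    "Name": ["Athlete A", "Athlete B", "Athlete C", "Athlete D"],
    "Age":  [10.0, 97.0, 25.0, None],
    "Year": [1896, 1928, 2016, 1900],
})

youngest, oldest = age_extremes(toy_extremes)
print(youngest["Name"], oldest["Name"])
```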


Height and weight distribution of athletes



(height change)



(weight change)


We filtered the data to the Games after 1960 and found that male competitors’ heights range from 127 cm to 226 cm and female competitors’ from 127 cm to 213 cm, while male competitors’ weights range from 37 kg to 226 kg and female competitors’ from 25 kg to 167 kg.
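
A sketch of how such ranges might be computed: filter by year, then take the minimum and maximum of Height and Weight per sex. Whether 1960 itself is included is not stated in the article, so the inclusive cutoff below is an assumption, and the values in the toy rows are made up.

```python
import pandas as pd


def body_ranges(df: pd.DataFrame, since: int = 1960) -> pd.DataFrame:
    """Minimum and maximum height and weight per sex from `since` onward."""
    recent = df[df["Year"] >= since]  # inclusive cutoff is an assumption
    return recent.groupby("Sex")[["Height", "Weight"]].agg(["min", "max"])


# Made-up rows (hypothetical data); the 1956 row falls outside the filter
toy_body = pd.DataFrame({
    "Sex":    ["Male", "Male", "Male", "Female", "Female"],
    "Year":   [1956, 1964, 2000, 1972, 2016],
    "Height": [230.0, 150.0, 200.0, 140.0, 190.0],
    "Weight": [120.0, 50.0, 110.0, 40.0, 80.0],
})

ranges = body_ranges(toy_body)
print(ranges)
```

The result has two-level columns, so a single value is read as `ranges.loc["Male", ("Height", "max")]`.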


Because of COVID-19, the Tokyo Olympic Games became the first Olympic Games to be postponed in the history of the modern Olympic movement. According to a number of Japanese media reports, the postponement will cause direct economic losses of about $6 billion. Extra costs for venues, hotels, staffing, and so on will make the host’s preparations harder. All in all, we look forward to the global epidemic easing next year.

