# How to calculate the median of a set of data in MySQL?

Time：2021-9-18

To get the median of a set of data (for example, the median revenue of a region or a company), we generally divide this task into three sub tasks:

1. Sort the data and give each row of data its ranking in all data;
2. Find the median ranking number;
3. Find out the value corresponding to the middle ranking;

The following takes the monthly income of employees of a company as an example to illustrate the use of some complex MySQL statements.

## Method 1

#### Create test table

First, create an income table. The statement for creating the table is:

``````CREATE TABLE IF NOT EXISTS `employee` (
`id`     INT                  AUTO_INCREMENT PRIMARY KEY,
`name`   VARCHAR(10) NOT NULL DEFAULT '',
`income` INT         NOT NULL DEFAULT '0'
)
ENGINE = InnoDB
DEFAULT CHARSET = utf8;

INSERT INTO `employee` (`name`, `income`)
Values ('pockmarked ', 20000);
INSERT INTO `employee` (`name`, `income`)
Values ('li Si ', 12000);
INSERT INTO `employee` (`name`, `income`)
Values ('zhang San ', 10000);
INSERT INTO `employee` (`name`, `income`)
Values ('wang Er ', 16000);
INSERT INTO `employee` (`name`, `income`)
Values ('local tyrant ', 40000);``````

Sort the data and give each row of data its ranking in all data:

``````SELECT t1.name, t1.income, COUNT(*) AS rank
FROM employee AS t1,
employee AS t2
WHERE t1.income < t2.income
OR (t1.income = t2.income AND t1.name <= t2.name)
GROUP BY t1.name, t1.income
ORDER BY rank;``````

The query result is:

name income rank
vulgar tycoon 40000 1
Pockmarks 20000 2
WangTwo 16000 3
Li Si 12000 4
Zhang San 10000 5

Find the median ranking number:

``````SELECT (COUNT(*) + 1) DIV 2 as rank
FROM employee;``````

The query result is:

rank
3

``````SELECT income AS median
FROM (SELECT t1.name, t1.income, COUNT(*) AS rank
FROM employee AS t1,
employee AS t2
WHERE t1.income < t2.income
OR (t1.income = t2.income AND t1.name <= t2.name)
GROUP BY t1.name, t1.income
ORDER BY rank) t3
WHERE rank = (SELECT (COUNT(*) + 1) DIV 2 FROM employee)``````

The query result is:

median
16000

So far, we have found a way to get the median from a set of data.

## Method 2

Next, let’s introduce another method to optimize ranking statements.

We all know how to sort a group of data. In this example, the implementation method is as follows:

``````SELECT name, income
FROM employee
ORDER BY income DESC``````

The query result is:

name income
vulgar tycoon 40000
Pockmarks 20000
WangTwo 16000
Li Si 12000
Zhang San 10000

Can we go further and add a column to the query results, and the data in this column is the ranking?

We can achieve this goal through three custom variables:

The first variable is used to record the revenue of the current row of data
The second variable is used to record the income of the previous row of data
The third variable is used to record the ranking of the current row of data

``````SET @curr_income := 0;
SET @prev_income := 0;
SET @rank := 0;

SELECT `name`,
@curr_income := income                                      AS income,
@rank := if(@prev_income != @curr_income, @rank + 1, @rank) AS rank,
@prev_income := @curr_income                                AS dummy
FROM employee
ORDER BY income DESC
``````

The query results are as follows:

name income rank dummy
vulgar tycoon 40000 1 40000
Pockmarks 20000 2 20000
WangTwo 16000 3 16000
Li Si 12000 4 12000
Zhang San 10000 5 10000

Then find out the median ranking number and further find out the median income:

``````SET @curr_income := 0;
SET @prev_income := 0;
SET @rank := 0;

SELECT income AS median
FROM (SELECT `name`,
@curr_income := income                                      AS income,
@rank := if(@prev_income != @curr_income, @rank + 1, @rank) AS rank,
@prev_income := @curr_income                                AS dummy
FROM employee
ORDER BY income DESC) AS t1
WHERE t1.rank = (SELECT (COUNT(*) + 1) DIV 2 FROM employee)``````

The query result is:

median
16000

So far, we have found two ways to solve the median problem. Sprinkle flowers.