How to calculate the median of a set of data in MySQL?

Time:2021-9-18

To get the median of a set of data (for example, the median revenue of a region or a company), we generally divide this task into three sub tasks:

  1. Sort the data and give each row of data its ranking in all data;
  2. Find the median ranking number;
  3. Find out the value corresponding to the middle ranking;

The following takes the monthly income of employees of a company as an example to illustrate the use of some complex MySQL statements.

Method 1

Create test table

First, create an income table. The statement for creating the table is:

CREATE TABLE IF NOT EXISTS `employee` (
  `id`     INT                  AUTO_INCREMENT PRIMARY KEY,
  `name`   VARCHAR(10) NOT NULL DEFAULT '',
  `income` INT         NOT NULL DEFAULT '0'
)
  ENGINE = InnoDB
  DEFAULT CHARSET = utf8;

INSERT INTO `employee` (`name`, `income`)
Values ('pockmarked ', 20000);
INSERT INTO `employee` (`name`, `income`)
Values ('li Si ', 12000);
INSERT INTO `employee` (`name`, `income`)
Values ('zhang San ', 10000);
INSERT INTO `employee` (`name`, `income`)
Values ('wang Er ', 16000);
INSERT INTO `employee` (`name`, `income`)
Values ('local tyrant ', 40000);

Complete task 1

Sort the data and give each row of data its ranking in all data:

SELECT t1.name, t1.income, COUNT(*) AS rank
FROM employee AS t1,
     employee AS t2
WHERE t1.income < t2.income
   OR (t1.income = t2.income AND t1.name <= t2.name)
GROUP BY t1.name, t1.income
ORDER BY rank;

The query result is:

name income rank
vulgar tycoon 40000 1
Pockmarks 20000 2
WangTwo 16000 3
Li Si 12000 4
Zhang San 10000 5

Complete small task 2

Find the median ranking number:

SELECT (COUNT(*) + 1) DIV 2 as rank
FROM employee;

The query result is:

rank
3

Complete small task 3

SELECT income AS median
FROM (SELECT t1.name, t1.income, COUNT(*) AS rank
      FROM employee AS t1,
           employee AS t2
      WHERE t1.income < t2.income
         OR (t1.income = t2.income AND t1.name <= t2.name)
      GROUP BY t1.name, t1.income
      ORDER BY rank) t3
WHERE rank = (SELECT (COUNT(*) + 1) DIV 2 FROM employee)

The query result is:

median
16000

So far, we have found a way to get the median from a set of data.

Method 2

Next, let’s introduce another method to optimize ranking statements.

We all know how to sort a group of data. In this example, the implementation method is as follows:

SELECT name, income
FROM employee
ORDER BY income DESC

The query result is:

name income
vulgar tycoon 40000
Pockmarks 20000
WangTwo 16000
Li Si 12000
Zhang San 10000

Can we go further and add a column to the query results, and the data in this column is the ranking?

We can achieve this goal through three custom variables:

The first variable is used to record the revenue of the current row of data
The second variable is used to record the income of the previous row of data
The third variable is used to record the ranking of the current row of data

SET @curr_income := 0;
SET @prev_income := 0;
SET @rank := 0;

SELECT `name`,
       @curr_income := income                                      AS income,
       @rank := if(@prev_income != @curr_income, @rank + 1, @rank) AS rank,
       @prev_income := @curr_income                                AS dummy
FROM employee
ORDER BY income DESC

The query results are as follows:

name income rank dummy
vulgar tycoon 40000 1 40000
Pockmarks 20000 2 20000
WangTwo 16000 3 16000
Li Si 12000 4 12000
Zhang San 10000 5 10000

Then find out the median ranking number and further find out the median income:

SET @curr_income := 0;
SET @prev_income := 0;
SET @rank := 0;

SELECT income AS median
FROM (SELECT `name`,
             @curr_income := income                                      AS income,
             @rank := if(@prev_income != @curr_income, @rank + 1, @rank) AS rank,
             @prev_income := @curr_income                                AS dummy
      FROM employee
      ORDER BY income DESC) AS t1
WHERE t1.rank = (SELECT (COUNT(*) + 1) DIV 2 FROM employee)

The query result is:

median
16000

So far, we have found two ways to solve the median problem. Sprinkle flowers.

This work adoptsCC agreement, reprint must indicate the author and the link to this article

Recommended Today

Seven Python code review tools recommended

althoughPythonLanguage is one of the most flexible development languages at present, but developers often abuse its flexibility and even violate relevant standards. So PythoncodeThe following common quality problems often occur: Some unused modules have been imported Function is missing arguments in various calls The appropriate format indentation is missing Missing appropriate spaces before and after […]