Sample code of how MySQL / MariaDB implements PivotTable report

Time:2021-10-17

The previous article introduced several methods to implement PivotTable in Oracle. Today, let’s see how to implement the same functions in MySQL / MariaDB.

The sample data used in this article can be downloaded here.

Use case expressions and group aggregation

The essence of PivotTable report is to group data according to different combinations of rows and columns, and then summarize the results; Therefore, it is very similar to the function of group by plus aggregate functions (count, sum, AVG, etc.) in the database.

First, we use the following group by clause to classify and summarize the sales data:

Select coalesce (product, '[all products]') "product",
    Coalesce (channel, '[all channels]') "channel",
    any_ Value (coalesce (extract (year_month from saledate), '[all months]')) "month",
    Sum (amount) "sales volume"
from sales_data
group by product,channel,extract(year_month from saledate) with rollup;

The above statements are summarized according to products, channels and months; The with rollup option is used to generate subtotals, totals and totals at different levels; The coalesce function is used to display the null value in the summary row as the corresponding information; any_ The value function is used to return any data in the group. If it is removed, it will return a syntax error (a MySQL Bug). The results returned by this query are as follows:

product       | channel       | month        | sales volume     |
———|———|———–|——-|
Orange       | JD.COM       | two hundred and one thousand nine hundred and one     |  41289|
Orange       | JD.COM       | two hundred and one thousand nine hundred and two     |  43913|
Orange       | JD.COM       | two hundred and one thousand nine hundred and three     |  49803|
Orange       | JD.COM       | two hundred and one thousand nine hundred and four     |  49256|
Orange       | JD.COM       | two hundred and one thousand nine hundred and five     |  64889|
Orange       | JD.COM       | two hundred and one thousand nine hundred and six     |  62649|
Orange       | JD.COM       | [all months] | 311799|
Orange       | storefront       | two hundred and one thousand nine hundred and one     |  41306|
Orange       | storefront       | two hundred and one thousand nine hundred and two     |  37906|
Orange       | storefront       | two hundred and one thousand nine hundred and three     |  48866|
Orange       | storefront       | two hundred and one thousand nine hundred and four     |  48673|
Orange       | storefront       | two hundred and one thousand nine hundred and five     |  58998|
Orange       | storefront       | two hundred and one thousand nine hundred and six     |  58931|
Orange       | storefront       | [all months] 𞓜 294680|
Orange       | TaoBao       | two hundred and one thousand nine hundred and one     |  43488|
Orange       | TaoBao       | two hundred and one thousand nine hundred and two     |  37598|
Orange       | TaoBao       | two hundred and one thousand nine hundred and three     |  48621|
Orange       | TaoBao       | two hundred and one thousand nine hundred and four     |  49919|
Orange       | TaoBao       | two hundred and one thousand nine hundred and five     |  58530|
Orange       | TaoBao       | two hundred and one thousand nine hundred and six     |  64626|
Orange       | TaoBao       | [all months] 𞓜 302782|
Orange       | [all channels] | [all months] | 909261|

Banana       | [all channels] | [all months] | 925369|
[all products] | [all channels] | [all months] | 2771682|

In fact, we have obtained the summary results of sales volume, but we need to display the data in different columns according to different months; That is, you need to convert rows into columns. This function can be realized by case expression:

Select coalesce (product, 'all products'), coalesce (channel,' all channels'), channel, 
    Sum (case extract (year_month from saledate) when 201901 then amount else 0 end) "January",
    Sum (case extract (year_month from saledate) when 201902 then amount else 0 end) "February",
    Sum (case extract (year_month from saledate) when 201903 then amount else 0 end) "March",
    Sum (case extract (year_month from saledate) when 201904 then amount else 0 end) "April",
    Sum (case extract (year_month from saledate) when 201905 then amount else 0 end) "may",
    Sum (case extract (year_month from saledate) when 201906 then amount else 0 end) "June",
    Sum (amount) "total"
from sales_data
group by product, channel with rollup;

The case expression in the first sum function only summarizes the sales volume in January 201901, and the sales volume in other months is set to 0; The following sum functions are analogized to obtain the sales summary of each month and the total of all months. The PivotTable returned by this query is as follows:

product        | channel        | January   | February    | March    | April    | May   | June    | total    |
———-|———-|——|——|——|——|——|——|——-|
Orange        | JD.COM       | 41289| 43913| 49803| 49256| 64889| 62649| 311799|
Orange        | storefront       | 41306| 37906| 48866| 48673| 58998| 58931| 294680|
Orange        | TaoBao       | 43488| 37598| 48621| 49919| 58530| 64626| 302782|
Orange        | [all channels] – 126083 | 119417 | 147290 | 147848 | 182417 | 186206 | 909261|
Apple        | JD.COM       | 38269| 40593| 56552| 56662| 64493| 62045| 318614|
Apple        | storefront       | 43845| 40539| 44909| 55646| 56771| 64933| 306643|
Apple        | TaoBao       | 42969| 43289| 48769| 58052| 58872| 59844| 311795|
Apple        | [all channels] – 125083 124421 124230 | 170360 | 180136 | 186822 | 937052|
Banana        | JD.COM       | 36879| 36981| 51748| 54801| 64936| 60688| 306033|
Banana        | storefront       | 41210| 39420| 50884| 52085| 60249| 67597| 311445|
Banana        | TaoBao       | 42468| 41955| 52780| 54971| 56504| 59213| 307891|
Banana        | [all channels] – 120557 | 118356 | 155412 | 161857 | 181689 | 187498 | 925369|
[all products] | [all channels] | 371723 | 362194 | 452932 | 480065 | 544242 | 560526 | 2771682|

The if (expr1, expr2, expr3) function in MySQL can also be used to replace the above case expression.

If there is row to column conversion, there is column to row conversion. MySQL does not have a special function to deal with this situation. You can use the union operator to merge multiple result sets. For example:


with d as (
 select product, channel,
     sum(case extract(year_month from saledate) when 201901 then amount else 0 end) s01,
     sum(case extract(year_month from saledate) when 201902 then amount else 0 end) s02,
     sum(case extract(year_month from saledate) when 201903 then amount else 0 end) s03,
     sum(case extract(year_month from saledate) when 201904 then amount else 0 end) s04,
     sum(case extract(year_month from saledate) when 201905 then amount else 0 end) s05,
     sum(case extract(year_month from saledate) when 201906 then amount else 0 end) s06
 from sales_data
 group by product, channel
)
select product, channel, 201901 saledate, s01 amount from d
union all
select product, channel, 201902 saledate, s02 from d
union all
select product, channel, 201903 saledate, s03 from d
union all
select product, channel, 201904 saledate, s04 from d
union all
select product, channel, 201905 saledate, s05 from d
union all
select product, channel, 201906 saledate, s06 from d;

The general table expression (with clause) constructs sales data containing multiple months, and each month is a column; Then, each query returns the data of one month, and all the results are merged together through the union all operator.

Use precompiled dynamic SQL statements

The method of using case expression and aggregate function to realize PivotTable report has certain limitations. If there is still sales volume from July to December to be counted, we need to modify the query statement to increase the processing of this part. To this end, we can use dynamic SQL to automatically generate row column conversion statements:

select group_concat(
 distinct concat(
  ' sum(case extract(year_month from saledate) when ', dt,
  ' then amount else 0 end) as "', dt, '"')
 ) into @sql
from (
 select extract(year_month from saledate) as dt
 from sales_data
 order by saledate
) d;

set @sql
 =Concat ('select coalesce (product, '[all products]' ') "product", coalesce (channel,' [all channels] '') "channel", '@ SQL,
      , sum (amount) "total"
      from sales_data
      group by product, channel with rollup;');
select @sql;
prepare stmt from @sql;
execute stmt;
deallocate prepare stmt;

First, by querying sales_ Find out all the months in the data table, construct the sum function, and store the constructed statement in the variable @ SQL; group_ The concat function can combine multiple lines of strings into a single string.

group_ The maximum length (bytes) allowed to be returned by the concat function is determined by the system variable group_ concat_ max_ Len. The default value is 1024.

Then, use the set command to merge other parts of the query statement with the existing content. The generated query statement is as follows:

Select coalesce (product, 'all products'), coalesce (channel,' all channels'), channel, 
    sum(case extract(year_month from saledate) when 201901 then amount else 0 end) as "201901", 
    sum(case extract(year_month from saledate) when 201902 then amount else 0 end) as "201902", 
    sum(case extract(year_month from saledate) when 201903 then amount else 0 end) as "201903", 
    sum(case extract(year_month from saledate) when 201904 then amount else 0 end) as "201904", 
    sum(case extract(year_month from saledate) when 201905 then amount else 0 end) as "201905", 
    sum(case extract(year_month from saledate) when 201906 then amount else 0 end) as "201906", 
    Sum (amount) "total"
from sales_data
group by product, channel with rollup;

Finally, execute the statement and return the results through the precompiled command. Even if the sales data of other months is added, the query statement does not need to be modified manually.

Using the connect storage engine

If you use version 10.0 or above of MariaDB, you can use pivot table type in connect storage engine to implement PivotTable report.

First, we need to install the connect storage engine. Windows system can execute the following commands for dynamic installation:


INSTALL SONAME 'ha_connect';

You can also add the following contents to the configuration file my.ini, but you need to restart the service:


[mysqld]
plugin_load_add = ha_connect

For Linux systems, refer to the official documentation for the installation process.

Next, we define a pivot table:


create table pivot_sales(
 product varchar(20) not null,
 channel varchar(20) not null,
 `201901` decimal(10,2) not null flag=1,
 `201902` decimal(10,2) not null flag=1,
 `201903` decimal(10,2) not null flag=1,
 `201904` decimal(10,2) not null flag=1,
 `201905` decimal(10,2) not null flag=1,
 `201906` decimal(10,2) not null flag=1
)
engine=connect table_type=pivot
option_list='PivotCol=saledate,FncCol=amount,host=127.0.0.1,user=root, password=p123456,port=3306'
SrcDef='select product,channel,date_format(saledate, ''%Y%m'') saledate,sum(amount) amount from sales_data group by product,channel,date_format(saledate, ''%Y%m'')';

Where, engine defines the storage engine as connect; table_ Type defines the type of the table as pivot; option_ List is used to define various options. Pivotcol represents the column of data to be converted into multiple fields, fnccol specifies the fields to be summarized, and others are the information connected to the source table server; Srcdef is used to specify the source table query statement, or tabname can be used to specify the table name; The fields above are the structure of the pivot table. Flag = 1 indicates the fields after aggregation.

After the creation is successful, we can query pivot directly_ The data in the sales table:

select * from pivot_sales;

product |channel |201901 |201902 |201903 |201904 |201905 |201906 |
--------|---------|--------|--------|--------|--------|--------|--------|
Orange | Jingdong | 41289.00 | 43913.00 | 49803.00 | 49256.00 | 64889.00 | 62649.00|
Orange store | 41306.00 | 37906.00 | 48866.00 | 48673.00 | 58998.00 | 58931.00|
Orange | Taobao | 43488.00 | 37598.00 | 48621.00 | 49919.00 | 58530.00 | 64626.00|
Apple | JD | 38269.00 | 40593.00 | 56552.00 | 56662.00 | 64493.00 | 62045.00|
Apple store | 43845.00 | 40539.00 | 44909.00 | 55646.00 | 56771.00 | 64933.00|
Apple | Taobao | 42969.00 | 43289.00 | 48769.00 | 58052.00 | 58872.00 | 59844.00|
Banana | Jingdong | 36879.00 | 36981.00 | 51748.00 | 54801.00 | 64936.00 | 60688.00|
Banana store | 41210.00 | 39420.00 | 50884.00 | 52085.00 | 60249.00 | 67597.00|
Banana | Taobao | 42468.00 | 41955.00 | 52780.00 | 54971.00 | 56504.00 | 59213.00|

At present, the pivot table supports limited functions and can only perform some basic operations. For example:

--No error
select * from pivot_sales
Where channel = 'JD';

--Syntax error
select channel from pivot_sales
Where channel = 'JD';

This is the end of this article on how to implement the sample code of MySQL / MariaDB PivotTable report. For more information about MySQL / MariaDB PivotTable report, please search the previous articles of developeppaer or continue to browse the relevant articles below. I hope you will support developeppaer in the future!