Automate pivot table queries

from the Artful Common Queries page


To automate query generation, we need to be clear on the algorithm that the code will implement, and to get to that point we must've mastered the typical base case, so we start with a sales table (product, salesperson, amount) ...
DROP TABLE IF EXISTS sales;
CREATE TABLE sales (
  id int(11) default NULL,
  product char(5) default NULL,
  salesperson char(5) default NULL,
  amount decimal(10,2) default NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
INSERT INTO sales VALUES 
  (1,'radio','bob','100.00'),(2,'radio','sam','100.00'),
  (3,'radio','sam','100.00'),(4,'tv','bob','200.00'),
  (5,'tv','sam','300.00'),(6,'radio','bob','100.00');
SELECT * FROM sales;
+------+---------+-------------+--------+
| id   | product | salesperson | amount |
+------+---------+-------------+--------+
|    1 | radio   | bob         | 100.00 |
|    2 | radio   | sam         | 100.00 |
|    3 | radio   | sam         | 100.00 |
|    4 | tv      | bob         | 200.00 |
|    5 | tv      | sam         | 300.00 |
|    6 | radio   | bob         | 100.00 |
+------+---------+-------------+--------+
... and a request for a query to cross-tabulate sales amounts for salesperson and product, a pivot table query ...
SELECT
  product,
  SUM( CASE salesperson WHEN 'bob' THEN amount ELSE 0 END ) AS 'Bob',
  SUM( CASE salesperson WHEN 'sam' THEN amount ELSE 0 END ) AS 'Sam',
  SUM( amount ) AS Total
FROM sales
GROUP BY product WITH ROLLUP;
+---------+--------+--------+--------+
| product | Bob    | Sam    | Total  |
+---------+--------+--------+--------+
| radio   | 200.00 | 200.00 | 400.00 |
| tv      | 200.00 | 300.00 | 500.00 |
| NULL    | 400.00 | 500.00 | 900.00 |
+---------+--------+--------+--------+
The query generates one product per row and one column per salesperson. The pivoting CASE expressions assign values of sales.amount to the matching salesperson's column. For two products and two salespersons, it's a snap once you've done it a few times. When there are dozens of products and salespersons, though, writing the query becomes tiresome and error-prone.

In SQL Server, we could just write ...

  TRANSFORM Sum(amount) AS SumOfAmount
  SELECT product, Sum(amount) AS AmountTotal
  FROM sales
  GROUP BY product,salesperson
  PIVOT salesperson;
... and we'd be done. Few would mind if many readers were to petition MySQL developers at bugs.mysql.com to implement this syntax in MySQL. Meanwhile ...

Some years ago Giuseppe Maxia published a little query that automates writing the pivot expressions. His idea was to embed the syntax for lines like the SUM( CASE ...) lines above in a query for the DISTINCT values. At the time Giuseppe was writing, MySQL did not support stored procedures. Now that it does, we can further generalise Giuseppe's idea by parameterising it in a stored procedure.

Admittedly, it's a bit daunting. Even if we just do it in straight SQL rather than in a stored procedure, we're writing queries which, when run, return as results query expressions that can be combined into a final query we can PREPARE and EXECUTE.

Implementing it in a stored procedure adds another layer: now we're writing variable names rather than the usual literal table and column names.

Code that writes code that writes code. Not a job for the back of an envelope.

The base logic isn't complicated, just two steps:

  1. 1 write a query to generate a list of pivot expressions
  2. 2 embed the result of step 1 in an aggregating query

The complicating devils are in the layers of reference.

In straight SQL

To pivot on salesperson for amount subsums in the sales table above, step 1 generates one pivoting expression for each distinct pivoting value by applying Group_Concat() to the pivoting column in the table:
SELECT 
  GROUP_CONCAT( DISTINCT
                CONCAT( "sum( if(salesperson='",
                        salesperson,
                        "', amount, 0) ) AS `",
                        salesperson, "`"
                      ) 
              )
FROM sales INTO @sql;
That query generates ...
  sum( if(salesperson='bob', amount, 0) ) AS `bob`,
  sum( if(salesperson='sam', amount, 0) ) AS `sam`
Step 2 embeds that result in the query:
SET @sql2 = CONCAT("SELECT product, ", @sql, 
"FROM (
   select product, salesperson, amount
   from sales
)t
group by t.product
order by t.product");
PREPARE stmt FROM @sql2;
EXECUTE stmt;
DROP PREPARE stmt;
+---------+--------+--------+
| product | bob    | sam    |
+---------+--------+--------+
| radio   | 200.00 | 200.00 |
| tv      | 200.00 | 300.00 |
+---------+--------+--------+
If horizontal and vertical totals are desired, there are three more steps before we can assemble the final query ...
  1. 3 add With Rollup to the Group By clause
  2. 4 an outer query to rename the "Totals" row generated by With Rollup
  3. 5 outer query code to generate a horizontal sum for each row
-- add rollup for vertical totals
SET @sql2 = CONCAT("SELECT product, ", @sql, 
"FROM (
   select product, salesperson, amount
   from sales
)t
group by product with rollup
order by ifnull(product,'zzzzz')");

-- generate list of pivot column name values
SET @pivotcols = ( 
  select group_concat( distinct salesperson) 
  from sales
);

-- code for horizontal totals
SET @pivotsumxpr = ( 
  select group_concat( distinct salesperson separator '+') 
  from sales
);

-- assemble
SET @sql3 = CONCAT(
  "select ifnull(product,'TOTALS') as product,", 
  @pivotcols, ",", @pivotsumxpr, 
  " as salespersonTotal from (", @sql2, ")x"
);
PREPARE stmt FROM @sql3;
EXECUTE stmt;
+---------+--------+--------+------------------+
| product | bob    | sam    | salespersonTotal |
+---------+--------+--------+------------------+
| radio   | 200.00 | 200.00 |           400.00 |
| tv      | 200.00 | 300.00 |           500.00 |
| TOTALS  | 400.00 | 500.00 |           900.00 |
+---------+--------+--------+------------------+
DROP PREPARE stmt;
This needs MySQL version 8.0.12 or later---earlier MySQL versions tripped over bugs that raised an error on WITH ROLLUP...ORDER BY, requiring you to move the ORDER BY clause to an additional outer query.

So if your MySQL version is earlier than 8.0.12, use this instead ..

SELECT 
  GROUP_CONCAT( DISTINCT
                CONCAT( "sum( if(salesperson='",
                        salesperson,
                        "', amount, 0) ) AS `",
                        salesperson, "`"
                      ) 
              )
FROM sales INTO @sql;
SET @sql2 = CONCAT("SELECT product, ", @sql, 
"FROM (
   select product, salesperson, amount
   from sales
)t
group by product with rollup");
SET @pivotcols = ( 
  select group_concat( distinct salesperson) 
  from sales
);
SET @pivotsumxpr = ( 
  select group_concat( distinct salesperson separator '+') 
  from sales
);
-- move order by clause to in outer query pre-8.0.12
SET @sql3 = CONCAT(
  "select * from (",
    "select ifnull(product,'TOTALS') as product,", 
    @pivotcols, ",", @pivotsumxpr, 
    " as salespersonTotal from (", @sql2, ")x
  )y order by if(product='TOTALS','zzzzz',product)"
);
PREPARE stmt FROM @sql3;
EXECUTE stmt;
DROP PREPARE stmt;
You'd like to re-use this SQL without the hassle of porting it to a stored procedure? There's always the poor man's method of code generalisation---paste the code into a text file, edit it with a text editor, replace all instances of ...
  product with GROUPCOL
  salesperson with COLPIVOT
  sales with TBLPIVOT
  amount with SUMPIVOT
... in that order, and save it. Then next time you wish to use it, copy the script, replace the ...PIVOT values with your pivot table column and sum column names, and run it. Yes if TBLPIVOT is a complex query with joins &c, it still works.

But really, it's easier to just call a routine, so read on ...

In a Stored Procedure

It's best to have the sproc shell available as a canned text block, also to keep generic queries like this in a system database for availability to all schemas, so a production version needs parameters specifying database, table, pivot column, aggregating column and aggregated column. Here for simplicity we do without the database param.

Then what? Proceed from back to front:

  • 1 Write the pivot expressions for a specific case.
  • 2 Write the PREPARE statement that generates those expressions.
  • 3 Parameterise the result of #2.
  • 4 Put the result of #3 in an sproc.
Further complicating matters, we soon found that different summary aggregations, for example COUNT and SUM, require different sprocs. Here is the routine for generating COUNT pivot expressions:
DROP PROCEDURE IF EXISTS writecountpivot;
DELIMITER |
CREATE PROCEDURE writecountpivot( db CHAR(64), tbl CHAR(64), col CHAR(64) )
BEGIN
  DECLARE datadelim CHAR(1) DEFAULT '"';
  DECLARE singlequote CHAR(1) DEFAULT CHAR(39);
  DECLARE comma CHAR(1) DEFAULT ',';
  SET @sqlmode = (SELECT @@sql_mode);
  SET @@sql_mode='';
  SET @sql = 
        CONCAT( 'SELECT DISTINCT CONCAT(', singlequote,
                ',SUM(IF(', col, ' = ', datadelim, singlequote, comma,
                col, comma, singlequote, datadelim, comma, '1,0)) AS `', 
                singlequote, comma, col, comma, singlequote, '`', 
                singlequote, ') AS countpivotarg FROM ', db, '.', tbl,
                ' WHERE ', col, ' IS NOT NULL' );
  -- UNCOMMENT TO SEE THE MIDLEVEL CODE:
  -- SELECT @sql; 
  PREPARE stmt FROM @sql;
  EXECUTE stmt;
  DROP PREPARE stmt;
  SET @@sql_mode=@sqlmode;
END;
|
DELIMITER ;
CALL sys.writecountpivot('test','sales','salesperson');
This generates the SQL ...
SELECT DISTINCT 
  CONCAT(',SUM(IF(salesperson = "',salesperson,'",1,0)
        ) AS `',salesperson,'`') 
  AS countpivotarg 
FROM test.sales 
WHERE salesperson IS NOT NULL |
and returns...
+--------------------------------------------+
| countpivotarg                              |
+--------------------------------------------+
| ,SUM(IF(salesperson = "bob",1,0)) AS `bob` |
| ,SUM(IF(salesperson = "sam",1,0)) AS `sam` |
+--------------------------------------------+
The basic aggregating query into which we'll plug the pivot expressions is ...
SELECT 
  product
  ,COUNT(*) AS Total
FROM test.sales
GROUP BY product WITH ROLLUP;
Adding in the pivot sums gives ...
SELECT 
  product
  ,SUM(IF(salesperson = "bob",1,0)) AS `bob` 
  ,SUM(IF(salesperson = "sam",1,0)) AS `sam`
  ,COUNT(*) AS Total
FROM test.sales
GROUP BY product WITH ROLLUP;
+---------+------+------+-------+
| product | bob  | sam  | Total |
+---------+------+------+-------+
| radio   |    2 |    2 |     4 |
| tv      |    1 |    1 |     2 |
| NULL    |    3 |    3 |     6 |
+---------+------+------+-------+
Not overwhelming for two columns, very convenient when there are 20. (Yes, it could also be written with COUNT( ... 1, NULL)).

One point to notice is that the two levels of code generation create quotemark nesting problems. To make the double quotemark '"' available for data value delimiting, we turn off ANSI_QUOTES during code generation, and put it back afterwards.

SUM pivot queries need different syntax ...

USE system;
DROP PROCEDURE IF EXISTS writesumpivot;
DELIMITER |
CREATE PROCEDURE writesumpivot( 
  db CHAR(64), 
  tbl CHAR(64), 
  pivotcol CHAR(64), 
  sumcol CHAR(64) 
)
BEGIN
  DECLARE datadelim CHAR(1) DEFAULT '"';
  DECLARE comma CHAR(1) DEFAULT ',';
  DECLARE singlequote CHAR(1) DEFAULT CHAR(39);
  SET @sqlmode = (SELECT @@sql_mode);
  SET @@sql_mode='';
  SET @sql = 
    CONCAT( 'SELECT DISTINCT CONCAT(', singlequote, 
            ',SUM(IF(', pivotcol, ' = ', datadelim, 
            singlequote, comma, pivotcol, comma, 
            singlequote, datadelim, comma, sumcol, ',0)) AS `', 
            singlequote, comma, pivotcol, comma, singlequote, '`', 
            singlequote, ') AS sumpivotarg FROM ', db, '.', tbl, 
            ' WHERE ', pivotcol, ' IS NOT NULL' );
   -- UNCOMMENT TO SEE THE MIDLEVEL SQL:
   -- SELECT @sql;
   PREPARE stmt FROM @sql;
   EXECUTE stmt;
   DROP PREPARE stmt;
   SET @@sql_mode=@sqlmode;
END;
|
DELIMITER ;
CALL writesumpivot('test','sales','salesperson','amount');
+-------------------------------------------------+
| sumpivotarg                                     |
+-------------------------------------------------+
| ,SUM(IF(salesperson = "bob",amount,0)) AS `bob` |
| ,SUM(IF(salesperson = "sam",amount,0)) AS `sam` |
+-------------------------------------------------+
... which forms the guts of our report query ...
SELECT
 product
 ,SUM(IF(salesperson = "bob",amount,0)) AS `bob`
 ,SUM(IF(salesperson = "sam",amount,0)) AS `sam`
 ,SUM(amount) AS Total
FROM test.sales
GROUP BY product;
+---------+--------+--------+--------+
| product | bob    | sam    | Total  |
+---------+--------+--------+--------+
| radio   | 200.00 | 200.00 | 400.00 |
| tv      | 200.00 | 300.00 | 500.00 |
+---------+--------+--------+--------+
A similar solution is at http://mysql.rjweb.org/doc.php/pivot.

Automate the whole query

Is it possible to automate not just the list of pivot expressions, but the whole pivot query? Can we write a routine that will generate and execute a query that cross-tabulates any columns you pass to it? Yes, but the more columns there are to cross-tabulate, the more complicated it gets. Here's a solution for the simplest possible case---counts of one column's distinct values against the distinct values of another column. A test table:

drop table if exists track;
create table track( agent char(1), program char(1) );
insert into track values
  ('a','m'),('b','n'),('c','o'),('a','p');
For pivoted counts, a solution is a 2-step--prepare and run a query to form the pivot list, then prepare and run the whole query using the pivot list result from the first query:
drop procedure if exists crosstabcount;
delimiter go
create procedure crosstabcount( 
  groupbycol varchar(64), 
  sumcol varchar(64), 
  tbl varchar(64) 
) 
begin 
  set @sumlist = null, @sumsql = null; 
  -- PRELIM QUERY TO BUILD SQL EXPRESSIONS FOR PIVOT SUMS
  set @sumsql = concat( 
    "select group_concat( distinct concat( 
      'sum(case when ", sumcol, " = ''', thisval, ''' 
      then 1 else 0 end) as `', thisval, '`') 
    ) into @sumlist from ( 
    select distinct ", groupbycol, ",", sumcol, " as thisval from ", 
    tbl, " order by ", groupbycol, ") as tmp" ); 
  prepare stmt from @sumsql;
  execute stmt;
  drop prepare stmt;
  -- APPLY THE PIVOT EXPRESSION LIST TO THE TARGET TABLE
  set @sql= concat('select ', groupbycol, ',', @sumlist, 
                   ' from ', tbl, ' group by ', groupbycol 
                  ); 
  prepare stmt from @sql; 
  execute stmt; 
  drop prepare stmt;
end;
go
delimiter ;
call crosstabcount( 'agent', 'program', 'track' );
+-------+------+------+------+------+
| agent | m    | p    | n    | o    |
+-------+------+------+------+------+
| a     |    1 |    1 |    0 |    0 |
| b     |    0 |    0 |    1 |    0 |
| c     |    0 |    0 |    0 |    1 |
+-------+------+------+------+------+

Stored procedure for a pivot sum query

Pivot sums need more complex logic. An sproc to cross-tabulate two columns over a numeric column and sum the results illustrates the challenges. The sproc takes params for a target table name, grouping column, pivoting column, a numeric amount to be cross-summed, and a param to toggle debug mode ...
drop procedure if exists writepivotsumquery;
delimiter go
CREATE PROCEDURE writepivotsumquery(
  vpivtbl varchar(2048),
  vgroupcol varchar(64),
  vpivcol varchar(64),
  vpivsumcol varchar(64),
  pdebug boolean
)
BEGIN
  DECLARE datadelim CHAR(1) DEFAULT '"';
  DECLARE comma CHAR(1) DEFAULT ',';
  DECLARE singlequote CHAR(1) DEFAULT CHAR(39);

  -- ERROR HANDLER
  DECLARE Exit Handler for SqlException
    Begin
      Get Diagnostics Condition 1 
        @sqlstate = Returned_SqlState,
        @errno = Mysql_Errno, 
        @text = Message_Text;
      Select 'writepivotsumquery' as proc,@sqlstate, @errno, @text;
    End;  

  -- PIVOT LOGIC
  SET @pivotsql='',@sql2='',@sql3='',@sqltmp='',@pivotsumsql='',@sql3='',
      @pivotgensql = 
    CONCAT( 'SELECT GROUP_CONCAT(DISTINCT CONCAT(', 
            singlequote, 'SUM(IF(', vpivcol, '=', datadelim, 
            singlequote, comma, vpivcol, comma, singlequote, 
            datadelim, comma, vpivsumcol, ',0)) AS `', 
            singlequote, comma, vpivcol, comma, singlequote, '`', 
            singlequote, ')) INTO @pivotsql FROM ', vpivtbl, 
            ' WHERE ', vpivcol, ' IS NOT NULL'
          );
  if pdebug then
    select @pivotgensql;
  end if;
  prepare stmt from @pivotgensql;
  execute stmt;
  drop prepare stmt;
  if pdebug then
    select @pivotsql;
  end if;
  
  -- CORE QUERY
  SET @sql2 = CONCAT(
    "SELECT ", vgroupcol, comma, @pivotsql, " FROM (select ", 
    vgroupcol, comma, vpivcol, comma, vpivsumcol, " from ", 
    vpivtbl, ")t group by ", vgroupcol, " with rollup ",
    "order by ifnull(", vgroupcol, comma, "'zzzzz')"
  );
  if pdebug then
    select @sql2;
  end if;

  -- BACKTICKED PIVOT COLUMN VALUES TO RESULTSET COLUMN NAMES
  set @sqltmp = concat(
    "select group_concat(distinct ", 
    concat( "concat('`',", vpivcol,",'`'))" ), 
    "from ", vpivtbl, " into @pivotcols" 
  );
  prepare stmt from @sqltmp;
  execute stmt;
  drop prepare stmt;
  if pdebug then
    select @pivotcols;
  end if;

  -- BACKTICKED PIVCOL NAMES TO HORIZONTAL VALUE SUMS
  set @sqltmp = concat( 
    "select group_concat(distinct ", 
    concat( "concat('`',", vpivcol,",'`') separator '+')"), 
    " from ", vpivtbl, " into @pivotsumxpr"
  );
  if pdebug then
    select @sqltmp;
  end if;
  prepare stmt from @sqltmp;
  execute stmt;
  drop prepare stmt;
  if pdebug then
    select @pivotsumxpr;
  end if;

  -- ASSEMBLE
  SET @sql3 = CONCAT(
    "select ifnull(", vgroupcol,comma,"'TOTALS') as ",vgroupcol,comma, 
    @pivotcols,comma,@pivotsumxpr," as ",vgroupcol,"Totals from (", @sql2, ")x"
  );
  if pdebug then
    select @sql3;
  end if;
  PREPARE stmt FROM @sql3;
  EXECUTE stmt;
  DROP PREPARE stmt;  
END;
go
delimiter ;
Pivot salesperson on product over amount in the sales table example above...
call writepivotsumquery('sales', 'product', 'salesperson', 'amount', false );
+---------+--------+--------+---------------+
| product | bob    | sam    | productTotals |
+---------+--------+--------+---------------+
| radio   | 200.00 | 200.00 |        400.00 |
| tv      | 200.00 | 300.00 |        500.00 |
| TOTALS  | 400.00 | 500.00 |        900.00 |
+---------+--------+--------+---------------+
And, an example from a recent MySQL forum question: pivot client datepay on client over the totsl remited...
CREATE TABLE `remission` (
  `client` varchar(64) DEFAULT NULL,
  `datepay` date DEFAULT NULL,
  `total` decimal(8,2) DEFAULT NULL
) ENGINE=InnoDB;
INSERT INTO `remission` VALUES 
('a','2020-01-01',1000.00),('a','2020-01-15',2000.00),
('b','2020-01-01',1100.00),('b','2020-01-15',2100.00),
('c','2020-01-01',1200.00),('c','2020-01-15',2200.00),
('d','2020-01-01',1300.00),('d','2020-01-15',2300.00);

call writepivotsumquery('remission', 'datepay', "client", 'total', false );
+------------+---------+---------+---------+---------+---------------+
| datepay    | a       | b       | c       | d       | datepayTotals |
+------------+---------+---------+---------+---------+---------------+
| 2020-01-01 | 1000.00 | 1100.00 | 1200.00 | 1300.00 |       4600.00 |
| 2020-01-15 | 2000.00 | 2100.00 | 2200.00 | 2300.00 |       8600.00 |
| TOTALS     | 3000.00 | 3200.00 | 3400.00 | 3600.00 |      13200.00 |
+------------+---------+---------+---------+---------+---------------+
With many an OLAP query, data for the query has already been assembled into a "fact table", so the pivot query writer mainly needs to specify the table name and the grouping, pivot and aggregate columns.

But what if the data source is a multi-table query rather than a single table? For example, sales records of a modest boatselling business are in orditems(item_id, prod_id, order_id, date_added, ...) and products(prod_id, name, description, price, ...) tables. The owner wants a tabulation of sales by product by year since she launched the business.

Rather than trying to anticipate all possible source data complexities, just create a temporary table or View (in effect, a transient fact table) to hand to the pivot query generator ...

drop temporary table if exists pivotin;
create temporary table pivotin 
  select year(date_added) as Yr,i.prod_id,price as Price
  from orditems i 
  join products p using(prod_id);

-- OR ...

create or replace view pivotin as
  select year(date_added) as Yr,i.prod_id,price as Price
  from orditems i 
  join products p using(prod_id);
Then ...
call writepivotsumquery( "pivotin", "Yr", "prod_id", "Price", false );
+--------+---------+-----------+-----------+---------+-----------+-----------+
| Yr     | 1       | 15        | 16        | 17      | 18        | YrTotals  |
+--------+---------+-----------+-----------+---------+-----------+-----------+
| 2005   |    0.00 |   4485.00 |  33915.00 |    0.00 |      0.00 |  38400.00 |
| 2006   |    0.00 |   2990.00 |  23940.00 |    0.00 |      0.00 |  26930.00 |
| 2007   |    0.00 |  17940.00 |  43890.00 |    0.00 |      0.00 |  61830.00 |
| 2008   |    0.00 |  14950.00 |  35910.00 | 2895.00 |  16170.00 |  69925.00 |
| 2009   |    0.00 |  16445.00 |  31920.00 |    0.00 |  45815.00 |  94180.00 |
| 2010   |    0.00 |  20930.00 |  47880.00 |    0.00 |  56595.00 | 125405.00 |
| 2011   |    0.00 |  31395.00 |  33915.00 |    0.00 |  61985.00 | 127295.00 |
| 2012   |    0.00 |  23920.00 |  47880.00 |    0.00 |  18865.00 |  90665.00 |
| 2013   | 1295.00 |  20930.00 |  49875.00 |    0.00 |  26950.00 |  99050.00 |
| 2014   |    0.00 |  11960.00 |  31920.00 |    0.00 |   5390.00 |  49270.00 |
| 2015   |    0.00 |  14950.00 |  23940.00 |    0.00 |  13475.00 |  52365.00 |
| 2016   |    0.00 |      0.00 |  23940.00 |    0.00 |   5390.00 |  29330.00 |
| 2017   |    0.00 |  10465.00 |  11970.00 |    0.00 |   5390.00 |  27825.00 |
| 2018   |    0.00 |   1495.00 |   9975.00 |    0.00 |      0.00 |  11470.00 |
| 2019   |    0.00 |   1495.00 |   1995.00 |    0.00 |      0.00 |   3490.00 |
| 2020   |    0.00 |   1495.00 |      0.00 |    0.00 |      0.00 |   1495.00 |
| TOTALS | 1295.00 | 195845.00 | 452865.00 | 2895.00 | 256025.00 | 908925.00 |
+--------+---------+-----------+-----------+---------+-----------+-----------+
Persistent fact table or not, once we have the sproc, calling it is a lot easier than writing the query from scratch.

Last updated 18 May 2020




Return to the Artful Common Queries page