Automate pivot table queries

from the Artful Common Queries page


You have a sales table listing product, salesperson and amount:
DROP TABLE IF EXISTS sales;
CREATE TABLE sales (
  id int(11) default NULL,
  product char(5) default NULL,
  salesperson char(5) default NULL,
  amount decimal(10,2) default NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
INSERT INTO sales VALUES 
  (1,'radio','bob','100.00'),
  (2,'radio','sam','100.00'),
  (3,'radio','sam','100.00'),
  (4,'tv','bob','200.00'),
  (5,'tv','sam','300.00'),
  (6,'radio','bob','100.00');
SELECT * FROM sales;
+------+---------+-------------+--------+
| id   | product | salesperson | amount |
+------+---------+-------------+--------+
|    1 | radio   | bob         | 100.00 |
|    2 | radio   | sam         | 100.00 |
|    3 | radio   | sam         | 100.00 |
|    4 | tv      | bob         | 200.00 |
|    5 | tv      | sam         | 300.00 |
|    6 | radio   | bob         | 100.00 |
+------+---------+-------------+--------+
If you are asked to tabulate sales amount against salesperson and product, you write a pivot table query:
SELECT
  product,
  SUM( CASE salesperson WHEN 'bob' THEN amount ELSE 0 END ) AS 'Bob',
  SUM( CASE salesperson WHEN 'sam' THEN amount ELSE 0 END ) AS 'Sam',
  SUM( amount ) AS Total
FROM sales
GROUP BY product WITH ROLLUP;
+---------+--------+--------+--------+
| product | Bob    | Sam    | Total  |
+---------+--------+--------+--------+
| radio   | 200.00 | 200.00 | 400.00 |
| tv      | 200.00 | 300.00 | 500.00 |
| NULL    | 400.00 | 500.00 | 900.00 |
+---------+--------+--------+--------+
The query generates one product per row and one column per salesperson. The pivoting CASE expressions assign values of sales.amount to the matching salesperson's column. For two products and two salespersons, it's a snap once you've done it a few times. When there are dozens of products and salespersons, though, writing the query becomes tiresome and error-prone.

Some years ago Giuseppe Maxia published a little query that automates writing the pivot expressions. His idea was to embed the syntax for lines like the SUM( CASE ...) lines above in a query for the DISTINCT values. At the time Giuseppe was writing, MySQL did not support stored procedures. Now that it does, we can further generalise Giuseppe's idea by parameterising it in a stored procedure.

Admittedly, it's a little daunting. To write a query with variable names rather than the usual literal table and column names, we have to write PREPARE statements. What we propose to do here is to write SQL that writes PREPARE statements.

Code which writes code which writes code. Not a job for the back of a napkin.

It's easy enough to write the sproc shell. We keep generic queries like this in a system database, so the routine needs parameters specifying database, table, pivot column and (in some cases) the aggregating column. Then what? What worked for us was to proceed from back to front:

  • 1 Write the pivot expressions for a specific case.
  • 2 Write the PREPARE statement that generates those expressions.
  • 3 Parameterise the result of #2.
  • 4 Put the result of #3 in an sproc.
Further complicating matters, we soon found that different summary aggregations, for example COUNT and SUM, require different sprocs. Here is the routine for generating COUNT pivot expressions:
DROP PROCEDURE IF EXISTS writecountpivot;
DELIMITER |
CREATE PROCEDURE writecountpivot( db CHAR(64), tbl CHAR(64), col CHAR(64) )
BEGIN
  DECLARE datadelim CHAR(1) DEFAULT '"';
  DECLARE singlequote CHAR(1) DEFAULT CHAR(39);
  DECLARE comma CHAR(1) DEFAULT ',';
  SET @sqlmode = (SELECT @@sql_mode);
  SET @@sql_mode='';
  SET @sql = CONCAT( 'SELECT DISTINCT CONCAT(', singlequote,
                     ',SUM(IF(', col, ' = ', datadelim, singlequote, comma,
                     col, comma, singlequote, datadelim, comma, '1,0)) AS `', 
                     singlequote, comma, col, comma, singlequote, '`', singlequote, 
                     ') AS countpivotarg FROM ', db, '.', tbl,
                     ' WHERE ', col, ' IS NOT NULL' );
  -- UNCOMMENT TO SEE THE MIDLEVEL CODE:
  -- SELECT @sql; 
  PREPARE stmt FROM @sql;
  EXECUTE stmt;
  DROP PREPARE stmt;
  SET @@sql_mode=@sqlmode;
END;
|
DELIMITER ;
CALL sys.writecountpivot('test','sales','salesperson');
This generates the SQL ...
SELECT DISTINCT 
  CONCAT(',SUM(IF(salesperson = "',salesperson,'",1,0)) AS `',salesperson,'`') 
  AS countpivotarg 
FROM test.sales 
WHERE salesperson IS NOT NULL |
and returns...
+--------------------------------------------+
| countpivotarg                              |
+--------------------------------------------+
| ,SUM(IF(salesperson = "bob",1,0)) AS `bob` |
| ,SUM(IF(salesperson = "sam",1,0)) AS `sam` |
+--------------------------------------------+
The basic aggregating query into which we'll plug the pivot expressions is ...
SELECT 
  product
  ,COUNT(*) AS Total
FROM test.sales
GROUP BY product WITH ROLLUP;
Adding in the pivot sums gives ...
SELECT 
  product
  ,SUM(IF(salesperson = "bob",1,0)) AS `bob` 
  ,SUM(IF(salesperson = "sam",1,0)) AS `sam`
  ,COUNT(*) AS Total
FROM test.sales
GROUP BY product WITH ROLLUP;
+---------+------+------+-------+
| product | bob  | sam  | Total |
+---------+------+------+-------+
| radio   |    2 |    2 |     4 |
| tv      |    1 |    1 |     2 |
| NULL    |    3 |    3 |     6 |
+---------+------+------+-------+
Not overwhelming for two columns, very convenient when there are 20. (Yes, it could also be written with COUNT( ... 1, NULL)).

One point to notice is that the two levels of code generation create quotemark nesting problems. To make the double quotemark '"' available for data value delimiting, we turn off ANSI_QUOTES during code generation, and put it back afterwards. SUM pivot queries need different syntax:

USE system;
DROP PROCEDURE IF EXISTS writesumpivot;
DELIMITER |
CREATE PROCEDURE writesumpivot( db CHAR(64), tbl CHAR(64), pivotcol CHAR(64), sumcol CHAR(64) )
BEGIN
  DECLARE datadelim CHAR(1) DEFAULT '"';
  DECLARE comma CHAR(1) DEFAULT ',';
  DECLARE singlequote CHAR(1) DEFAULT CHAR(39);
  SET @sqlmode = (SELECT @@sql_mode);
  SET @@sql_mode='';
  SET @sql = CONCAT( 'SELECT DISTINCT CONCAT(', singlequote, 
                     ',SUM(IF(', pivotcol, ' = ', datadelim, singlequote, comma,
                     pivotcol, comma, singlequote, datadelim, comma, sumcol, ',0)) AS `', 
                     singlequote, comma, pivotcol, comma, singlequote, '`', singlequote,  
                     ') AS sumpivotarg FROM ', db, '.', tbl, 
                     ' WHERE ', pivotcol, ' IS NOT NULL' );
   -- UNCOMMENT TO SEE THE MIDLEVEL SQL:
   -- SELECT @sql;
   PREPARE stmt FROM @sql;
   EXECUTE stmt;
   DROP PREPARE stmt;
   SET @@sql_mode=@sqlmode;
END;
|
DELIMITER ;
CALL writesumpivot('test','sales','salesperson','amount');
+-------------------------------------------------+
| sumpivotarg                                     |
+-------------------------------------------------+
| ,SUM(IF(salesperson = "bob",amount,0)) AS `bob` |
| ,SUM(IF(salesperson = "sam",amount,0)) AS `sam` |
+-------------------------------------------------+
which forms the guts of our report query:
SELECT
 product
 ,SUM(IF(salesperson = "bob",amount,0)) AS `bob`
 ,SUM(IF(salesperson = "sam",amount,0)) AS `sam`
 ,SUM(amount) AS Total
FROM test.sales
GROUP BY product;
+---------+--------+--------+--------+
| product | bob    | sam    | Total  |
+---------+--------+--------+--------+
| radio   | 200.00 | 200.00 | 400.00 |
| tv      | 200.00 | 300.00 | 500.00 |
+---------+--------+--------+--------+
A similar solution is at http://mysql.rjweb.org/doc.php/pivot.

Is it possible to automate not just the list of pivot expressions, but the whole pivot query? Can we write a routine that will generate and execute a query that cross-tabulates any columns you pass to it? Yes, but the more columns there are to cross-tabulate, the more complicated it gets. Here's a solution for the simplest possible case---counts of one column's distinct values against the distinct values of another column. A test table:

drop table if exists track;
create table track( agent char(1), program char(1) );
insert into track values('a','m'),('b','n'),('c','o'),('a','p');
The solution is a 2-step--prepare and run a query to form the pivot list, then prepare and run the whole query using the pivot list result from the first query:
drop procedure if exists crosstabcount;
delimiter go
create procedure crosstabcount( 
  groupbycol varchar(64), sumcol varchar(64), tbl varchar(64) 
) 
begin 
  set @sumlist = null, @sumsql = null; 
  -- PRELIM QUERY TO BUILD SQL EXPRESSIONS FOR PIVOT SUMS
  set @sumsql = concat( 
    "select group_concat( distinct concat( 
      'sum(case when ", sumcol, " = ''', thisval, ''' 
      then 1 else 0 end) as `', thisval, '`') 
    ) into @sumlist from ( 
    select distinct ", groupbycol, ",", sumcol, " as thisval from ", 
    tbl, " order by ", groupbycol, ") as tmp" ); 
  prepare stmt from @sumsql;
  execute stmt;
  drop prepare stmt;
  -- APPLY THE PIVOT EXPRESSION LIST TO THE TARGET TABLE
  set @sql= concat('select ', groupbycol, ',', @sumlist, 
                   ' from ', tbl, ' group by ', groupbycol 
                  ); 
  prepare stmt from @sql; 
  execute stmt; 
  drop prepare stmt;
end;
go
delimiter ;
call crosstabcount( 'agent', 'program', 'track' );
+-------+------+------+------+------+
| agent | m    | p    | n    | o    |
+-------+------+------+------+------+
| a     |    1 |    1 |    0 |    0 |
| b     |    0 |    0 |    1 |    0 |
| c     |    0 |    0 |    0 |    1 |
+-------+------+------+------+------+

Return to the Artful Common Queries page