Automate pivot table queries
You have a sales
table listing product, salesperson and amount:
DROP TABLE IF EXISTS sales;
CREATE TABLE sales (
id int(11) default NULL,
product char(5) default NULL,
salesperson char(5) default NULL,
amount decimal(10,2) default NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
INSERT INTO sales VALUES
(1,'radio','bob','100.00'),
(2,'radio','sam','100.00'),
(3,'radio','sam','100.00'),
(4,'tv','bob','200.00'),
(5,'tv','sam','300.00'),
(6,'radio','bob','100.00');
SELECT * FROM sales;
+------+---------+-------------+--------+
| id | product | salesperson | amount |
+------+---------+-------------+--------+
| 1 | radio | bob | 100.00 |
| 2 | radio | sam | 100.00 |
| 3 | radio | sam | 100.00 |
| 4 | tv | bob | 200.00 |
| 5 | tv | sam | 300.00 |
| 6 | radio | bob | 100.00 |
+------+---------+-------------+--------+
If you are asked to tabulate sales amount against salesperson and product, you write a pivot table query:
SELECT
product,
SUM( CASE salesperson WHEN 'bob' THEN amount ELSE 0 END ) AS 'Bob',
SUM( CASE salesperson WHEN 'sam' THEN amount ELSE 0 END ) AS 'Sam',
SUM( amount ) AS Total
FROM sales
GROUP BY product WITH ROLLUP;
+---------+--------+--------+--------+
| product | Bob | Sam | Total |
+---------+--------+--------+--------+
| radio | 200.00 | 200.00 | 400.00 |
| tv | 200.00 | 300.00 | 500.00 |
| NULL | 400.00 | 500.00 | 900.00 |
+---------+--------+--------+--------+
The query generates one product per row and one column per salesperson. The pivoting CASE
expressions assign values of sales.amount
to the matching salesperson's column. For two products and two salespersons, it's a snap once you've done it a few times. When there are dozens of products and salespersons, though, writing the query becomes tiresome and error-prone.
Some years ago Giuseppe Maxia published a little query that automates writing the pivot expressions. His idea was to embed the syntax for lines like the SUM( CASE ...)
lines above in a query for the DISTINCT
values. At the time Giuseppe was writing, MySQL did not support stored procedures. Now that it does, we can further generalise Giuseppe's idea by parameterising it in a stored procedure.
Admittedly, it's a little daunting. To write a query with variable names rather than the usual literal table and column names, we have to write PREPARE
statements. What we propose to do here is to write SQL that writes PREPARE
statements.
Code which writes code which writes code. Not a job for the back of a napkin.
It's easy enough to write the sproc shell. We keep generic queries in a sys
database, so the routine needs parameters specifying database, table, pivot column and (in some cases) the aggregating column. Then what? What worked for us was to proceed from back to front:
- Write the pivot expressions for a specific case.
- Write the
PREPARE
statement that generates those expressions.
- Parameterise the result of #2.
- Put the result of #3 in an sproc.
Further complicating matters, we soon found that different summary aggregations, for example COUNT
and SUM
, require different sprocs. Here is the routine for generating COUNT
pivot expressions:
USE sys;
DROP PROCEDURE IF EXISTS writecountpivot;
DELIMITER |
CREATE PROCEDURE writecountpivot( db CHAR(64), tbl CHAR(64), col CHAR(64) )
BEGIN
DECLARE datadelim CHAR(1) DEFAULT '"';
DECLARE singlequote CHAR(1) DEFAULT CHAR(39);
DECLARE comma CHAR(1) DEFAULT ',';
SET @sqlmode = (SELECT @@sql_mode);
SET @@sql_mode='';
SET @sql = CONCAT( 'SELECT DISTINCT CONCAT(', singlequote,
',SUM(IF(', col, ' = ', datadelim, singlequote, comma,
col, comma, singlequote, datadelim, comma, '1,0)) AS `',
singlequote, comma, col, comma, singlequote, '`', singlequote,
') AS countpivotarg FROM ', db, '.', tbl,
' WHERE ', col, ' IS NOT NULL' );
-- UNCOMMENT TO SEE THE MIDLEVEL CODE:
-- SELECT @sql;
PREPARE stmt FROM @sql;
EXECUTE stmt;
DROP PREPARE stmt;
SET @@sql_mode=@sqlmode;
END;
|
DELIMITER ;
CALL sys.writecountpivot('test','sales','salesperson');
This generates the SQL ...
SELECT DISTINCT
CONCAT(',SUM(IF(salesperson = "',salesperson,'",1,0)) AS `',salesperson,'`')
AS countpivotarg
FROM test.sales
WHERE salesperson IS NOT NULL |
and returns...
+--------------------------------------------+
| countpivotarg |
+--------------------------------------------+
| ,SUM(IF(salesperson = "bob",1,0)) AS `bob` |
| ,SUM(IF(salesperson = "sam",1,0)) AS `sam` |
+--------------------------------------------+
which we plug into ...
SELECT
product
,SUM(IF(salesperson = "bob",1,0)) AS `bob`
,SUM(IF(salesperson = "sam",1,0)) AS `sam`
,COUNT(*) AS Total
FROM test.sales
GROUP BY product WITH ROLLUP;
+---------+------+------+-------+
| product | bob | sam | Total |
+---------+------+------+-------+
| radio | 2 | 2 | 4 |
| tv | 1 | 1 | 2 |
| NULL | 3 | 3 | 6 |
+---------+------+------+-------+
Not overwhelming for two columns, very convenient if there are 20. (Yes, it could also be written with COUNT( ... 1, NULL)
).
One point to notice is that the two levels of code generation create quotemark nesting problems. To make the double quotemark '"' available for data value delimiting, we turn off ANSI_QUOTES
during code generation, and put it back afterwards.
SUM
pivot queries need different syntax:
USE sys;
DROP PROCEDURE IF EXISTS writesumpivot;
DELIMITER |
CREATE PROCEDURE writesumpivot( db CHAR(64), tbl CHAR(64), pivotcol CHAR(64), sumcol CHAR(64) )
BEGIN
DECLARE datadelim CHAR(1) DEFAULT '"';
DECLARE comma CHAR(1) DEFAULT ',';
DECLARE singlequote CHAR(1) DEFAULT CHAR(39);
SET @sqlmode = (SELECT @@sql_mode);
SET @@sql_mode='';
SET @sql = CONCAT( 'SELECT DISTINCT CONCAT(', singlequote,
',SUM(IF(', pivotcol, ' = ', datadelim, singlequote, comma,
pivotcol, comma, singlequote, datadelim, comma, sumcol, ',0)) AS `',
singlequote, comma, pivotcol, comma, singlequote, '`', singlequote,
') AS sumpivotarg FROM ', db, '.', tbl,
' WHERE ', pivotcol, ' IS NOT NULL' );
-- UNCOMMENT TO SEE THE MIDLEVEL SQL:
-- SELECT @sql;
PREPARE stmt FROM @sql;
EXECUTE stmt;
DROP PREPARE stmt;
SET @@sql_mode=@sqlmode;
END;
|
DELIMITER ;
CALL writesumpivot('test','sales','salesperson','amount');
+-------------------------------------------------+
| sumpivotarg |
+-------------------------------------------------+
| ,SUM(IF(salesperson = "bob",amount,0)) AS `bob` |
| ,SUM(IF(salesperson = "sam",amount,0)) AS `sam` |
+-------------------------------------------------+
which forms the guts of our report query:
SELECT
product
,SUM(IF(salesperson = "bob",amount,0)) AS `bob`
,SUM(IF(salesperson = "sam",amount,0)) AS `sam`
,SUM(amount) AS Total
FROM test.sales
GROUP BY product;
+---------+--------+--------+--------+
| product | bob | sam | Total |
+---------+--------+--------+--------+
| radio | 200.00 | 200.00 | 400.00 |
| tv | 200.00 | 300.00 | 500.00 |
+---------+--------+--------+--------+
There are higher levels of generality beckoning---say, a routine that generates a complete pivot table query, not just the pivot expressions.