Group column statistics in rows

from the Artful Common Queries page


A pivot (or crosstab, or contingency) table aggregates sets of column values into rows of statistics, and pivots target value statistics on partitioning criteria defined by any available data.

Spreadsheet applications have intuitive point-and-click interfaces for generating pivot tables. RDBMSs generally do not. The task looks difficult in SQL, though, only until you have coded a few.

If you ported the Microsoft sample database Northwind to your MySQL database (as described in chapter 11 of Get It Done with MySQL), you can execute this example step by step. Even if you haven't ported Northwind, the example is easy to follow. Amongst the tables in the Northwind database are:

employees(employeeID, lastname, firstname, ...)
orders(orderID, customerId, employeeID, orderdate, ...) 
There are nine employees, and 803 orders dated from 1996 through 1998. Each order points to an employeeID. Suppose we wish to report counts of orders taken by employees pivoted on year--how would we proceed?

With a pivot table 3-step:

1. Write the basic aggregating query, a GROUP BY query to aggregate the data on desired variables (in this case, employee and year).

2. Write the pivoting query as an outer query that creates a column for each pivot value (year) from #1 written as an inner subquery. (Sometimes it is more efficient to write the results of #1 to a temp table and write #2 to refer to the temp table.)

3. Fix a ROLLUP display glitch by encapsulating #2 in a new outer query that labels the ROLLUP row meaningfully.

Here are the three steps in more detail:

1. Group the joined counts by the two criteria, employee and order year, yielding one result row per employee per year:

SELECT
  CONCAT(firstname,' ',lastname) AS 'Employee',
  YEAR(OrderDate) AS col,
  COUNT(*) AS Data
FROM Employees e
JOIN Orders o ON e.EmployeeID = o.EmployeeID
GROUP BY e.employeeID, YEAR(o.OrderDate);
+------------------+------+------+
| Employee         | col  | Data |
+------------------+------+------+
| Nancy Davolio    | 1996 |   26 |
| Nancy Davolio    | 1997 |   55 |
| Nancy Davolio    | 1998 |   42 |
| Andrew Fuller    | 1996 |   16 |
| Andrew Fuller    | 1997 |   41 |
| Andrew Fuller    | 1998 |   39 |
| Janet Leverling  | 1996 |   18 |
| Janet Leverling  | 1997 |   71 |
| Janet Leverling  | 1998 |   38 |
| Margaret Peacock | 1996 |   31 |
| Margaret Peacock | 1997 |   81 |
| Margaret Peacock | 1998 |   44 |
| Steven Buchanan  | 1996 |   11 |
| Steven Buchanan  | 1997 |   18 |
| Steven Buchanan  | 1998 |   13 |
| Michael Suyama   | 1996 |   15 |
| Michael Suyama   | 1997 |   33 |
| Michael Suyama   | 1998 |   19 |
| Robert King      | 1996 |   11 |
| Robert King      | 1997 |   36 |
| Robert King      | 1998 |   25 |
| Laura Callahan   | 1996 |   19 |
| Laura Callahan   | 1997 |   54 |
| Laura Callahan   | 1998 |   31 |
| Anne Dodsworth   | 1996 |    5 |
| Anne Dodsworth   | 1997 |   19 |
| Anne Dodsworth   | 1998 |   19 |
+------------------+------+------+
Nine employees for three years yield 27 aggregated rows.

2. We want one summary row per employee, and one count column for each year when an employee took an order. We pivot the rows of the above resultset on year by querying the above resultset, defining a colunn for every year found, for example:

  SUM( CASE col WHEN '1996' THEN data ELSE 0 END ) AS '1996',
grouping the result by row WITH ROLLUP to provide a row of column sums at the bottom. This gives the following query:
SELECT 
  Employee,
  SUM( CASE col WHEN '1996' THEN data ELSE 0 END ) AS '1996',
  SUM( CASE col WHEN '1997' THEN data ELSE 0 END ) AS '1997',
  SUM( CASE col WHEN '1998' THEN data ELSE 0 END ) AS '1998',
  SUM( data ) AS Total    -- sums across years by employee
FROM (
  SELECT                  -- the query from step #1
    CONCAT(firstname,' ',lastname) AS 'Employee', 
    YEAR(OrderDate) AS 'col', 
    COUNT(*) AS Data
  FROM Employees e
  JOIN Orders o ON e.EmployeeID = o.EmployeeID 
  GROUP BY e.employeeID, YEAR(o.OrderDate)
) AS stats
GROUP BY employee WITH ROLLUP;
+------------------+------+------+------+-------+
| Employee         | 1996 | 1997 | 1998 | Total |
+------------------+------+------+------+-------+
| Andrew Fuller    |   16 |   41 |   39 |    96 |
| Anne Dodsworth   |    5 |   19 |   19 |    43 |
| Janet Leverling  |   18 |   71 |   38 |   127 |
| Laura Callahan   |   19 |   54 |   31 |   104 |
| Margaret Peacock |   31 |   81 |   44 |   156 |
| Michael Suyama   |   15 |   33 |   19 |    67 |
| Nancy Davolio    |   26 |   55 |   42 |   123 |
| Robert King      |   11 |   36 |   25 |    72 |
| Steven Buchanan  |   11 |   18 |   13 |    42 |
| NULL             |  152 |  408 |  270 |   830 |
+------------------+------+------+------+-------+
3. The result of #2 is correct except that sums ought not to be reported as NULL! We fix that bit of weirdness by writing query #2 as a derived table, and having the new outer query alias the yearly sums row:
SELECT 
  IFNULL( employee, 'SUMS') AS Employee, `1996`, `1997`, `1998` Total
FROM (
  SELECT 
    Employee,
    SUM( CASE col WHEN '1996' THEN data ELSE 0 END ) AS '1996',
    SUM( CASE col WHEN '1997' THEN data ELSE 0 END ) AS '1997',
    SUM( CASE col WHEN '1998' THEN data ELSE 0 END ) AS '1998',
    SUM( data ) AS Total
  FROM (
    SELECT 
      CONCAT(firstname,' ',lastname) AS 'Employee', 
      YEAR(OrderDate) AS 'col', 
      COUNT(*) AS Data
    FROM Employees e
    JOIN Orders o ON e.EmployeeID = o.EmployeeID 
    GROUP BY e.employeeID, YEAR(o.OrderDate)
  ) AS stats
  GROUP BY employee WITH ROLLUP
) AS stats2; 
+------------------+------+------+-------+
| Employee         | 1996 | 1997 | Total |
+------------------+------+------+-------+
| Andrew Fuller    |   16 |   41 |    39 |
| Anne Dodsworth   |    5 |   19 |    19 |
| Janet Leverling  |   18 |   71 |    38 |
| Laura Callahan   |   19 |   54 |    31 |
| Margaret Peacock |   31 |   81 |    44 |
| Michael Suyama   |   15 |   33 |    19 |
| Nancy Davolio    |   26 |   55 |    42 |
| Robert King      |   11 |   36 |    25 |
| Steven Buchanan  |   11 |   18 |    13 |
| SUMS             |  152 |  408 |   270 |
+------------------+------+------+-------+
(Thanks to Chris Gates for the correction.) With multiple statistics and pivot layers, a pivot table query can get complex, but following this 3-step will keep things clear.

Return to the Artful Common Queries page