Basic aggregation

from the Artful Common Queries page


Aggregates are more popular here than any other topic, in the last two years by about 4 to 1 over the next most popular category.

Basic aggregation is the simplest grouping query pattern: for column foo, display the count, the smallest, the largest, the sum, the average or some other statistic of column bar values. For example, the minimum bar value per foo:

SELECT foo, MIN(bar) AS bar
FROM tbl
GROUP BY foo;

Return the highest bar value for each foo, ordering top to bottom by that value:

SELECT foo, MAX(bar) AS Count
FROM tbl
GROUP BY foo
ORDER BY Count DESC;

Ditto for AVG(), COUNT() etc. The pattern easily extends to multiple grouping column expressions.

Suppose you want to see which of four bands of counts each foo count falls into? Since MySQL 8 and MariaDB 10 implemented Window functions, you can just add an NTILE call to a standard aggregating query invoking COUNT(*):

SELECT 
  foo, COUNT(bar) AS Count, 
  NTILE(4) OVER( ORDER BY COUNT(*) DESC ) AS band
FROM tbl
GROUP BY foo;

Here NTILE() specifies how many rowsets (or partitions, or windows) to break the result into, and OVER() specifies which result column(s) should be thus partitioned.

For more about Window functions see "Windowing: Introduction".

For aggregating functions like MIN() and MAX() that return a single value, there may be multiple instances of the result. If you wish to see them, put the aggregating query in a subquery and join to it from a direct query on the table:

SELECT a.foo, a.bar
FROM tbl a
JOIN (
  SELECT foo, MAX(bar) AS Count
  FROM tbl
  GROUP BY foo
) b ON a.foo=b.foo AND a.bar=b.count
ORDER BY a.foo, a.bar;

MySQL introduced the SQL extension GROUP_CONCAT(), which makes short work of listing items in groups. For example, given a table of suppliers and the parts they make ...

CREATE TABLE supparts(supID char(2),partID char(2));
INSERT INTO supparts VALUES
('s1','p1'),('s1','p2'),('s1','p3'),('s1','p4'),('s1','p5'),('s1','p6'),
('s2','p1'),('s2','p2'),('s3','p2'),('s4','p2'),('s4','p4'),('s4','p5');

List suppliers for each part:

SELECT partID,GROUP_CONCAT(supID ORDER BY supID) AS Suppliers 
FROM supparts 
GROUP BY partID;
+--------+-------------+
| partID | Suppliers   |
+--------+-------------+
| p1     | s1,s2       |
| p2     | s1,s2,s3,s4 |
| p3     | s1          |
| p4     | s1,s4       |
| p5     | s1,s4       |
| p6     | s1          |
+--------+-------------+

When there are several tables to be joined, the beginner may feel overwhelmed by the complexity of the problem. Suppose you're asked to retrieve the top computer desk salesperson for this schema:

    SALESPERSONS
         |
        make
         |
       ORDERS ---which have---> ORDERLINES 
                                    |
                                    of
                                    |
                                 PRODUCTS

drop table if exists salespersons, orders, orderlines, products;
create table salespersons(salespersonid int,name char(8));
insert into salespersons values(1,'Sam'),(2,'Xavier');
create table orders(orderid int,salespersonid int);
insert into orders values(1,1),(2,1),(3,1),(4,2),(5,2);
create table orderlines(lineid int,orderid int,productid int,qty int);
insert into orderlines values(1,1,1,1),(2,1,1,2),(3,2,2,1),(4,3,1,1),(5,4,1,1),(6,5,2,2);
create table products(productid int,name char(32));
insert into products values(1,'computer desk'),(2,'lamp'),(3,'desk chair');

Two rules of thumb help with problems like this: solve one step at a time, and work from inside out. Here "inside out" means start by building the join list needed to retrieve sales data:

from salespersons s
join orders       o using(salespersonid)
join orderlines   l using(orderid)
join products     p using(productid)  

Test those joins with a query that just lists sales data:

select s.name, p.name, l.qty
from salespersons s
join orders       o using(salespersonid)
join orderlines   l using(orderid)
join products     p using(productid)
+--------+---------------+------+
| name   | name          | qty  |
+--------+---------------+------+
| Sam    | computer desk |    1 |
| Sam    | computer desk |    2 |
| Sam    | lamp          |    1 |
| Sam    | computer desk |    1 |
| Xavier | computer desk |    1 |
| Xavier | lamp          |    2 |
+--------+---------------+------+

Now we just need to filter for 'computer desk' sales, add aggregation, and pick off the top seller:

select s.name, sum(l.qty) as n      -- sum quantities
from salespersons s
join orders       o using(salespersonid)
join orderlines   l using(orderid)
join products     p using(productid)
where p.name='computer desk'    
group by s.name                     -- aggregate by salesperson
order by n desc limit 1;            -- order by descending sum, pick off top value
+------+------+
| name | n    |
+------+------+
| Sam  |    4 |
+------+------+

If columns other than the GROUP BY columns must be retrieved, and if the grouping expression does not have a strictly 1:1 relationship with (ie., are not "functionally dependent" on) those columns, then to avoid returning arbitrary values for those non-grouping columns, you need to either add those columns to the Group By clause, or put the GROUP BY query in a subquery and join that result to the other columns, for example:

SELECT s.partID, s, thiscol, s.thatcol, anothercol, x.Suppliers
FROM supparts s
JOIN (
  SELECT partID,GROUP_CONCAT(supID ORDER BY supID) AS Suppliers 
  FROM supparts 
  GROUP BY partID
) x USING(partID)

If sql_mode includes ONLY_FULL_GROUP_BY, MySQL returns an error for a query that Selects non-aggregated columns that aren't functionally dependent on the grouping columns.

Before version 5.7.5 MySQL, with ONLY_FULL_GROUP_BY set, blindly rejected queries like this with the message that those columns are missing from the GROUP BY expression. Starting in 5.7.5 though, MySQL is better at detecting functional dependencies, and it won't generate such an error if it concludes that the ungrouped columns are functionally dependent on grouoed columns. If it decides they aren't functionally dependent, the message says that. For full discussion of this, see Roland Bouman's excellent discussion at http://rpbouman.blogspot.nl/2014/09/mysql-575-group-by-respects-functional.html

Return to the Artful Common Queries page