Within-group aggregates

You have a products table with columns item, supplier, price. Multiple suppliers offer various prices for the same item. You need to find the supplier with the lowest price for each item.

DROP TABLE IF EXISTS products;
CREATE TABLE products(item int,supplier int,price decimal(6,2));
INSERT INTO products VALUES(1,1,10),(1,2,15),(2,2,20),(2,1,21),(2,2,18);
SELECT * FROM products;
+------+----------+-------+
| item | supplier | price |
+------+----------+-------+
|    1 |        1 | 10.00 |
|    1 |        2 | 15.00 |
|    2 |        2 | 20.00 |
|    2 |        1 | 21.00 |
|    2 |        2 | 18.00 |
+------+----------+-------+

Your first thought may be to GROUP BY item, but that is not guaranteed to return the correct supplier value for each minimum item price. Grouping by both item and supplier will return more information than you want. Nor can you write WHERE price=MIN(...) because the query engine will evaluate the WHERE clause before it knows the MIN value.

This is the problem of aggregating within aggregates. It is sometimes called the 'groupwise aggregates' problem, but the term 'groupwise' is ambiguous. We think better names for it are subaggregates, inner aggregates, or within-group aggregates.

It's easy to show that the within-group aggregates problem is a form of the problem of returning values from non-grouping columns in an aggregate query. Suppose you write ...

SELECT item,supplier,MIN(price)
FROM products
GROUP BY item;

Will this reliably return the correct supplier per item? No. Unless there is exactly one supplier per item, the supplier value returned will be arbitrary. To retrieve the correct supplier for each item, you need more logic.

One way to model the within-aggregates problem is via an left self exclusion join...

SELECT p1.item,p1.supplier,p1.price
FROM products AS p1
LEFT JOIN products AS p2 ON p1.item  = p2.item AND p1.price > p2.price
WHERE p2.item IS NULL;

...because in the resultset built by joining on left item = right item and left price > right price, the left-sided rows for which there is no greater right-sided price are precisely the per-item rows with the smallest prices.

Another solution is to derive an intermediate table of aggregated minimum prices. Before MySQL 4.1, it has to be a temporary table:

CREATE TEMPORARY TABLE tmp (
  item INT,
  minprice DECIMAL DEFAULT 0.0
);
LOCK TABLES products READ;
INSERT INTO tmp 
  SELECT item, MIN(price) 
  FROM products 
  GROUP BY item;

to which you then join the products table to find the matching suppliers:

SELECT products.item, supplier, products.price 
FROM products 
JOIN tmp ON products.item = tmp.item
WHERE products.price=tmp.minprice;
UNLOCK TABLES;
DROP TABLE tmp;

From MySQL 4.1 on, the temporary table can be a correlated subquery. This is the most intuitively obvious syntax for the problem. Often it's also the slowest solution:

SELECT item, supplier, price
FROM products AS p1
WHERE price = (
  SELECT MIN(p2.price)
  FROM products AS p2
  WHERE p1.item = p2.item
);

It is usually faster to move the aggregating subquery from the WHERE clause to the FROM clause:

SELECT p.item, p.supplier, p.price
FROM products AS p
JOIN (
  SELECT item, MIN(price) AS minprice
  FROM products
  GROUP BY item
) AS pm ON p.item = pm.item AND p.price = pm.minprice;

Some users have trouble mapping elements of this model to their instance of the problem. There are five elements (or sets of them):

(i) a table, which might be a view, a single physical table, or a table derived from joins
(ii) one or more grouping columns,
(iii) one or more columns to aggregate,
(iv) one or more columns not mentioned in the GROUP BY clause,
(v) an aggregating job to do, typically MIN() or MAX().

In the product/minimum price solution above:

(i) table tbl = product
(ii) grouping column grouping_col = item
(iii) column to aggregate = col_to_aggregate = price
(iv) non-aggregated columns other_detail, ...etc... = supplier
(v) aggregating function = MIN().

A "tricky" solution is to use ORDER BY in a subquery to find lowest prices, and write an outer GROUP BY query to pick each item's lowest price from the subquery:

SELECT *
FROM (
  SELECT *
  FROM products
  ORDER BY price ASC
) AS s
GROUP BY item;

Other methods we've cited for this problem scale poorly because they're O(N2). If ordering and grouping columns are indexed, this solution is O(N * log N), so it may be your best bet for speed. It's an instance of a useful rule of thumb: simplest is often fastest.

Finally, here is a radically different model of the problem. It can find both within-group minima and within-group maxima in a single query. This model aggregates the concatenated within-group grouped column value and the within-group grouping column name in a single string, then uses SUBSTR() to break them apart in the result:

SELECT 
  item,
  SUBSTR( MIN( CONCAT( LPAD(price,6,0),supplier) ), 7)   AS MinSupplier,
    LEFT( MIN( CONCAT( LPAD(price,6,0),supplier) ), 6)+0 AS MinPrice,
  SUBSTR( MAX( CONCAT( LPAD(price,6,0),supplier) ), 7)   AS MaxSupplier,
    LEFT( MAX( CONCAT( LPAD(price,6,0),supplier) ), 6)+0 AS MaxPrice
FROM  products
GROUP BY item;
+------+-------------+----------+-------------+----------+
| item | MinSupplier | MinPrice | MaxSupplier | MaxPrice |
+------+-------------+----------+-------------+----------+
|    1 | 1           |       10 | 2           |       15 |
|    2 | 2           |       18 | 1           |       21 |
+------+-------------+----------+-------------+----------+

Try all solutions to find which is fastest for your version of the problem.

To find the top or bottom N per group, you might think the LIMIT clause would work, but LIMIT is limited in subqueries. See Within-group quotas.