You have a products table with columns item, supplier, price . Multiple suppliers offer various prices for the same item. You need to find the supplier with the lowest price for each item.
DROP TABLE IF EXISTS products; CREATE TABLE products( item int,supplier int,price decimal(6,2) ); INSERT INTO products VALUES (1,1,10),(1,2,15),(2,2,20), (2,1,21),(2,2,18); SELECT * FROM products; +------+----------+-------+ | item | supplier | price | +------+----------+-------+ | 1 | 1 | 10.00 | | 1 | 2 | 15.00 | | 2 | 2 | 20.00 | | 2 | 1 | 21.00 | | 2 | 2 | 18.00 | +------+----------+-------+Your first thought may be to GROUP BY item , but that is not guaranteed to return the correct supplier value for each minimum item price. Grouping by both item and supplier will return more information than you want. Nor can you write WHERE price=MIN(...) because the query engine will evaluate the WHERE clause before it knows the MIN value.
This is the problem of aggregating within aggregates. It is sometimes called the 'groupwise aggregates' problem, but the term 'groupwise' is ambiguous. We think better names for it are subaggregates, inner aggregates, or within-group aggregates. It's easy to show that the within-group aggregates problem is a form of the problem of returning values from non-grouping columns in a SELECT item, supplier, Min(price) FROM products GROUP BY item;Will this reliably return the correct supplier per item? No. Unless there is exactly one supplier per item, the supplier value returned will be arbitrary. To retrieve the correct supplier for each item, you need more query logic. Here are six solutions. As you'll see, some are very slow, a couple are quite fast, and some work for only some versions of the problem: 1. Self-exclusion join solution: One way to model within-group minima or maxima is via an left self exclusion join... SELECT p1.item, p1.supplier, p1.price FROM products AS p1 LEFT JOIN products AS p2 ON p1.item=p2.item AND p1.price>p2.price WHERE p2.item IS NULL;...because in the resultset built by joining on left item = right item and left price > right price, the left-sided rows for which no smaller right-sided price exists are precisely the per-item rows with the smallest prices. The query runs more than an order of magnitude faster with an index on (item, supplier) .
2. Intermediate table/CTE solution: Another solution is to derive an intermediate table of aggregated minimum prices, then query that table. Before MySQL 8 and MariaDB 10.2, this requires explicit code for the intermediary table: CREATE TEMPORARY TABLE tmp ( item INT, minprice DECIMAL DEFAULT 0.0 ); LOCK TABLES products READ; INSERT INTO tmp SELECT item, MIN(price) FROM products GROUP BY item;to which you then join the products table to find the matching suppliers: SELECT products.item, supplier, products.price FROM products JOIN tmp ON products.item = tmp.item WHERE products.price=tmp.minprice; UNLOCK TABLES; DROP TABLE tmp;Since MySQL 8 and MariaDB 10.2, this can be written more concisely with a Common Table Expression (CTE): WITH tmp AS ( SELECT item, MIN(price) as minprice FROM products GROUP BY item ) SELECT products.item, supplier, products.price FROM products JOIN tmp ON products.item = tmp.item WHERE products.price=tmp.minprice; 3. Correlated subquery solution: From MySQL 4.1 on, the temporary table can be a correlated subquery. This is the most intuitively obvious syntax for the problem. It's also the slowest—up to a hundred times slower than the exclusion join, whether the queries are compared with or without indexes: SELECT item, supplier, price FROM products AS p1 WHERE price = ( SELECT MIN(p2.price) FROM products AS p2 WHERE p1.item = p2.item ); 4. SELECT p.item, p.supplier, p.price FROM products AS p JOIN ( SELECT item, MIN(price) AS minprice FROM products GROUP BY item ) AS pm ON p.item=pm.item AND p.price=pm.minprice;Some users have trouble mapping elements of this model to their instance of the problem. The model has five elements (or sets of them): (i) a table, which might be a view, a single physical table, or a table derived from joins (ii) one or more grouping columns, (iii) one or more columns to aggregate, (iv) one or more columns not mentioned in the (v) an aggregating job to do, typically MIN() or MAX(). In the product/minimum price solution above:
Here's an interesting variation on the problem. A simple table tracks company branches and the kinds of stock they carry: DROP TABLE IF EXISTS branch_stock; CREATE TABLE branch_stock( stock_id INT PRIMARY KEY AUTO_INCREMENT KEY, branch_id INT NOT NULL, stock_item VARCHAR(12) NOT NULL ); INSERT INTO branch_stock (branch_id,stock_item) VALUES (1,'trainers'),(1,'trainers'), (1,'jumper'),(2,'tie'),(2,'shoes');How do we find the most frequent product for each branch, including ties? Join the intermediate table of grouped counts to itself on matching branches, non-matching stock and not-greater counts in one than in the other: SELECT DISTINCT a.* FROM ( SELECT branch_id, stock_item, COUNT(*) qty FROM branch_stock GROUP BY branch_id, stock_item ) a JOIN ( SELECT branch_id, stock_item, COUNT(*) qty FROM branch_stock GROUP BY branch_id, stock_item ) b ON b.branch_id=a.branch_id AND b.stock_item<>a.stock_item AND b.qty<=a.qty; +-----------+------------+-----+ | branch_id | stock_item | qty | +-----------+------------+-----+ | 1 | trainers | 2 | | 2 | shoes | 1 | | 2 | tie | 1 | +-----------+------------+-----+5. Ordered subquery solution: A "trick" solution, made possible by MySQL's non-standard tolerance of GROUP BY when there is no aggregating SELECT expression, uses ORDER BY in a subquery to find the lowest prices, and GROUP BY in the outer query to pick them off:
SELECT * FROM ( SELECT * FROM products ORDER BY price ASC ) AS s GROUP BY item;It's succinct, it's fast if the query engine can find an index to order on, and it retrieves the prices and associated values in one step, but this query is seldom much faster than the FROM clause subquery described above.
The self-exclusion join and 6. Min-Concat-trick solution: Finally, here is a radically different model of the problem. It can find both within-group minima and within-group maxima in a single query. The model aggregates the concatenated within-group grouped column value and the within-group grouping column name in a single string, then uses SELECT item, SUBSTR( MIN( CONCAT( LPAD(price,6,0),supplier) ), 7) AS MinSupplier, LEFT( MIN( CONCAT( LPAD(price,6,0),supplier) ), 6)+0 AS MinPrice, SUBSTR( MAX( CONCAT( LPAD(price,6,0),supplier) ), 7) AS MaxSupplier, LEFT( MAX( CONCAT( LPAD(price,6,0),supplier) ), 6)+0 AS MaxPrice FROM products GROUP BY item; +------+-------------+----------+-------------+----------+ | item | MinSupplier | MinPrice | MaxSupplier | MaxPrice | +------+-------------+----------+-------------+----------+ | 1 | 1 | 10 | 2 | 15 | | 2 | 2 | 18 | 1 | 21 | +------+-------------+----------+-------------+----------+Try all solutions to find which is fastest for your version of the problem. To find the top or bottom N per group, you might think the Last updated 16 Aug 2024 |
![]() |