Optimise Group By

from the Artful MySQL Tips List

Aggregation, even when it is across multiple joins, can perform well when each table has a covering index for columns cited in the JOIN, GROUP BY and WHERE clauses, in that order. For example, if your web pages form a tree and if you keep a visit log, you'll soon find yourself writing a query for last month's most popular pages:
SELECT v.id AS ID, i.name AS Item, Page, count(*) AS N
FROM visitlog v 
JOIN items i USING(id) 
LEFT JOIN items p on i.id=p.parentid 
WHERE Date(v.dt) >= CurDate() - Interval 30 Day AND p.parentid IS NULL
GROUP BY id, name, page 
ORDER BY n desc LIMIT 20;
Even if you average just 20 total item hits an hour, your visit log will have 100,000 rows in little more than half a year. With just 100 items on your site, without a visitlog index on (id, page, ip, dt), the query engine will have to read 100,000*100*100=1 billion data rows. That will take a painfully long time. With such a visitlog index, though, and with itens indexes on id and parentid, query performance will be snappy.

When the query is too complex for such solutions, build a single OLAP-style fact table, index the GROUP BY columns, and query that table.

Last updated 16 Aug 2019

Return to the Artful MySQL Tips page