Optimise Group By
from the Artful MySQL Tips List
Aggregation, even when it is across multiple joins, can perform well when each table has a covering index for columns cited in the JOIN
, GROUP BY
and WHERE
clauses, in that order.
For example, if your web pages form a tree and if you keep a visit log, you'll soon find yourself writing a query for last month's most popular pages:
SELECT v.id AS ID, i.name AS Item, Page, count(*) AS N
FROM visitlog v
JOIN items i USING(id)
LEFT JOIN items p on i.id=p.parentid
WHERE Date(v.dt) >= CurDate() - Interval 30 Day AND p.parentid IS NULL
GROUP BY id, name, page
ORDER BY n desc LIMIT 20;
Even if you average just 20 total item hits an hour, your visit log will have 100,000 rows in little more than half a year. With just 100 items on your site, without a visitlog
index on (id, page, ip, dt)
, the query engine will have to read 100,000*100*100=1 billion data rows. That will take a painfully long time. With such a visitlog
index, though, and with itens
indexes on id
and parentid
, query performance will be snappy.
When the query is too complex for such solutions, build a single OLAP-style fact table, index the GROUP BY
columns, and query that table.
Return to the Artful MySQL Tips page