Aggregation, even across multiple joins, can perform well when each table has a covering index on the columns cited in the JOIN, GROUP BY, and WHERE clauses, in that order.
For example, if your web pages form a tree and if you keep a visit log, you'll soon find yourself writing a query for last month's most popular pages:
SELECT v.id AS ID, i.name AS Item, v.page AS Page, COUNT(*) AS N
FROM visitlog v
JOIN items i USING (id)
LEFT JOIN items p ON i.id = p.parentid
WHERE v.dt >= CURDATE() - INTERVAL 30 DAY
  AND p.parentid IS NULL
GROUP BY v.id, i.name, v.page
ORDER BY N DESC
LIMIT 20;
Even if you average just 20 item hits an hour in total, your visit log will pass 100,000 rows in little more than half a year. With just 100 items on your site and no visitlog index on (id, page, ip, dt), the query engine will have to examine 100,000*100*100 = 1 billion row combinations. That will take a painfully long time. With such a visitlog index, though, and with items indexes on id and parentid, query performance will be snappy.
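Concretely, the indexes described above might be declared as follows (index names are illustrative, and items.id may already be covered by a primary key):

CREATE INDEX visitlog_cover ON visitlog (id, page, ip, dt);
CREATE INDEX items_id ON items (id);
CREATE INDEX items_parent ON items (parentid);

With the visitlog index in place, the engine can satisfy the join, grouping, and date filter from the index alone, without touching the base table rows.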
When the query is too complex for such indexing solutions, build a single OLAP-style fact table, index its GROUP BY columns, and query that table instead.

Last updated 16 Aug 2019
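As a sketch of that fact-table approach, assuming the visitlog schema above (table and column names are illustrative), a nightly job could pre-aggregate the previous day's hits:

CREATE TABLE pagehits_daily (
  id   INT NOT NULL,
  page VARCHAR(255) NOT NULL,
  day  DATE NOT NULL,
  hits INT NOT NULL,
  PRIMARY KEY (id, page, day)
);

INSERT INTO pagehits_daily (id, page, day, hits)
SELECT id, page, DATE(dt), COUNT(*)
FROM visitlog
WHERE dt >= CURDATE() - INTERVAL 1 DAY AND dt < CURDATE()
GROUP BY id, page, DATE(dt);

The popularity query then sums hits over 30 rows per page in the small summary table rather than scanning the raw log.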