This query...
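(The original query did not survive in this copy. The sketch below is a hypothetical reconstruction, consistent with the discussion that follows: an implicit self-join against a DISTINCT subquery, filtered by LIKE comparisons on `path`. The table name `pages` and the LIKE patterns are invented for illustration.)

```sql
SELECT DISTINCT q.sid, q.title, q.path, q.url, q.hostname
FROM pages AS p,
     (SELECT DISTINCT sid, title, path, url, hostname
      FROM pages) AS q
WHERE p.sid = q.sid
  AND (q.path LIKE '/articles/%' OR q.path LIKE '/news/%');
```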
against this table...
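(Again hypothetical, but consistent with the column names used throughout; the types are guesses:)

```sql
CREATE TABLE pages (
  sid      INT UNSIGNED NOT NULL,   -- assumed join key
  title    VARCHAR(128) NOT NULL,
  path     VARCHAR(255) NOT NULL,
  url      VARCHAR(255) NOT NULL,
  hostname VARCHAR(64)  NOT NULL
);
```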
with 50,000 rows, was running once a minute and bringing down the hosting server. The hosting company did not appreciate this. The question was how to optimise it.

The query is more transparent if we write the joins explicitly:
The DISTINCT clause in the subquery is logically equivalent to a GROUP BY on all the selected columns. Performance will be better if the table has an index on `sid`, `title`, `path`, `url`, `hostname`. For reasons that will become clear in a moment, make that `path`, `sid`, `title`, `url`, `hostname`.

The next thing to notice is that this is a self-join, so the `path` condition can be moved into the subquery, reducing the number of row accesses the subquery requires. Since the LIKE comparisons probably contribute the biggest performance drag, this might improve performance greatly:
You can satisfy yourself that this is the case by running EXPLAIN against each query (just type EXPLAIN followed by the query). The first row of the result will show that the engine needs to look at fewer rows when the `path` condition is moved inside the subquery. Since the LIKE conditions are probably consuming most of the resources, the improvement you see should be close to (total rows)/(rows matching `path`).

And now you see why the first column in the index should be `path`: MySQL can use that index to discard non-matching rows before doing anything else. The rewrite has also made the first element of the self-join redundant, so we can lose it:
which no longer brings down the server.