Join or subquery?

from the Artful Common Queries page


Usually, a JOIN is faster than an uncorrelated subquery. For example in the sakila test database, customer is a parent (via customer_id) of rental, which in turn is a parent (via rental_id) of payment. The subquery version of a query for whether a customer has made payments and rentals ...
SELECT DISTINCT c.customer_id
FROM customer c
WHERE c.customer_id IN (
  SELECT r.customer_id 
  FROM rental r
  JOIN payment p USING (rental_id) 
  WHERE c.customer_id = 599
);
... before MySQL version 8.0, is eight times slower than the join version...
 
SELECT DISTINCT c.customer_id
FROM customer c
JOIN rental r USING (customer_id)
JOIN payment p USING (rental_id) 
WHERE c.customer_id = 599;
Running EXPLAIN on the two queries reveals why: the subquery version has to read most customer rows, while the join version proceeds inside out and discovers it needs to read just one customer row.

As of version 8.0, with its improved subquery and semi-join optimisations, the join version is only twice as fast.

Last updated 25 Jan 2025