Join or subquery?
Usually, a JOIN
is faster than an uncorrelated subquery. For example in the sakila test database, customer is a parent of rental (via customer_id) which in turn is a parent of payment (via rental_id). The subquery version of a query for whether a customer has made payments and rentals...
SELECT DISTINCT c.customer_id
FROM customer c
WHERE c.customer_id IN (
SELECT r.customer_id
FROM rental r
JOIN payment p USING (rental_id)
WHERE c.customer_id = 599;
);
is eight times slower than the join version...
SELECT DISTINCT c.customer_id
FROM customer c
JOIN rental r USING (customer_id)
JOIN payment p USING (rental_id)
WHERE c.customer_id = 599;
Running EXPLAIN
on the two queries reveals why: the subquery version has to read most customer rows, while the join version proceeds inside out and discovers it needs to read just one customer row.