Compound JOINs

from the Artful SQL Server & Access Tips List


Look at the SQL below and determine, without running it in SQL Server, what the result set will look like.

This code uses the sample Northwind database, and to save you the trouble of looking them up, I'll remind you of the following statistics: the Customers table has 91 rows; the Orders table has 830 rows; seven Orders contain the CustomerID 'WOLZA'.

SELECT dbo.Customers.CustomerID, dbo.Orders.OrderID
FROM dbo.Customers 
LEFT OUTER JOIN dbo.Orders 
  ON dbo.Customers.CustomerID = dbo.Orders.CustomerID 
  AND dbo.Customers.CustomerID = N'WOLZA'

How many rows would you expect this code to return? 7? 91? If you guessed either of these, you got it wrong. The query returns 97 rows.

Now try this one:

SELECT dbo.Customers.CustomerID, dbo.Orders.OrderID
FROM dbo.Customers 
RIGHT OUTER JOIN dbo.Orders 
  ON dbo.Customers.CustomerID = dbo.Orders.CustomerID 
  AND dbo.Customers.CustomerID = N'WOLZA'

This query returns 830 rows, one for each row in the Orders table. All but seven contain a NULL CustomerID. To put it another way, the AND clause has no effect.

Looking at the SQL above, you might surmise that the AND clause is testing the wrong table's CustomerID column.

If you run the following query, you'll get exactly the same result—830 rows, seven of which contain the CustomerID 'WOLZA'. Thus, it makes no difference which table you specify in the AND clause.

SELECT dbo.Customers.CustomerID, dbo.Orders.OrderID
FROM dbo.Customers 
RIGHT OUTER JOIN dbo.Orders 
  ON dbo.Customers.CustomerID = dbo.Orders.CustomerID 
  AND dbo.Orderss.CustomerID = N'WOLZA'

To get rid of the 823 rows containing a NULL, you could add a WHERE clause, like this:

SELECT dbo.Customers.CompanyName, dbo.Orders.OrderID, dbo.Orders.CustomerID
FROM dbo.Customers 
RIGHT OUTER JOIN dbo.Orders 
  ON dbo.Customers.CustomerID = dbo.Orders.CustomerID 
  AND dbo.Orders.CustomerID = N'WOLZA'
WHERE dbo.Customers.CustomerID IS NOT NULL

We finally get exactly seven rows. In fact, it turns out that there's another way to achieve the same thing: by adding a WHERE clause to test the CustomerID column:

SELECT dbo.Customers.CompanyName, dbo.Orders.OrderID, dbo.Orders.CustomerID
FROM dbo.Customers 
RIGHT OUTER JOIN dbo.Orders 
  ON dbo.Customers.CustomerID = dbo.Orders.CustomerID 
  AND dbo.Orders.CustomerID = N'WOLZA'
WHERE (dbo.Orders.CustomerID = N'WOLZA')

And yet, if all we were after in the first place was the seven orders corresponding to 'WOLZA', we could have obtained them with a simple INNER JOIN:

SELECT dbo.Customers.CompanyName, dbo.Customers.CustomerID, dbo.Orders.OrderID
FROM dbo.Customers 
INNER JOIN dbo.Orders 
  ON dbo.Customers.CustomerID = dbo.Orders.CustomerID
WHERE (dbo.Customers.CustomerID = N'WOLZA')

or the logically equivalent exclusion join:

SELECT dbo.Customers.CustomerID, dbo.Orders.OrderID
FROM dbo.Customers 
LEFT OUTER JOIN dbo.Orders 
  ON dbo.Customers.CustomerID = dbo.Orders.CustomerID 
  AND dbo.customers.CustomerID = 'WOLZA' 
WHERE dbo.orders.customerid IS NOT NULL;

The point isn't that you should use INNER JOINs to get the list of seven WOLZA orders—or that OUTER JOINs are pointless (they often deliver exactly what we require). The point is that a compound OUTER JOIN becomes nonsensical when one of its clauses references only one of its tables and no WHERE clause disambiguates it. Think about that as a general principle.

Return to the Artful SQL Server & Access Tips page