Many questions on database newsgroups and forums can be answered with outer joins. Outer joins are part of the standard SQL language, and at least LEFT OUTER JOIN is supported by all major RDBMS brands. Many programmers are expected to use SQL in their work, but few know how to use outer joins effectively.
Learn to use this powerful feature of SQL, increase your employability, and amaze your friends!
Karwin will explain outer joins, show examples, and demonstrate a Sudoku puzzle solver implemented in a single SQL query.
SQL Outer Joins for Fun and Profit
1. SQL Outer Joins for Fun and Profit
Bill Karwin
Proprietor/Chief Architect
bill@karwin.com
www.karwin.com
2. Introduction
• Overview of SQL joins: inner and outer
• Applications of outer joins
• Solving Sudoku puzzles with outer joins
2006-07-27
OSCON 2006
3. Joins in SQL
• Joins:
  • The SQL way to express relations between data in tables
  • Form a new row in the result set, from matching rows in each joined table
  • As fundamental to using a relational database as a loop is in other programming languages
4. Inner joins refresher
• ANSI SQL-89 syntax:
  SELECT ...
  FROM products p, orders o
  WHERE p.product_id = o.product_id;
• ANSI SQL-92 syntax:
  SELECT ...
  FROM products p JOIN orders o
    ON p.product_id = o.product_id;
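The two syntaxes above can be compared side by side. This is a minimal sketch that runs both forms through Python's built-in sqlite3 module against an in-memory database. The column layout (a `price` column, ISO-formatted `date` strings) is an assumption made for illustration; the slides only show "product attributes" and "order attributes".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema; sample rows follow the result sets shown in the slides.
conn.executescript("""
    CREATE TABLE products (product_id TEXT PRIMARY KEY, price REAL);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                         product_id TEXT, date TEXT);
    INSERT INTO products VALUES ('Abc', 10.00), ('Def', 5.00), ('Efg', 17.00);
    INSERT INTO orders VALUES (9, 'Def', '2005-05-02'),
                              (10, 'Abc', '2006-02-01'),
                              (11, 'Abc', '2006-03-10');
""")

# SQL-89 style: comma-separated tables, join condition in WHERE.
rows89 = conn.execute("""
    SELECT p.product_id, o.order_id
    FROM products p, orders o
    WHERE p.product_id = o.product_id
    ORDER BY o.order_id
""").fetchall()

# SQL-92 style: explicit JOIN ... ON.
rows92 = conn.execute("""
    SELECT p.product_id, o.order_id
    FROM products p JOIN orders o
      ON p.product_id = o.product_id
    ORDER BY o.order_id
""").fetchall()

print(rows89 == rows92)  # both forms return the same rows
print(rows92)
```

Product 'Efg' has no orders, so neither form returns it.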
6. Inner join example
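Query result set:
+------------+--------------------+----------+------------------+
| product_id | Product attributes | order_id | Order attributes |
+------------+--------------------+----------+------------------+
| Abc        | $10.00             | 10       | 2006/2/1         |
| Abc        | $10.00             | 11       | 2006/3/10        |
| Def        | $5.00              | 9        | 2005/5/2         |
+------------+--------------------+----------+------------------+
SELECT ...
FROM products p JOIN orders o
  ON p.product_id = o.product_id;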
7. Outer joins
• Returns all rows in one table, but only matching rows in joined table. Returns NULL where no row matches.
• Not supported in SQL-89
• SQL-92 syntax:
  SELECT ...
  FROM products p
    LEFT OUTER JOIN orders o
    ON p.product_id = o.product_id;
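A minimal sketch of the LEFT OUTER JOIN above, run through Python's sqlite3 module. The schema (a `price` column, ISO-formatted `date` strings) is an assumption for illustration; the sample rows mirror the products and orders used in the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema; 'Efg' deliberately has no orders.
conn.executescript("""
    CREATE TABLE products (product_id TEXT PRIMARY KEY, price REAL);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                         product_id TEXT, date TEXT);
    INSERT INTO products VALUES ('Abc', 10.00), ('Def', 5.00), ('Efg', 17.00);
    INSERT INTO orders VALUES (9, 'Def', '2005-05-02'),
                              (10, 'Abc', '2006-02-01'),
                              (11, 'Abc', '2006-03-10');
""")

outer_rows = conn.execute("""
    SELECT p.product_id, o.order_id
    FROM products p
      LEFT OUTER JOIN orders o
      ON p.product_id = o.product_id
    ORDER BY p.product_id, o.order_id
""").fetchall()

# SQL NULL arrives in Python as None.
for row in outer_rows:
    print(row)
```

Every product appears at least once; the product without orders comes back with `None` in the order column.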
8. Types of outer joins
• LEFT OUTER JOIN
  Returns all rows from table on left. Returns NULLs in columns of right table where no row matches.
• RIGHT OUTER JOIN
  Returns all rows from table on right. Returns NULLs in columns of left table where no row matches.
• FULL OUTER JOIN
  Returns all rows from both tables. Returns NULLs in columns of each, where no row matches.
9. Support for OUTER JOIN
Open-source RDBMS products: Hypersonic HSQLDB, PostgreSQL, SQLite, Ingres R3, MySQL, Firebird, Apache Derby
[Table: support for LEFT OUTER JOIN, RIGHT OUTER JOIN, and FULL OUTER JOIN in each product]
11. Outer join example
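Query result set:
+------------+--------------------+----------+------------------+
| product_id | Product attributes | order_id | Order attributes |
+------------+--------------------+----------+------------------+
| Abc        | $10.00             | 10       | 2006/2/1         |
| Abc        | $10.00             | 11       | 2006/3/10        |
| Def        | $5.00              | 9        | 2005/5/2         |
| Efg        | $17.00             | NULL     | NULL             |
+------------+--------------------+----------+------------------+
SELECT ...
FROM products p
  LEFT OUTER JOIN orders o
  ON p.product_id = o.product_id;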
12. So what?
• Difference seems trivial and uninteresting
• SQL works with sets and relations
• Operations on sets combine in powerful ways (just like operations on numbers, strings, or booleans)
[Figure: diagrams illustrating INNER JOIN, LEFT OUTER JOIN, RIGHT OUTER JOIN, and FULL OUTER JOIN]
13. Solutions using outer joins
• Extra join conditions
• Subtotals per day
• Localization
• Mimic NOT IN (subquery)
• Greatest row per group
• Top three per group
• Finding attributes in EAV (entity-attribute-value) tables
• Sudoku puzzle solver
14. Extra join conditions
• Problem: match only with orders created this year.
• Put extra conditions on the outer-joined table (here, orders) into the ON clause. The conditions are then applied as part of the join, so products without a matching order are still returned:
  SELECT ...
  FROM products p
    LEFT OUTER JOIN orders o
    ON p.product_id = o.product_id
    AND o.date >= '2006-01-01';
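The effect of placing the date condition in ON rather than WHERE can be seen by running both variants. This is a sketch using Python's sqlite3 module; the schema and ISO-formatted date strings are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema; sample rows follow the slides.
conn.executescript("""
    CREATE TABLE products (product_id TEXT PRIMARY KEY, price REAL);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                         product_id TEXT, date TEXT);
    INSERT INTO products VALUES ('Abc', 10.00), ('Def', 5.00), ('Efg', 17.00);
    INSERT INTO orders VALUES (9, 'Def', '2005-05-02'),
                              (10, 'Abc', '2006-02-01'),
                              (11, 'Abc', '2006-03-10');
""")

# Condition in ON: it only decides which orders match.
in_on = conn.execute("""
    SELECT p.product_id, o.order_id
    FROM products p
      LEFT OUTER JOIN orders o
      ON p.product_id = o.product_id
      AND o.date >= '2006-01-01'
    ORDER BY p.product_id, o.order_id
""").fetchall()

# Condition in WHERE: it runs after the join and discards the NULL rows.
in_where = conn.execute("""
    SELECT p.product_id, o.order_id
    FROM products p
      LEFT OUTER JOIN orders o
      ON p.product_id = o.product_id
    WHERE o.date >= '2006-01-01'
    ORDER BY p.product_id, o.order_id
""").fetchall()

print(in_on)
print(in_where)
```

With the condition in WHERE, the comparison against a NULL date is not true, so products 'Def' and 'Efg' drop out and the query behaves like an inner join.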
16. Extra join conditions
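Query result set:
+------------+--------------------+----------+------------------+
| product_id | Product attributes | order_id | Order attributes |
+------------+--------------------+----------+------------------+
| Abc        | $10.00             | 10       | 2006/2/1         |
| Abc        | $10.00             | 11       | 2006/3/10        |
| Def        | $5.00              | NULL     | NULL             |
| Efg        | $17.00             | NULL     | NULL             |
+------------+--------------------+----------+------------------+
SELECT ...
FROM products p
  LEFT OUTER JOIN orders o
  ON p.product_id = o.product_id
  AND o.date >= '2006-01-01';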
17. Subtotals per day
• Problem: show all days, and the subtotal of orders per day even when there are zero.
• Requires an additional table containing all dates in the desired range.
  SELECT d.date, COUNT(o.order_id)
  FROM days d
    LEFT OUTER JOIN orders o
    ON o.date = d.date
  GROUP BY d.date;
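A sketch of the per-day subtotal using Python's sqlite3 module. The three-day `days` table and order 12 are invented for illustration. The second query shows why the slide counts `o.order_id` rather than `*`: COUNT(column) skips NULLs, while COUNT(*) counts the NULL-extended row too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: a three-day calendar and two orders on the middle day.
conn.executescript("""
    CREATE TABLE days (date TEXT PRIMARY KEY);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                         product_id TEXT, date TEXT);
    INSERT INTO days VALUES ('2006-03-09'), ('2006-03-10'), ('2006-03-11');
    INSERT INTO orders VALUES (11, 'Abc', '2006-03-10'),
                              (12, 'Def', '2006-03-10');
""")

per_day = conn.execute("""
    SELECT d.date, COUNT(o.order_id)
    FROM days d
      LEFT OUTER JOIN orders o
      ON o.date = d.date
    GROUP BY d.date
    ORDER BY d.date
""").fetchall()

# COUNT(*) counts the placeholder row produced for days without orders.
per_day_star = conn.execute("""
    SELECT d.date, COUNT(*)
    FROM days d
      LEFT OUTER JOIN orders o
      ON o.date = d.date
    GROUP BY d.date
    ORDER BY d.date
""").fetchall()

print(per_day)
print(per_day_star)
```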
19. Subtotals per day
Query result set:
+-----------+---------+
| date      | COUNT() |
+-----------+---------+
| 2005/5/2  | 1       |
| . . .     | 0       |
| 2006/2/1  | 1       |
| . . .     | 0       |
| 2006/3/10 | 1       |
| . . .     | 0       |
+-----------+---------+
SELECT d.date, COUNT(o.order_id)
FROM days d
  LEFT OUTER JOIN orders o
  ON o.date = d.date
GROUP BY d.date;
20. Localization
• Problem: show translated messages, or in default language if translation is not available.
  SELECT en.message_id,
    COALESCE(sp.message, en.message)
  FROM messages AS sp
    RIGHT OUTER JOIN messages AS en
    ON sp.message_id = en.message_id
    AND sp.language = 'sp'
  -- Filter the preserved (right-hand) table in WHERE;
  -- a condition on it in ON would not remove any of its rows.
  WHERE en.language = 'en';
• COALESCE() returns its first non-null argument.
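A sketch of the localization fallback using Python's sqlite3 module. It is written as a LEFT OUTER JOIN from the English rows, which is the mirror image of the slide's RIGHT OUTER JOIN, because older SQLite versions lack RIGHT OUTER JOIN. The `messages` layout and sample texts are assumptions; the English-only restriction is placed in WHERE so that only English rows drive the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: message 2 has no 'sp' translation.
conn.executescript("""
    CREATE TABLE messages (message_id INTEGER, language TEXT, message TEXT,
                           PRIMARY KEY (message_id, language));
    INSERT INTO messages VALUES (1, 'en', 'Hello'),
                                (1, 'sp', 'Hola'),
                                (2, 'en', 'Goodbye');
""")

localized = conn.execute("""
    SELECT en.message_id,
           COALESCE(sp.message, en.message)
    FROM messages AS en
      LEFT OUTER JOIN messages AS sp
      ON sp.message_id = en.message_id
      AND sp.language = 'sp'
    WHERE en.language = 'en'
    ORDER BY en.message_id
""").fetchall()

print(localized)
```

Message 1 comes back translated; message 2 falls back to the default-language text.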
23. Mimic NOT IN subquery
• Problem: find rows for which there is no match.
• Often implemented using NOT IN (subquery):
  SELECT ...
  FROM products p
  WHERE p.product_id NOT IN
    (SELECT o.product_id FROM orders o);
24. Mimic NOT IN subquery
• Can also be implemented using an outer join:
  SELECT ...
  FROM products p
    LEFT OUTER JOIN orders o
    ON p.product_id = o.product_id
  WHERE o.product_id IS NULL;
• Useful when subqueries are not supported (e.g. MySQL 4.0)
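A sketch comparing the NOT IN form with the outer-join form, using Python's sqlite3 module. The schema and sample rows are assumptions for illustration, modelled on the products and orders in the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema; 'Efg' is the only product without orders.
conn.executescript("""
    CREATE TABLE products (product_id TEXT PRIMARY KEY, price REAL);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                         product_id TEXT, date TEXT);
    INSERT INTO products VALUES ('Abc', 10.00), ('Def', 5.00), ('Efg', 17.00);
    INSERT INTO orders VALUES (9, 'Def', '2005-05-02'),
                              (10, 'Abc', '2006-02-01'),
                              (11, 'Abc', '2006-03-10');
""")

with_subquery = conn.execute("""
    SELECT p.product_id
    FROM products p
    WHERE p.product_id NOT IN
      (SELECT o.product_id FROM orders o)
    ORDER BY p.product_id
""").fetchall()

# Outer join, then keep only the rows where the join found nothing.
with_outer_join = conn.execute("""
    SELECT p.product_id
    FROM products p
      LEFT OUTER JOIN orders o
      ON p.product_id = o.product_id
    WHERE o.product_id IS NULL
    ORDER BY p.product_id
""").fetchall()

print(with_subquery, with_outer_join)
```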
26. Mimic NOT IN subquery
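Query result set:
+------------+--------------------+----------+------------------+
| product_id | Product attributes | order_id | Order attributes |
+------------+--------------------+----------+------------------+
| Efg        | $17.00             | NULL     | NULL             |
+------------+--------------------+----------+------------------+
SELECT ...
FROM products p
  LEFT OUTER JOIN orders o
  ON p.product_id = o.product_id
WHERE o.product_id IS NULL;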
27. Greatest row per group
• Problem: find the row in each group with the greatest value in one column
  SELECT ...
  FROM products p JOIN orders o1
    ON p.product_id = o1.product_id
    LEFT OUTER JOIN orders o2
    ON p.product_id = o2.product_id
    AND o1.date < o2.date
  WHERE o2.product_id IS NULL;
• I.e., show the rows for which no other row exists with a greater date and the same product_id.
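A sketch of the greatest-row-per-group query, run through Python's sqlite3 module to find the latest order for each product. The schema and ISO-formatted dates (so that string comparison orders them chronologically) are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema; sample rows follow the slides.
conn.executescript("""
    CREATE TABLE products (product_id TEXT PRIMARY KEY, price REAL);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                         product_id TEXT, date TEXT);
    INSERT INTO products VALUES ('Abc', 10.00), ('Def', 5.00), ('Efg', 17.00);
    INSERT INTO orders VALUES (9, 'Def', '2005-05-02'),
                              (10, 'Abc', '2006-02-01'),
                              (11, 'Abc', '2006-03-10');
""")

# o2 looks for a later order of the same product; keep o1 only if none exists.
latest = conn.execute("""
    SELECT p.product_id, o1.order_id, o1.date
    FROM products p JOIN orders o1
      ON p.product_id = o1.product_id
      LEFT OUTER JOIN orders o2
      ON p.product_id = o2.product_id
      AND o1.date < o2.date
    WHERE o2.product_id IS NULL
    ORDER BY p.product_id
""").fetchall()

print(latest)
```

Order 10 is dropped because order 11 is a later order for the same product. Product 'Efg' is absent because the first join is an inner join.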
29. Greatest row per group
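Query result set:
+------------+--------------------+----------+------------------+
| product_id | Product attributes | order_id | Order attributes |
+------------+--------------------+----------+------------------+
| Abc        | $10.00             | 11       | 2006/3/10        |
| Def        | $5.00              | 9        | 2005/5/2         |
+------------+--------------------+----------+------------------+
SELECT ...
FROM products p JOIN orders o1
  ON p.product_id = o1.product_id
  LEFT OUTER JOIN orders o2
  ON p.product_id = o2.product_id
  AND o1.date < o2.date
WHERE o2.product_id IS NULL;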
30. Top three per group
• Problem: list the largest three cities per US state.
  SELECT c.state, c.city_name, c.population
  FROM cities AS c
    LEFT JOIN cities AS c2 ON c.state = c2.state
    AND c.population <= c2.population
  GROUP BY c.state, c.city_name, c.population
  HAVING COUNT(*) <= 3
  ORDER BY c.state, c.population DESC;
• I.e., show the cities for which the number of cities in the same state with an equal or greater population (counting the city itself) is less than or equal to three.
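A sketch of the top-three query using Python's sqlite3 module and the four California cities from the slides. Storing the populations as plain integers (3485K as 3485000, and so on) is an assumption for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cities (state TEXT, city_name TEXT, population INTEGER);
    INSERT INTO cities VALUES ('CA', 'Los Angeles',   3485000),
                              ('CA', 'San Diego',     1110000),
                              ('CA', 'San Jose',       782000),
                              ('CA', 'San Francisco',  724000);
""")

# Each city c is paired with every city c2 in its state that is at least
# as large (including itself); COUNT(*) is therefore its rank in the state.
top_three = conn.execute("""
    SELECT c.state, c.city_name, c.population
    FROM cities AS c
      LEFT JOIN cities AS c2 ON c.state = c2.state
      AND c.population <= c2.population
    GROUP BY c.state, c.city_name, c.population
    HAVING COUNT(*) <= 3
    ORDER BY c.state, c.population DESC
""").fetchall()

for row in top_three:
    print(row)
```

San Francisco pairs with all four cities, so its count is four and it is filtered out.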
31. Top three per group
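Cities c:
+-------+---------------+------------+
| state | city_name     | population |
+-------+---------------+------------+
| CA    | Los Angeles   | 3485K      |
| CA    | San Diego     | 1110K      |
| CA    | San Jose      | 782K       |
| CA    | San Francisco | 724K       |
+-------+---------------+------------+
Cities c2:
+-------+---------------+------------+
| state | city_name     | population |
+-------+---------------+------------+
| CA    | Los Angeles   | 3485K      |
| CA    | San Diego     | 1110K      |
| CA    | San Jose      | 782K       |
| CA    | San Francisco | 724K       |
+-------+---------------+------------+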
32. Top three per group
Query result set:
+-------+---------------+------------+
| state | city_name     | population |
+-------+---------------+------------+
| CA    | Los Angeles   | 3485K      |
| CA    | San Diego     | 1110K      |
| CA    | San Jose      | 782K       |
+-------+---------------+------------+
SELECT c.state, c.city_name, c.population
FROM cities AS c
  LEFT JOIN cities AS c2 ON c.state = c2.state
  AND c.population <= c2.population
GROUP BY c.state, c.city_name, c.population
HAVING COUNT(*) <= 3
ORDER BY c.state, c.population DESC;
33. Fetching EAV attributes
• Entity-Attribute-Value table structure for dynamic attributes
  • Not normalized schema design
  • Lacks integrity enforcement
  • Not scalable
  • Nevertheless, EAV is used widely and is sometimes the only solution when attributes evolve quickly
35. Fetching EAV attributes
• Need an outer join per attribute:
  SELECT p.product_id, media.value AS media, discs.value AS discs,
    format.value AS format, length.value AS length
  FROM products AS p
    LEFT OUTER JOIN attributes AS media
      ON p.product_id = media.product_id AND media.attribute = 'Media'
    LEFT OUTER JOIN attributes AS discs
      ON p.product_id = discs.product_id AND discs.attribute = 'Discs'
    LEFT OUTER JOIN attributes AS format
      ON p.product_id = format.product_id AND format.attribute = 'Format'
    LEFT OUTER JOIN attributes AS length
      ON p.product_id = length.product_id AND length.attribute = 'Length'
  WHERE p.product_id = 'Abc';
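A sketch of the one-outer-join-per-attribute pattern using Python's sqlite3 module. The `attributes` table layout, the table aliases (`a_media` and so on), and the second product 'Def' with a single attribute are assumptions added to show what happens when an attribute row is missing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical EAV layout: one row per (product, attribute) pair.
conn.executescript("""
    CREATE TABLE products (product_id TEXT PRIMARY KEY);
    CREATE TABLE attributes (product_id TEXT, attribute TEXT, value TEXT,
                             PRIMARY KEY (product_id, attribute));
    INSERT INTO products VALUES ('Abc'), ('Def');
    INSERT INTO attributes VALUES ('Abc', 'Media',  'DVD'),
                                  ('Abc', 'Discs',  '2'),
                                  ('Abc', 'Format', 'Widescreen'),
                                  ('Abc', 'Length', '108 min.'),
                                  ('Def', 'Media',  'CD');
""")

PIVOT = """
    SELECT p.product_id, a_media.value, a_discs.value,
           a_format.value, a_length.value
    FROM products AS p
      LEFT OUTER JOIN attributes AS a_media
        ON p.product_id = a_media.product_id AND a_media.attribute = 'Media'
      LEFT OUTER JOIN attributes AS a_discs
        ON p.product_id = a_discs.product_id AND a_discs.attribute = 'Discs'
      LEFT OUTER JOIN attributes AS a_format
        ON p.product_id = a_format.product_id AND a_format.attribute = 'Format'
      LEFT OUTER JOIN attributes AS a_length
        ON p.product_id = a_length.product_id AND a_length.attribute = 'Length'
    WHERE p.product_id = ?
"""

abc = conn.execute(PIVOT, ("Abc",)).fetchone()
partial = conn.execute(PIVOT, ("Def",)).fetchone()
print(abc)
print(partial)
```

Because each attribute is outer-joined, a product with missing attributes still yields a row, with `None` in the gaps. With inner joins, 'Def' would return no row at all.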
36. Fetching EAV attributes
Query result set:
+------------+-------+-------+------------+----------+
| product_id | media | discs | format     | length   |
+------------+-------+-------+------------+----------+
| Abc        | DVD   | 2     | Widescreen | 108 min. |
+------------+-------+-------+------------+----------+
SELECT p.product_id, media.value AS media, discs.value AS discs,
  format.value AS format, length.value AS length
FROM products AS p
  LEFT OUTER JOIN attributes AS media
    ON p.product_id = media.product_id AND media.attribute = 'Media'
  LEFT OUTER JOIN attributes AS discs
    ON p.product_id = discs.product_id AND discs.attribute = 'Discs'
  LEFT OUTER JOIN attributes AS format
    ON p.product_id = format.product_id AND format.attribute = 'Format'
  LEFT OUTER JOIN attributes AS length
    ON p.product_id = length.product_id AND length.attribute = 'Length'
WHERE p.product_id = 'Abc';
39. Showing puzzle state
SELECT GROUP_CONCAT(COALESCE(s.value, '_')
    ORDER BY x.value SEPARATOR ' ') AS `Puzzle_state`
FROM one_to_nine AS x
  INNER JOIN one_to_nine AS y
  LEFT OUTER JOIN sudoku AS s
    ON s.column = x.value
    AND s.row = y.value
GROUP BY y.value;
+-------------------+
| Puzzle_state      |
+-------------------+
| _ _ _ _ _ 3 _ 5 1 |
| 1 4 _ _ 7 _ 6 _ _ |
| _ 8 5 9 _ _ 4 _ 2 |
| _ _ 2 3 _ _ 1 _ 7 |
| 5 3 _ _ _ _ _ 6 _ |
| 9 _ _ 8 6 4 _ 2 _ |
| _ 5 _ 1 _ 2 _ 8 _ |
| 6 _ 7 5 _ _ _ 9 _ |
| _ _ _ _ _ 7 3 1 _ |
+-------------------+
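A sketch of the same display in Python's sqlite3 module. The slide's query relies on MySQL's GROUP_CONCAT with ORDER BY and SEPARATOR, so this version keeps the cross join and outer join in SQL but assembles each line in Python. The column names `col_no` and `row_no` (instead of `column` and `row`) and the table definitions are assumptions; the puzzle is the one printed above.

```python
import sqlite3

# Puzzle transcribed from the slide; '_' marks an empty cell.
PUZZLE = [
    "_ _ _ _ _ 3 _ 5 1",
    "1 4 _ _ 7 _ 6 _ _",
    "_ 8 5 9 _ _ 4 _ 2",
    "_ _ 2 3 _ _ 1 _ 7",
    "5 3 _ _ _ _ _ 6 _",
    "9 _ _ 8 6 4 _ 2 _",
    "_ 5 _ 1 _ 2 _ 8 _",
    "6 _ 7 5 _ _ _ 9 _",
    "_ _ _ _ _ 7 3 1 _",
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE one_to_nine (value INTEGER PRIMARY KEY);
    CREATE TABLE sudoku (col_no INTEGER, row_no INTEGER, value INTEGER,
                         PRIMARY KEY (col_no, row_no));
""")
conn.executemany("INSERT INTO one_to_nine VALUES (?)",
                 [(i,) for i in range(1, 10)])
# Only the filled cells are stored; empty cells have no row at all.
conn.executemany(
    "INSERT INTO sudoku VALUES (?, ?, ?)",
    [(c, r, int(ch))
     for r, line in enumerate(PUZZLE, start=1)
     for c, ch in enumerate(line.split(), start=1)
     if ch != "_"])

# The cross join produces all 81 coordinates; the outer join fills in
# the stored values and leaves NULL (shown as '_') for the blanks.
cells = [str(v) for (v,) in conn.execute("""
    SELECT COALESCE(s.value, '_')
    FROM one_to_nine AS y
      CROSS JOIN one_to_nine AS x
      LEFT OUTER JOIN sudoku AS s
        ON s.col_no = x.value AND s.row_no = y.value
    ORDER BY y.value, x.value
""")]
lines = [" ".join(cells[i:i + 9]) for i in range(0, 81, 9)]
print("\n".join(lines))
```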
40. Revealing possible values
SELECT x_loop.value AS x, y_loop.value AS y,
  GROUP_CONCAT(cell.value ORDER BY cell.value) AS possibilities
-- Cartesian product:
-- loop x over 1..9 columns,
-- loop y over 1..9 rows,
-- loop cell over 1..9 values
FROM (one_to_nine AS x_loop
  INNER JOIN one_to_nine AS y_loop
  INNER JOIN one_to_nine AS cell)
  -- Is there any value already in the cell x, y ?
  LEFT OUTER JOIN sudoku AS occupied
    ON (occupied.column = x_loop.value
    AND occupied.row = y_loop.value)
  -- Does the value appear in column x ?
  LEFT OUTER JOIN sudoku AS num_in_col
    ON (num_in_col.column = x_loop.value
    AND num_in_col.value = cell.value)
  -- Does the value appear in row y ?
  LEFT OUTER JOIN sudoku AS num_in_row
    ON (num_in_row.row = y_loop.value
    AND num_in_row.value = cell.value)
  -- Does the value appear in the sub-square containing x, y ?
  LEFT OUTER JOIN sudoku AS num_in_box
    ON (CEIL(x_loop.value/3) = CEIL(num_in_box.column/3)
    AND CEIL(y_loop.value/3) = CEIL(num_in_box.row/3)
    AND cell.value = num_in_box.value)
-- Select for cases where all four outer joins find no matches
WHERE COALESCE(occupied.value, num_in_col.value,
  num_in_row.value, num_in_box.value) IS NULL
GROUP BY x_loop.value, y_loop.value;
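A sketch of the candidate search in Python's sqlite3 module, using the puzzle from the earlier slide. Several details are assumptions made so the query runs on SQLite: the columns are named `col_no` and `row_no`, the sub-square test uses integer division `(n - 1) / 3` in place of `CEIL(n/3)`, and the per-cell grouping is done in Python rather than with GROUP_CONCAT.

```python
import sqlite3

# Puzzle transcribed from the slide; '_' marks an empty cell.
PUZZLE = [
    "_ _ _ _ _ 3 _ 5 1",
    "1 4 _ _ 7 _ 6 _ _",
    "_ 8 5 9 _ _ 4 _ 2",
    "_ _ 2 3 _ _ 1 _ 7",
    "5 3 _ _ _ _ _ 6 _",
    "9 _ _ 8 6 4 _ 2 _",
    "_ 5 _ 1 _ 2 _ 8 _",
    "6 _ 7 5 _ _ _ 9 _",
    "_ _ _ _ _ 7 3 1 _",
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE one_to_nine (value INTEGER PRIMARY KEY);
    CREATE TABLE sudoku (col_no INTEGER, row_no INTEGER, value INTEGER,
                         PRIMARY KEY (col_no, row_no));
""")
conn.executemany("INSERT INTO one_to_nine VALUES (?)",
                 [(i,) for i in range(1, 10)])
conn.executemany(
    "INSERT INTO sudoku VALUES (?, ?, ?)",
    [(c, r, int(ch))
     for r, line in enumerate(PUZZLE, start=1)
     for c, ch in enumerate(line.split(), start=1)
     if ch != "_"])

# One row per (x, y, candidate) that no outer join could rule out.
CANDIDATES = """
    SELECT x_loop.value, y_loop.value, cell.value
    FROM one_to_nine AS x_loop
      CROSS JOIN one_to_nine AS y_loop
      CROSS JOIN one_to_nine AS cell
      LEFT OUTER JOIN sudoku AS occupied
        ON occupied.col_no = x_loop.value
        AND occupied.row_no = y_loop.value
      LEFT OUTER JOIN sudoku AS num_in_col
        ON num_in_col.col_no = x_loop.value
        AND num_in_col.value = cell.value
      LEFT OUTER JOIN sudoku AS num_in_row
        ON num_in_row.row_no = y_loop.value
        AND num_in_row.value = cell.value
      LEFT OUTER JOIN sudoku AS num_in_box
        ON (x_loop.value - 1) / 3 = (num_in_box.col_no - 1) / 3
        AND (y_loop.value - 1) / 3 = (num_in_box.row_no - 1) / 3
        AND cell.value = num_in_box.value
    WHERE COALESCE(occupied.value, num_in_col.value,
                   num_in_row.value, num_in_box.value) IS NULL
    ORDER BY x_loop.value, y_loop.value, cell.value
"""

# Group the surviving candidates by cell coordinate (x, y).
possibilities = {}
for x, y, value in conn.execute(CANDIDATES):
    possibilities.setdefault((x, y), []).append(value)

print(possibilities[(1, 1)])  # candidates for the top-left cell
```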
41. Revealing singleton values
SELECT x_loop.value AS x, y_loop.value AS y,
  cell.value AS possibilities
FROM (one_to_nine AS x_loop
  INNER JOIN one_to_nine AS y_loop
  INNER JOIN one_to_nine AS cell)
  LEFT OUTER JOIN sudoku AS occupied
    ON (occupied.column = x_loop.value
    AND occupied.row = y_loop.value)
  LEFT OUTER JOIN sudoku AS num_in_col
    ON (num_in_col.column = x_loop.value
    AND num_in_col.value = cell.value)
  LEFT OUTER JOIN sudoku AS num_in_row
    ON (num_in_row.row = y_loop.value
    AND num_in_row.value = cell.value)
  LEFT OUTER JOIN sudoku AS num_in_box
    ON (CEIL(x_loop.value/3) = CEIL(num_in_box.column/3)
    AND CEIL(y_loop.value/3) = CEIL(num_in_box.row/3)
    AND cell.value = num_in_box.value)
WHERE COALESCE(occupied.value, num_in_col.value,
  num_in_row.value, num_in_box.value) IS NULL
GROUP BY x_loop.value, y_loop.value
-- Limit the groups only to those with one value remaining
HAVING COUNT(*) = 1;
42. Updating the puzzle
-- Insert these singletons back into the table, then we can try again
INSERT INTO sudoku (`column`, `row`, value)
SELECT x_loop.value AS x, y_loop.value AS y,
  cell.value AS possibilities
FROM (one_to_nine AS x_loop
  INNER JOIN one_to_nine AS y_loop
  INNER JOIN one_to_nine AS cell)
  LEFT OUTER JOIN sudoku AS occupied
    ON (occupied.column = x_loop.value
    AND occupied.row = y_loop.value)
  LEFT OUTER JOIN sudoku AS num_in_col
    ON (num_in_col.column = x_loop.value
    AND num_in_col.value = cell.value)
  LEFT OUTER JOIN sudoku AS num_in_row
    ON (num_in_row.row = y_loop.value
    AND num_in_row.value = cell.value)
  LEFT OUTER JOIN sudoku AS num_in_box
    ON (CEIL(x_loop.value/3) = CEIL(num_in_box.column/3)
    AND CEIL(y_loop.value/3) = CEIL(num_in_box.row/3)
    AND cell.value = num_in_box.value)
WHERE COALESCE(occupied.value, num_in_col.value,
  num_in_row.value, num_in_box.value) IS NULL
GROUP BY x_loop.value, y_loop.value
HAVING COUNT(*) = 1;
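A sketch of the "insert singletons, then try again" loop in Python's sqlite3 module. To keep the outcome predictable, it does not use the slide's puzzle: it builds a solved grid from a simple arithmetic pattern (the `pattern` function, an illustrative assumption) and blanks the main diagonal. The column names `col_no`/`row_no`, the integer-division box test, and `MIN(cell.value)` in place of the bare column are likewise adaptations for SQLite. Repeating this one step fills only cells that have a single candidate, so harder puzzles may stall before they are complete.

```python
import sqlite3

def pattern(r, c):
    """Value at 0-based row r, column c of a simple valid solved grid."""
    return (3 * (r % 3) + r // 3 + c) % 9 + 1

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE one_to_nine (value INTEGER PRIMARY KEY);
    CREATE TABLE sudoku (col_no INTEGER, row_no INTEGER, value INTEGER,
                         PRIMARY KEY (col_no, row_no));
""")
conn.executemany("INSERT INTO one_to_nine VALUES (?)",
                 [(i,) for i in range(1, 10)])
# Store every cell except the nine on the main diagonal.
conn.executemany(
    "INSERT INTO sudoku VALUES (?, ?, ?)",
    [(c + 1, r + 1, pattern(r, c))
     for r in range(9) for c in range(9) if r != c])

INSERT_SINGLETONS = """
    INSERT INTO sudoku (col_no, row_no, value)
    SELECT x_loop.value, y_loop.value, MIN(cell.value)
    FROM one_to_nine AS x_loop
      CROSS JOIN one_to_nine AS y_loop
      CROSS JOIN one_to_nine AS cell
      LEFT OUTER JOIN sudoku AS occupied
        ON occupied.col_no = x_loop.value
        AND occupied.row_no = y_loop.value
      LEFT OUTER JOIN sudoku AS num_in_col
        ON num_in_col.col_no = x_loop.value
        AND num_in_col.value = cell.value
      LEFT OUTER JOIN sudoku AS num_in_row
        ON num_in_row.row_no = y_loop.value
        AND num_in_row.value = cell.value
      LEFT OUTER JOIN sudoku AS num_in_box
        ON (x_loop.value - 1) / 3 = (num_in_box.col_no - 1) / 3
        AND (y_loop.value - 1) / 3 = (num_in_box.row_no - 1) / 3
        AND cell.value = num_in_box.value
    WHERE COALESCE(occupied.value, num_in_col.value,
                   num_in_row.value, num_in_box.value) IS NULL
    GROUP BY x_loop.value, y_loop.value
    HAVING COUNT(*) = 1
"""

def filled():
    """Number of cells currently stored in the sudoku table."""
    return conn.execute("SELECT COUNT(*) FROM sudoku").fetchone()[0]

# Repeat the insert until a pass adds no new cells.
passes = 0
while True:
    before = filled()
    conn.execute(INSERT_SINGLETONS)
    if filled() == before:
        break
    passes += 1

print(filled(), "cells filled after", passes, "productive pass(es)")
```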
43. Finish
• Outer joins are an indispensable part of SQL programming.
Thank you!