SQL Joins Deep Dive: Inner, Outer, Cross, and Self
A thorough tour of SQL joins with diagrams, sample queries, and the gotchas that bite real systems: NULLs, duplicates, and join order.
What you'll learn
- ✓Inner vs outer joins
- ✓LEFT, RIGHT, and FULL semantics
- ✓CROSS join and self join uses
- ✓How NULLs interact with joins
- ✓Avoiding duplicate explosions
Prerequisites
- •Familiar with terminals and YAML
What and Why
Joins are how you combine rows from multiple tables based on a relationship. They are the single most important SQL concept after SELECT. Mastering joins is the difference between asking the database for what you want and fighting it for what you got.
Bad joins manifest as wrong totals, missing customers, duplicated rows in reports, and silent NULLs that quietly skew aggregates. Every analyst and every backend engineer eventually owns a query that lies because of a misunderstood join.
Mental Model
A INNER JOIN B -> rows where match exists in both
A LEFT JOIN B -> all of A, with B columns NULL when no match
A RIGHT JOIN B -> all of B, with A columns NULL when no match
A FULL JOIN B -> all of A and all of B, NULL where no match
A CROSS JOIN B -> every row of A paired with every row of B
(cartesian product, no ON clause) A self join is just a table joined to itself with an alias, useful for hierarchies and pairwise comparisons.
Hands-on Example
Two simple tables:
CREATE TABLE customers (
id bigint PRIMARY KEY,
name text NOT NULL
);
CREATE TABLE orders (
id bigint PRIMARY KEY,
customer_id bigint REFERENCES customers(id),
total numeric(10,2)
);
INSERT INTO customers VALUES (1,'Alice'),(2,'Bob'),(3,'Cara');
INSERT INTO orders VALUES (10,1,50),(11,1,75),(12,2,20);
-- Cara has no orders. Order 99 below has no customer (orphan).
INSERT INTO orders VALUES (99,NULL,15);
INNER join: customers with at least one order.
SELECT c.name, o.id, o.total
FROM customers c
JOIN orders o ON o.customer_id = c.id;
-- Alice/10, Alice/11, Bob/12. Cara missing. Order 99 missing.
LEFT join: every customer, with their orders if any.
SELECT c.name, o.id, o.total
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
ORDER BY c.name, o.id;
-- Alice/10, Alice/11, Bob/12, Cara/NULL/NULL
FULL join: every row from either side.
SELECT c.name, o.id, o.total
FROM customers c
FULL JOIN orders o ON o.customer_id = c.id;
-- Adds NULL/99/15 for the orphan order
Self join: find customers who placed multiple orders on the same day.
SELECT a.customer_id, a.id AS order_a, b.id AS order_b
FROM orders a
JOIN orders b
ON a.customer_id = b.customer_id
AND a.id < b.id;
CROSS join with a small helper to compute every (size, color) combination:
SELECT s.size, c.color
FROM (VALUES ('S'),('M'),('L')) AS s(size)
CROSS JOIN (VALUES ('red'),('blue')) AS c(color);
Common Pitfalls
- Duplicate explosion. Joining one customer to many orders inflates totals if you also
SUMfrom the customer side. Aggregate first, then join. WHEREafterLEFT JOINaccidentally converting to INNER. A predicate likeWHERE o.total > 0filters out the NULL rows from unmatched customers. Put it in theONclause instead.- NULL equality.
NULL = NULLis unknown, not true. Joins on nullable columns silently drop rows. UseIS NOT DISTINCT FROMif you really want NULL-matches. - Forgotten
ONbecomes CROSS. Some dialects acceptJOINwithoutON. The result is a cartesian product that looks fine for tiny tables and breaks production at scale. - Wrong join key cardinality. Joining
userstoeventson email when emails are not unique blows up counts. Verify keys withGROUP BY ... HAVING COUNT(*) > 1.
Production Tips
- Use explicit join syntax (
JOIN ... ON), never comma-separated tables withWHEREjoin conditions. Easier to read and review. - Always
EXPLAINqueries against production-shaped data. Watch for nested loops over large tables and missing indexes on join keys. - Add covering indexes on foreign keys. A
(customer_id, total)index lets the planner skip the heap for many join queries. - For analytical queries, prefer CTEs that aggregate before joining. Smaller intermediate sets mean smaller hash tables and fewer surprises.
- Document expected join cardinality next to the SQL:
-- one customer -> many orders. It catches future mistakes during review.
Wrap-up
Joins are simple in syntax and rich in semantics. Internalize the four shapes - inner, left/right, full, cross - watch out for NULLs, duplicates, and predicates leaking from WHERE into what should be ON, and verify cardinality on real data. Once you can predict a join’s output before you run it, you write queries that scale, report correctly, and survive review.
Related articles
- SQL SQL Stored Procedures vs Functions: Choosing Wisely
When to use stored procedures, when to use functions, and how transaction control, return values, and side effects differ across major databases.
- SQL SQL Aggregate Functions and GROUP BY in Depth
How aggregate functions interact with GROUP BY, HAVING, and window functions, with practical patterns and pitfalls every backend engineer should know.
- SQL CTE vs Subquery vs Temp Table: Picking the Right SQL Tool
Compare common table expressions, subqueries, and temporary tables in SQL. Learn when each shines, the performance trade-offs, and concrete examples.
- SQL SQL EXPLAIN ANALYZE Deep Dive
How to read EXPLAIN ANALYZE output, interpret cost estimates, spot bad plans, and use it to drive real performance improvements in production databases.