Skip to content
C Codeloom
SQL

SQL Joins Deep Dive: Inner, Outer, Cross, and Self

A thorough tour of SQL joins with diagrams, sample queries, and the gotchas that bite real systems: NULLs, duplicates, and join order.

·4 min read · By Codeloom
Intermediate 10 min read

What you'll learn

  • Inner vs outer joins
  • LEFT, RIGHT, and FULL semantics
  • CROSS join and self join uses
  • How NULLs interact with joins
  • Avoiding duplicate explosions

Prerequisites

  • Familiar with terminals and YAML

What and Why

Joins are how you combine rows from multiple tables based on a relationship. They are the single most important SQL concept after SELECT. Mastering joins is the difference between asking the database for what you want and fighting it for what you got.

Bad joins manifest as wrong totals, missing customers, duplicated rows in reports, and silent NULLs that quietly skew aggregates. Every analyst and every backend engineer eventually owns a query that lies because of a misunderstood join.

Mental Model

A INNER JOIN B   ->  rows where match exists in both

A LEFT JOIN B    ->  all of A, with B columns NULL when no match

A RIGHT JOIN B   ->  all of B, with A columns NULL when no match

A FULL JOIN B    ->  all of A and all of B, NULL where no match

A CROSS JOIN B   ->  every row of A paired with every row of B
                   (cartesian product, no ON clause)
Four join shapes

A self join is just a table joined to itself with an alias, useful for hierarchies and pairwise comparisons.

Hands-on Example

Two simple tables:

CREATE TABLE customers (
  id   bigint PRIMARY KEY,
  name text NOT NULL
);

CREATE TABLE orders (
  id          bigint PRIMARY KEY,
  customer_id bigint REFERENCES customers(id),
  total       numeric(10,2)
);

INSERT INTO customers VALUES (1,'Alice'),(2,'Bob'),(3,'Cara');
INSERT INTO orders    VALUES (10,1,50),(11,1,75),(12,2,20);
-- Cara has no orders. Order 99 below has no customer (orphan).
INSERT INTO orders    VALUES (99,NULL,15);

INNER join: customers with at least one order.

SELECT c.name, o.id, o.total
FROM customers c
JOIN orders o ON o.customer_id = c.id;
-- Alice/10, Alice/11, Bob/12.  Cara missing.  Order 99 missing.

LEFT join: every customer, with their orders if any.

SELECT c.name, o.id, o.total
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
ORDER BY c.name, o.id;
-- Alice/10, Alice/11, Bob/12, Cara/NULL/NULL

FULL join: every row from either side.

SELECT c.name, o.id, o.total
FROM customers c
FULL JOIN orders o ON o.customer_id = c.id;
-- Adds NULL/99/15 for the orphan order

Self join: find customers who placed multiple orders on the same day.

SELECT a.customer_id, a.id AS order_a, b.id AS order_b
FROM orders a
JOIN orders b
  ON a.customer_id = b.customer_id
 AND a.id < b.id;

CROSS join with a small helper to compute every (size, color) combination:

SELECT s.size, c.color
FROM (VALUES ('S'),('M'),('L')) AS s(size)
CROSS JOIN (VALUES ('red'),('blue')) AS c(color);

Common Pitfalls

  • Duplicate explosion. Joining one customer to many orders inflates totals if you also SUM from the customer side. Aggregate first, then join.
  • WHERE after LEFT JOIN accidentally converting to INNER. A predicate like WHERE o.total > 0 filters out the NULL rows from unmatched customers. Put it in the ON clause instead.
  • NULL equality. NULL = NULL is unknown, not true. Joins on nullable columns silently drop rows. Use IS NOT DISTINCT FROM if you really want NULL-matches.
  • Forgotten ON becomes CROSS. Some dialects accept JOIN without ON. The result is a cartesian product that looks fine for tiny tables and breaks production at scale.
  • Wrong join key cardinality. Joining users to events on email when emails are not unique blows up counts. Verify keys with GROUP BY ... HAVING COUNT(*) > 1.

Production Tips

  • Use explicit join syntax (JOIN ... ON), never comma-separated tables with WHERE join conditions. Easier to read and review.
  • Always EXPLAIN queries against production-shaped data. Watch for nested loops over large tables and missing indexes on join keys.
  • Add covering indexes on foreign keys. A (customer_id, total) index lets the planner skip the heap for many join queries.
  • For analytical queries, prefer CTEs that aggregate before joining. Smaller intermediate sets mean smaller hash tables and fewer surprises.
  • Document expected join cardinality next to the SQL: -- one customer -> many orders. It catches future mistakes during review.

Wrap-up

Joins are simple in syntax and rich in semantics. Internalize the four shapes - inner, left/right, full, cross - watch out for NULLs, duplicates, and predicates leaking from WHERE into what should be ON, and verify cardinality on real data. Once you can predict a join’s output before you run it, you write queries that scale, report correctly, and survive review.