LeetCode Word Ladder: BFS on Word Graphs

Intermediate 10 min read

What you'll learn

✓How to build an implicit graph over words
✓Why BFS — not DFS — gives the shortest ladder
✓The wildcard-pattern preprocessing trick
✓How bidirectional BFS cuts work roughly in half (in exponent)

Prerequisites

•Graph basics — see Graphs: Introduction
•BFS comfort — see BFS and DFS

LeetCode 127 — Word Ladder — is the canonical example of a problem where the graph is not given to you. You have to see one. Words are nodes, single-letter substitutions are edges, and shortest path is BFS. The interesting design choice is how you enumerate neighbors fast enough to survive a large word list.

The Problem

Given two words beginWord and endWord and a dictionary wordList, return the length of the shortest transformation sequence from beginWord to endWord such that:

Only one letter changes per step.
Every intermediate word is in wordList.
beginWord does not need to be in wordList, but endWord does.

Return 0 if no such sequence exists. The length includes both endpoints.

Brute Force

For every node you visit, scan the entire word list and check which words differ by exactly one letter. With dictionary size N and word length L, neighbor generation costs O(N * L) per node. Total work climbs toward O(N^2 * L).

from collections import deque

def ladderLength(beginWord, endWord, wordList):
    words = set(wordList)
    if endWord not in words:
        return 0
    q = deque([(beginWord, 1)])
    visited = {beginWord}
    while q:
        word, dist = q.popleft()
        if word == endWord:
            return dist
        for cand in list(words):
            if sum(a != b for a, b in zip(word, cand)) == 1 and cand not in visited:
                visited.add(cand)
                q.append((cand, dist + 1))
    return 0

Correct, but it dies on the test cases LeetCode actually ships.

Optimal Solution

Precompute a map from wildcard pattern to words. For each word, replace each letter with * and put the word into the bucket for that pattern. Two words share a bucket exactly when they differ in one position.

from collections import deque, defaultdict

def ladderLength(beginWord, endWord, wordList):
    words = set(wordList)
    if endWord not in words:
        return 0
    L = len(beginWord)
    buckets = defaultdict(list)
    for w in words:
        for i in range(L):
            buckets[w[:i] + "*" + w[i+1:]].append(w)

    q = deque([(beginWord, 1)])
    visited = {beginWord}
    while q:
        word, dist = q.popleft()
        if word == endWord:
            return dist
        for i in range(L):
            pat = word[:i] + "*" + word[i+1:]
            for nbr in buckets[pat]:
                if nbr not in visited:
                    visited.add(nbr)
                    q.append((nbr, dist + 1))
            buckets[pat] = []  # avoid revisiting via this pattern
    return 0

Clearing the bucket after use is a small but real speed win — it prevents the same pattern from being revisited deeper in the BFS.

Walk Through an Example

Take beginWord = "hit", endWord = "cog", wordList = ["hot", "dot", "dog", "lot", "log", "cog"].

Build buckets: "*ot" -> ["hot", "dot", "lot"], "h*t" -> ["hot"], "do*" -> ["dot", "dog"], and so on.
BFS from hit with distance 1.
- Patterns for hit: "*it", "h*t", "hi*". The only hit is "h*t" -> "hot". Enqueue (hot, 2).
Pop hot. Patterns hit "hot", "dot", "lot". Enqueue (dot, 3), (lot, 3).
Pop dot. Patterns yield "dog". Enqueue (dog, 4).
Pop lot. Patterns yield "log". Enqueue (log, 4).
Pop dog. Patterns include "*og", which leads to "cog". Enqueue (cog, 5).
Pop log. "*og" bucket already cleared.
Pop cog. Match. Return 5.

Edge Cases

endWord not in dictionary — return 0 up front. This avoids wasted BFS.
beginWord == endWord — by problem convention, the answer is 1, though some interview variants treat this as 0. Clarify before coding.
Empty word list — return 0.
Multiple shortest ladders — return the length; the path itself is LeetCode 126.

Complexity Analysis

Let N be the dictionary size and L the word length. Bucket construction is O(N * L^2) — there are L patterns per word and each pattern is built in O(L). BFS visits each word at most once and inspects L patterns per pop, each touching a bucket of total size bounded by O(N). Time bound: O(N * L^2). Space is O(N * L^2) for the bucket map.

Bidirectional BFS — run BFS from both ends and stop when the frontiers meet — cuts the explored set roughly square root in size. Worth mentioning in interviews, though the implementation has more moving parts.

How to Explain It in an Interview

Start by saying the words form an implicit graph: edge if they differ by one letter, shortest path by BFS. Then volunteer the cost problem: “Neighbor generation is the bottleneck — naive scan is O(N) per node. I’ll preprocess wildcard patterns so neighbor lookup is O(L) per node.” Sketch the bucket map. Mention the bucket-clearing optimization.

If the interviewer asks “can you do better,” answer bidirectional BFS. If they ask about all shortest ladders (LeetCode 126), say the BFS becomes a layered graph and the reconstruction is the hard part.

LeetCode 126 — Word Ladder II (return every shortest sequence)
LeetCode 433 — Minimum Genetic Mutation (same structure, smaller alphabet)
LeetCode 752 — Open the Lock (BFS on strings with single-position changes)
LeetCode 815 — Bus Routes (BFS on a constructed graph)

The last one is a great follow-up because it forces you to design a different node type from the obvious one.

Wrap up

Word Ladder is the test of whether you can see the graph that is not drawn. The wildcard bucket is the tactical insight; the strategic insight is BFS for shortest path on unweighted graphs. Pair this problem with BFS and DFS and you have a complete answer to half of the graph problems on a typical interview list.