LeetCode Clone Graph: DFS with a Hash Map
LeetCode 133 walked through — why a hash map from original to copy is the entire idea, plus the BFS variant for stack-limited environments.
What you'll learn
- ✓Why a node-to-copy hash map is the cornerstone
- ✓How DFS doubles as the recursion and the visited set
- ✓When the BFS version is preferable
- ✓How this generalizes to deep-copy problems
Prerequisites
- •Graph basics — see Graphs: Introduction
- •Hash maps — see Hashing and Hash Maps
LeetCode 133 — Clone Graph — is what happens when someone tries to make “implement deep copy” sound like an algorithm question. It is, but only barely: the entire problem reduces to one decision — how do you keep track of what you have already cloned? A hash map keyed on the original node is the only sane answer, and once that is in place the rest is a textbook traversal.
The Problem
You are given a reference to a node in a connected undirected graph. Each node has a value and a list of neighbors. Return a deep copy of the graph: every node must be a new object, the edge structure must match, and no part of the new graph may share references with the original.
Brute Force
A first attempt without a map is to clone nodes greedily. The instant you hit a back edge — and cycles guarantee you will — you either loop forever or create duplicate copies of the same logical node. The “brute” version is really just the broken version.
A slightly more honest brute force serializes the graph into an edge list, builds a fresh adjacency list with new node objects, and rewires them. Correct, but it touches each edge twice and requires labeling nodes (the problem gives you values but does not promise uniqueness in general).
Optimal Solution
Maintain a dictionary clones from original node to its copy. When you visit a node:
- If it is already in the map, return the existing copy.
- Otherwise, create a new node, store it immediately (before recursing), then recurse into the neighbors and fill in the copy’s neighbor list.
Storing the new node before recursing is the load-bearing step. It breaks cycles by ensuring that when a recursive call comes back around to the same original node, the map already has an answer.
def cloneGraph(node):
if not node:
return None
clones = {}
def dfs(orig):
if orig in clones:
return clones[orig]
copy = Node(orig.val, [])
clones[orig] = copy
for nbr in orig.neighbors:
copy.neighbors.append(dfs(nbr))
return copy
return dfs(node)
The BFS variant is equivalent: pop a node from the queue, create its copy if absent, enqueue any uncloned neighbors, and stitch.
from collections import deque
def cloneGraph(node):
if not node:
return None
clones = {node: Node(node.val, [])}
q = deque([node])
while q:
orig = q.popleft()
for nbr in orig.neighbors:
if nbr not in clones:
clones[nbr] = Node(nbr.val, [])
q.append(nbr)
clones[orig].neighbors.append(clones[nbr])
return clones[node]
Walk Through an Example
A 4-node cycle: 1 - 2 - 3 - 4 - 1 with the obvious cross edges between 2-4.
- Start DFS at 1. Create copy 1’. Store
1 -> 1'. Recurse into neighbor 2. - Create copy 2’. Store
2 -> 2'. Recurse into 1 (already mapped, return 1’), into 3, into 4. - 3 creates 3’, stores, recurses into 2 (mapped) and 4.
- 4 creates 4’, stores, recurses into 1 (mapped), 3 (mapped).
Every edge gets walked once per side, but every node is created exactly once. The map keeps the work bounded.
Edge Cases
- Empty input (null node) — return null.
- A single isolated node — clone it with an empty neighbor list.
- Self-loops — handled. When you recurse from a node back to itself, the map already has the copy.
- Disconnected components — the problem assumes connectivity. If real interview clarification reveals otherwise, you would need to iterate over all entry points.
- Multigraphs — neighbors may repeat. The hash map keeps node identity correct; duplicate edges become duplicate references, which matches the input.
Complexity Analysis
Let n be the number of nodes and m the number of edges. Time is O(n + m) — every node and every edge is processed a constant number of times. Space is O(n) for the clone map plus O(n) for the recursion stack in DFS or the queue in BFS.
If the recursion stack matters (deep graphs, sandboxed runtimes), pick BFS. Otherwise pick whichever you can write fastest. For the underlying complexity framing, see Big-O Notation Explained.
How to Explain It in an Interview
Lead with the cycle hazard. “Without memoization I will either loop forever or create duplicates, so I need a map from original node to its copy.” Then walk through the two-line ordering: create the copy, store it in the map, then recurse. State explicitly that the early store is what breaks cycles.
If the interviewer asks “what changes if the graph is directed?” — nothing, the same DFS works. If they ask “what changes if node values are unique?” — you could key the map on val, but keying on identity (id or the node object itself) is safer because uniqueness is a property of the input, not the algorithm.
Related Problems
- LeetCode 138 — Copy List with Random Pointer (same map-from-original idea)
- LeetCode 1490 — Clone N-ary Tree
- LeetCode 200 — Number of Islands (BFS/DFS on a grid; the BFS skeleton transfers directly)
- LeetCode 785 — Is Graph Bipartite (graph traversal with state)
For more on building the BFS reflex that shows up in nearly all of these, see BFS and DFS.
Wrap up
Clone Graph is the canonical example of “use a hash map as your traversal memory.” Once you internalize the create-then-store-then-recurse order, you can solve every deep-copy variant in the catalog with a one-paragraph mental model. The interview signal here is small but reliable: candidates who reach for the map immediately have seen the pattern, and candidates who try to clone-as-they-go reveal they have not.