Topological Sort: Two Approaches Compared

Intermediate 9 min read

What you'll learn

✓What a topological ordering means for a DAG
✓Kahn's algorithm using in-degree counts and a queue
✓The DFS postorder approach
✓How both algorithms detect cycles
✓Common applications: build systems, scheduling, deps

Prerequisites

•Graphs basics from /blog/graphs-introduction
•BFS and DFS from /blog/graphs-bfs-and-dfs
•Big O from /blog/big-o-notation-explained

A topological sort is a linear ordering of the nodes of a directed acyclic graph such that every edge u to v appears with u before v. Topological orderings exist if and only if the graph has no cycles, which makes the algorithm useful both for ordering tasks and for detecting cycles.

Where It Shows Up

Build systems like make, bazel, and webpack toposort module dependencies to decide compile order. Spreadsheet engines toposort cell references to know which cells to recompute when one changes. Schedulers in CI systems and workflow engines toposort job graphs. Course prerequisite chains in catalogs are toposorts at heart.

There can be many valid orderings for the same DAG. For example, if A points to C and B points to C, both A B C and B A C are valid. The algorithm returns one of them.

Approach 1: Kahn’s Algorithm

Kahn’s algorithm uses in-degree counts. The in-degree of a node is the number of edges pointing into it. We start by enqueueing every node with in-degree zero, because those nodes have no prerequisites. We then repeatedly pop a node, append it to the output, and decrement the in-degree of each neighbor. If a neighbor’s in-degree drops to zero, it joins the queue.

from collections import deque, defaultdict

def kahn(graph, num_nodes):
    in_degree = [0] * num_nodes
    for u in graph:
        for v in graph[u]:
            in_degree[v] += 1

    q = deque(i for i in range(num_nodes) if in_degree[i] == 0)
    order = []
    while q:
        u = q.popleft()
        order.append(u)
        for v in graph[u]:
            in_degree[v] -= 1
            if in_degree[v] == 0:
                q.append(v)

    if len(order) != num_nodes:
        raise ValueError("Cycle detected")
    return order

graph = defaultdict(list)
graph[5] = [2, 0]
graph[4] = [0, 1]
graph[2] = [3]
graph[3] = [1]
print(kahn(graph, 6))

If the queue empties before we have visited every node, some nodes still have positive in-degree, which means they sit on a cycle. Kahn’s algorithm therefore doubles as a cycle detector.

Approach 2: DFS Postorder

The DFS approach runs a depth-first search and pushes each node onto an output stack after all its descendants have been fully explored. Reversing the stack yields a topological order.

def dfs_topo(graph, num_nodes):
    WHITE, GRAY, BLACK = 0, 1, 2
    color = [WHITE] * num_nodes
    order = []

    def visit(u):
        if color[u] == GRAY:
            raise ValueError("Cycle detected")
        if color[u] == BLACK:
            return
        color[u] = GRAY
        for v in graph.get(u, []):
            visit(v)
        color[u] = BLACK
        order.append(u)

    for u in range(num_nodes):
        if color[u] == WHITE:
            visit(u)
    return order[::-1]

The three-color scheme detects back edges. A back edge points to a node currently being explored, which signals a cycle. White means unvisited, gray means in progress, and black means fully processed.

Complexity

Both algorithms run in O(V + E) time and O(V) extra space. Each edge is examined once, and each node is enqueued or visited once. There is no logarithmic factor because no sorting or priority queue is involved.

For graphs with millions of nodes the constant factors of Kahn’s algorithm tend to be a bit better due to its iterative nature, which avoids recursion stack overhead. The DFS version is cleaner to write recursively but may need an explicit stack for very deep graphs to avoid hitting Python’s recursion limit.

Comparing the Two

Kahn’s algorithm produces a natural lexicographic-like order when you replace the queue with a priority queue, which is useful when you need deterministic or “smallest first” output. Build systems sometimes prefer this for reproducible compile orders.

The DFS approach is convenient when you are already doing DFS for other reasons, like checking strongly connected components or computing dominator trees. Tarjan’s strongly connected components algorithm, for example, naturally produces a reverse topological order of the condensation.

If you need to know whether a graph has a cycle and do not care about ordering, both algorithms work, but the DFS approach with the three-color trick is the most direct.

Multiple Sources and Determinism

When the DAG has many roots, the order in which you start matters for which valid ordering you produce. Kahn’s algorithm starts from every in-degree-zero node and processes them in queue order. If you want a unique canonical order, use a heap instead of a queue and process the smallest available node first. This gives the lex-smallest topological order.

import heapq

def lex_topo(graph, num_nodes):
    in_degree = [0] * num_nodes
    for u in graph:
        for v in graph[u]:
            in_degree[v] += 1
    pq = [i for i in range(num_nodes) if in_degree[i] == 0]
    heapq.heapify(pq)
    order = []
    while pq:
        u = heapq.heappop(pq)
        order.append(u)
        for v in graph.get(u, []):
            in_degree[v] -= 1
            if in_degree[v] == 0:
                heapq.heappush(pq, v)
    if len(order) != num_nodes:
        raise ValueError("Cycle detected")
    return order

This version costs O((V + E) log V) due to heap operations, but produces a canonical order.

Practical Tips

When dependency graphs are loaded from configuration, sanity-check them by toposorting before processing. A cycle that goes undetected leads to infinite loops or stack overflows downstream. The early cycle signal is worth the linear cost.

If your graph evolves dynamically, incremental topological maintenance algorithms exist, but the easiest pattern is to retoposort after each batch of edits.

For more graph background, see /blog/graphs-introduction and /blog/graphs-bfs-and-dfs.

Wrap up

Topological sort orders the nodes of a DAG so every edge points forward. Kahn’s algorithm uses in-degree counts and a queue, the DFS approach uses postorder, and both run in O(V + E) and detect cycles for free. Pick the style that fits the rest of your code.