Heap and Priority Queue: The Data Structure Behind Top-K
Learn how heaps power priority queues, why heapq runs push and pop in O(log n), and how to solve classic Top-K and merge problems in Python.
What you'll learn
- ✓What a binary heap is and how it differs from a binary search tree
- ✓The heap property and how it enables O(log n) push and pop
- ✓How to use Python's heapq module for min-heaps and max-heaps
- ✓How to solve Kth largest and merge K sorted lists with heaps
- ✓When a priority queue is the right tool for a problem
Prerequisites
- •Arrays: [Arrays Introduction](/blog/arrays-introduction)
- •Big-O basics: [Big-O Notation Explained](/blog/big-o-notation-explained)
- •Binary trees: [Binary Trees Introduction](/blog/binary-trees-introduction)
A priority queue is an abstract data type that always lets you access the “most important” element next. The most efficient general-purpose implementation is the binary heap, a complete binary tree stored compactly in an array. If you have ever needed the K largest items, the next event to process in a simulation, or the cheapest edge in a graph algorithm, you have reached for a heap whether you realized it or not.
The heap property
A min-heap is a complete binary tree where every parent is less than or equal to its children. A max-heap flips the inequality. The root is always the minimum (or maximum) of the entire structure, which is why peeking at the top is O(1).
Because the tree is complete, you can store it in an array with index math instead of pointers:
parent(i) = (i - 1) // 2
left(i) = 2 * i + 1
right(i) = 2 * i + 2
Insertions append to the end and “sift up” until the heap property holds.
Removals swap the root with the last element, pop it, and “sift down”. Both
operations touch at most one path from root to leaf, which has length
log n in a complete tree. That gives the headline guarantee: O(log n)
push and pop, O(1) peek.
heapq in Python
Python ships a min-heap in the standard library as heapq. It does not
provide a dedicated class; instead the functions operate on a plain list.
import heapq
h = []
heapq.heappush(h, 5)
heapq.heappush(h, 1)
heapq.heappush(h, 3)
print(h[0]) # 1, the minimum
print(heapq.heappop(h)) # 1
print(heapq.heappop(h)) # 3
A few practical notes:
heapq.heapify(lst)turns an existing list into a heap in O(n), which is faster than n pushes.heapq.heappushpop(h, x)andheapq.heapreplace(h, x)are slightly faster than separate calls when you need both operations.- To build a max-heap, negate the values on the way in and out, or push
tuples like
(-priority, item). - For custom ordering, push tuples whose first element is the sort key. Break ties with a counter so equal priorities do not try to compare the payloads.
import heapq, itertools
counter = itertools.count()
pq = []
def push(priority, item):
heapq.heappush(pq, (priority, next(counter), item))
def pop():
return heapq.heappop(pq)[-1]
When to reach for a heap
Use a heap when you repeatedly need the minimum or maximum of a changing collection and you do not need to query arbitrary ranks. If you only need the smallest element once, sorting or a single linear scan is simpler. If you need to search by key, prefer a hash map; see Hashing and Hash Maps. For algorithms like Dijkstra’s shortest path or Prim’s MST, a heap is essentially mandatory and pairs naturally with the graph traversals from Graphs: BFS and DFS.
Problem 1: Kth largest element in an array
Given an integer array
numsand integerk, return the kth largest element.
The naive approach sorts the array in O(n log n) and indexes -k. A heap
gives O(n log k), which is much better when k is small relative to n.
The trick is to maintain a min-heap of size k. After scanning every element, the heap contains the k largest values, and its smallest entry, sitting at the root, is the kth largest overall.
import heapq
def find_kth_largest(nums, k):
heap = []
for x in nums:
heapq.heappush(heap, x)
if len(heap) > k:
heapq.heappop(heap)
return heap[0]
Each push and pop is O(log k) because the heap never grows past size k.
Across n elements that is O(n log k) time and O(k) space.
Problem 2: Merge K sorted lists
Given
ksorted linked lists, merge them into one sorted list.
A heap of size k is the canonical solution. Push the head of every list,
then repeatedly pop the smallest and push that node’s successor.
import heapq
def merge_k_lists(lists):
heap = []
for i, node in enumerate(lists):
if node:
heapq.heappush(heap, (node.val, i, node))
dummy = ListNode(0)
tail = dummy
while heap:
val, i, node = heapq.heappop(heap)
tail.next = node
tail = node
if node.next:
heapq.heappush(heap, (node.next.val, i, node.next))
return dummy.next
The index i is the tie-breaker so the heap never tries to compare
ListNode objects. With N total nodes across k lists, you perform
N push/pop pairs against a heap of size at most k, giving
O(N log k) time and O(k) auxiliary space.
Heap vs. sorted structures
It is tempting to use a sorted array or a balanced BST when a heap will do. Compare them:
- Sorted array: O(n) insert, O(1) min/max. Good only if updates are rare.
- Balanced BST: O(log n) for everything, including arbitrary lookups, but with larger constants and more code.
- Heap: O(log n) push/pop, O(1) peek, but no efficient search or deletion of arbitrary elements.
If your algorithm only consumes elements in priority order, the heap’s narrower interface is a feature, not a limitation. Many dynamic programming problems also reduce to picking the best partial answer next; see Dynamic Programming Introduction for how to recognize when greedy heap-driven choices are sound.
Common pitfalls
- Forgetting Python is a min-heap. Negate values for a max-heap.
- Comparing payloads. Always include a deterministic tie-breaker.
- Decrease-key.
heapqhas no built-in decrease-key. The standard workaround is “lazy deletion”: push the new priority and skip stale entries when they pop. - Building incrementally vs. heapify. Calling
heapifyonce on a list is O(n); doingnpushes is O(n log n). For batch construction, useheapify.
Wrap up
Heaps trade arbitrary lookup for a fast “next best” operation, and that trade is exactly what priority queues need. Memorize the O(log n) push and pop, the O(1) peek, the array layout, and the “min-heap of size k” pattern for Top-K problems. Once those are reflexive, problems like Kth largest, merge K lists, Dijkstra, and median maintenance stop feeling like separate tricks and start looking like the same shape.