Skip to content
C Codeloom
DSA

Heap and Priority Queue: The Data Structure Behind Top-K

Learn how heaps power priority queues, why heapq runs push and pop in O(log n), and how to solve classic Top-K and merge problems in Python.

·6 min read · By Yash Kesharwani
Beginner 9 min read

What you'll learn

  • What a binary heap is and how it differs from a binary search tree
  • The heap property and how it enables O(log n) push and pop
  • How to use Python's heapq module for min-heaps and max-heaps
  • How to solve Kth largest and merge K sorted lists with heaps
  • When a priority queue is the right tool for a problem

Prerequisites

  • Arrays: [Arrays Introduction](/blog/arrays-introduction)
  • Big-O basics: [Big-O Notation Explained](/blog/big-o-notation-explained)
  • Binary trees: [Binary Trees Introduction](/blog/binary-trees-introduction)

A priority queue is an abstract data type that always lets you access the “most important” element next. The most efficient general-purpose implementation is the binary heap, a complete binary tree stored compactly in an array. If you have ever needed the K largest items, the next event to process in a simulation, or the cheapest edge in a graph algorithm, you have reached for a heap whether you realized it or not.

The heap property

A min-heap is a complete binary tree where every parent is less than or equal to its children. A max-heap flips the inequality. The root is always the minimum (or maximum) of the entire structure, which is why peeking at the top is O(1).

Because the tree is complete, you can store it in an array with index math instead of pointers:

parent(i) = (i - 1) // 2
left(i)   = 2 * i + 1
right(i)  = 2 * i + 2

Insertions append to the end and “sift up” until the heap property holds. Removals swap the root with the last element, pop it, and “sift down”. Both operations touch at most one path from root to leaf, which has length log n in a complete tree. That gives the headline guarantee: O(log n) push and pop, O(1) peek.

heapq in Python

Python ships a min-heap in the standard library as heapq. It does not provide a dedicated class; instead the functions operate on a plain list.

import heapq

h = []
heapq.heappush(h, 5)
heapq.heappush(h, 1)
heapq.heappush(h, 3)

print(h[0])              # 1, the minimum
print(heapq.heappop(h))  # 1
print(heapq.heappop(h))  # 3

A few practical notes:

  • heapq.heapify(lst) turns an existing list into a heap in O(n), which is faster than n pushes.
  • heapq.heappushpop(h, x) and heapq.heapreplace(h, x) are slightly faster than separate calls when you need both operations.
  • To build a max-heap, negate the values on the way in and out, or push tuples like (-priority, item).
  • For custom ordering, push tuples whose first element is the sort key. Break ties with a counter so equal priorities do not try to compare the payloads.
import heapq, itertools

counter = itertools.count()
pq = []

def push(priority, item):
    heapq.heappush(pq, (priority, next(counter), item))

def pop():
    return heapq.heappop(pq)[-1]

When to reach for a heap

Use a heap when you repeatedly need the minimum or maximum of a changing collection and you do not need to query arbitrary ranks. If you only need the smallest element once, sorting or a single linear scan is simpler. If you need to search by key, prefer a hash map; see Hashing and Hash Maps. For algorithms like Dijkstra’s shortest path or Prim’s MST, a heap is essentially mandatory and pairs naturally with the graph traversals from Graphs: BFS and DFS.

Problem 1: Kth largest element in an array

Given an integer array nums and integer k, return the kth largest element.

The naive approach sorts the array in O(n log n) and indexes -k. A heap gives O(n log k), which is much better when k is small relative to n.

The trick is to maintain a min-heap of size k. After scanning every element, the heap contains the k largest values, and its smallest entry, sitting at the root, is the kth largest overall.

import heapq

def find_kth_largest(nums, k):
    heap = []
    for x in nums:
        heapq.heappush(heap, x)
        if len(heap) > k:
            heapq.heappop(heap)
    return heap[0]

Each push and pop is O(log k) because the heap never grows past size k. Across n elements that is O(n log k) time and O(k) space.

Problem 2: Merge K sorted lists

Given k sorted linked lists, merge them into one sorted list.

A heap of size k is the canonical solution. Push the head of every list, then repeatedly pop the smallest and push that node’s successor.

import heapq

def merge_k_lists(lists):
    heap = []
    for i, node in enumerate(lists):
        if node:
            heapq.heappush(heap, (node.val, i, node))

    dummy = ListNode(0)
    tail = dummy
    while heap:
        val, i, node = heapq.heappop(heap)
        tail.next = node
        tail = node
        if node.next:
            heapq.heappush(heap, (node.next.val, i, node.next))
    return dummy.next

The index i is the tie-breaker so the heap never tries to compare ListNode objects. With N total nodes across k lists, you perform N push/pop pairs against a heap of size at most k, giving O(N log k) time and O(k) auxiliary space.

Heap vs. sorted structures

It is tempting to use a sorted array or a balanced BST when a heap will do. Compare them:

  • Sorted array: O(n) insert, O(1) min/max. Good only if updates are rare.
  • Balanced BST: O(log n) for everything, including arbitrary lookups, but with larger constants and more code.
  • Heap: O(log n) push/pop, O(1) peek, but no efficient search or deletion of arbitrary elements.

If your algorithm only consumes elements in priority order, the heap’s narrower interface is a feature, not a limitation. Many dynamic programming problems also reduce to picking the best partial answer next; see Dynamic Programming Introduction for how to recognize when greedy heap-driven choices are sound.

Common pitfalls

  • Forgetting Python is a min-heap. Negate values for a max-heap.
  • Comparing payloads. Always include a deterministic tie-breaker.
  • Decrease-key. heapq has no built-in decrease-key. The standard workaround is “lazy deletion”: push the new priority and skip stale entries when they pop.
  • Building incrementally vs. heapify. Calling heapify once on a list is O(n); doing n pushes is O(n log n). For batch construction, use heapify.

Wrap up

Heaps trade arbitrary lookup for a fast “next best” operation, and that trade is exactly what priority queues need. Memorize the O(log n) push and pop, the O(1) peek, the array layout, and the “min-heap of size k” pattern for Top-K problems. Once those are reflexive, problems like Kth largest, merge K lists, Dijkstra, and median maintenance stop feeling like separate tricks and start looking like the same shape.