Skip to content
C Codeloom
DSA

Find Median from Data Stream — Two-Heaps Pattern

Solve Find Median from Data Stream with two heaps. Learn the balance invariant, why it gives O(log n) inserts and O(1) median, and the common pitfalls.

·5 min read · By Codeloom
Advanced 10 min read

What you'll learn

  • Why a single sorted structure is too slow for a stream
  • The two-heap setup and its balance invariant
  • How to handle even versus odd counts cleanly
  • Pitfalls with Python heapq and negation for max-heap
  • Related problems where this pattern reappears

Prerequisites

  • Familiarity with heaps and Big-O

This is the classic “design” interview question. Numbers arrive one by one, and at any moment you must return the median.

The Problem

Design a class with two methods:

  • addNum(int num): adds an integer to the stream.
  • findMedian(): returns the median of all numbers seen so far.

If the count is odd, the median is the middle element. If even, it is the average of the two middle elements.

Intuition

A sorted list gives O(1) median lookup but O(n) insertion. An unsorted list gives O(1) insertion but O(n log n) median (or O(n) quickselect). Neither scales to millions of inserts.

We want O(log n) insert and O(1) median. The trick is two heaps. Split the numbers into two halves: lower is a max-heap holding the smaller half, upper is a min-heap holding the larger half. Maintain two invariants: every number in lower is <= every number in upper, and the sizes differ by at most 1 (with lower allowed to be the larger one). Then the median is either the top of lower (odd total) or the average of the two tops (even total).

The addNum trick: push to lower, then move the top of lower to upper. This guarantees the ordering invariant. Then rebalance if upper got too big.

Explanation with Example

Insert 5, 2, 9, 1 in sequence.

  1. Add 5. lower=[-5], move 5 to upper. upper=[5]. upper bigger, move back. lower=[-5]. Median: 5.
  2. Add 2. Push -2 to lower. lower=[-5,-2]. Move 5 to upper. Now lower=[-2], upper=[5]. Median: (2+5)/2 = 3.5.
  3. Add 9. Push -9 to lower. Move 9 to upper. upper=[5,9]. upper bigger, move 5 back. lower=[-5,-2], upper=[9]. Median: 5.
  4. Add 1. Push -1 to lower. Move 5 to upper. lower=[-2,-1], upper=[5,9]. Median: (2+5)/2 = 3.5.
           lower (max-heap)     |     upper (min-heap)
                              |
after 5:        [ 5 ]           |       [ ]            med = 5
after 2:        [ 2 ]           |       [ 5 ]          med = 3.5
after 9:        [ 5, 2 ]        |       [ 9 ]          med = 5
after 1:        [ 2, 1 ]        |       [ 5, 9 ]       med = 3.5
                ^                       ^
                top of lower            top of upper
Two heaps meet at the median boundary

Code

Python’s heapq is a min-heap only. To simulate a max-heap, push negated values.

import heapq

class MedianFinder:
    def __init__(self):
        self.lower = []   # max-heap (negated)
        self.upper = []   # min-heap

    def addNum(self, num):
        heapq.heappush(self.lower, -num)
        heapq.heappush(self.upper, -heapq.heappop(self.lower))
        if len(self.upper) > len(self.lower):
            heapq.heappush(self.lower, -heapq.heappop(self.upper))

    def findMedian(self):
        if len(self.lower) > len(self.upper):
            return -self.lower[0]
        return (-self.lower[0] + self.upper[0]) / 2
class MedianFinder {
    private PriorityQueue<Integer> lower = new PriorityQueue<>(Collections.reverseOrder());
    private PriorityQueue<Integer> upper = new PriorityQueue<>();

    public void addNum(int num) {
        lower.offer(num);
        upper.offer(lower.poll());
        if (upper.size() > lower.size()) lower.offer(upper.poll());
    }
    public double findMedian() {
        if (lower.size() > upper.size()) return lower.peek();
        return (lower.peek() + upper.peek()) / 2.0;
    }
}
class MedianFinder {
    priority_queue<int> lower; // max-heap
    priority_queue<int, vector<int>, greater<int>> upper; // min-heap
public:
    void addNum(int num) {
        lower.push(num);
        upper.push(lower.top()); lower.pop();
        if (upper.size() > lower.size()) {
            lower.push(upper.top()); upper.pop();
        }
    }
    double findMedian() {
        if (lower.size() > upper.size()) return lower.top();
        return (lower.top() + upper.top()) / 2.0;
    }
};

Time: O(log n) per insert, O(1) per query. Space: O(n).

Edge Cases / Pitfalls

  • Forgetting the rebalance step. If you only push without moving, lower and upper get out of order.
  • Wrong sign on the Python max-heap. Always negate on push and on read.
  • Returning an int when even-count requires a float. In Python / is fine; in other languages, watch the integer division.
  • Off-by-one in the size check. Pick a convention (here, lower carries the extra) and stick to it.

Complexity Analysis

  • addNum: O(log n) for the heap operations.
  • findMedian: O(1) — peek at the heap tops.
  • Space: O(n) for all numbers across the two heaps.

Follow-ups

Sliding window median: same two-heap idea, but you must remove an element that may not be on top of either heap. Lazy deletion via a counter of “to-be-removed” indices is the standard fix.

Percentiles: choose the size ratio of the heaps to put the boundary at the percentile you want. For online quantile estimation at massive scale, look at t-digest or HDR Histogram.

Interview talking points

  • State the brute-force baselines first to motivate the design.
  • Name the two invariants out loud. They are the proof that the algorithm is correct.
  • Mention the Python max-heap workaround if you are coding in Python.
  • Bring up the sliding-window variant unprompted — interviewers love when you anticipate the follow-up.

This is one of those problems where the invariant is the algorithm. Once you state “lower max-heap of smaller half, upper min-heap of larger half, sizes within one,” the code almost writes itself.