Find Median from Data Stream — Two-Heaps Pattern
Solve Find Median from Data Stream with two heaps. Learn the balance invariant, why it gives O(log n) inserts and O(1) median, and the common pitfalls.
What you'll learn
- ✓Why a single sorted structure is too slow for a stream
- ✓The two-heap setup and its balance invariant
- ✓How to handle even versus odd counts cleanly
- ✓Pitfalls with Python heapq and negation for max-heap
- ✓Related problems where this pattern reappears
Prerequisites
- •Familiarity with heaps and Big-O
This is the classic “design” interview question. Numbers arrive one by one, and at any moment you must return the median.
The Problem
Design a class with two methods:
addNum(int num): adds an integer to the stream.findMedian(): returns the median of all numbers seen so far.
If the count is odd, the median is the middle element. If even, it is the average of the two middle elements.
Intuition
A sorted list gives O(1) median lookup but O(n) insertion. An unsorted list gives O(1) insertion but O(n log n) median (or O(n) quickselect). Neither scales to millions of inserts.
We want O(log n) insert and O(1) median. The trick is two heaps. Split the numbers into two halves: lower is a max-heap holding the smaller half, upper is a min-heap holding the larger half. Maintain two invariants: every number in lower is <= every number in upper, and the sizes differ by at most 1 (with lower allowed to be the larger one). Then the median is either the top of lower (odd total) or the average of the two tops (even total).
The addNum trick: push to lower, then move the top of lower to upper. This guarantees the ordering invariant. Then rebalance if upper got too big.
Explanation with Example
Insert 5, 2, 9, 1 in sequence.
- Add 5.
lower=[-5], move 5 toupper.upper=[5].upperbigger, move back.lower=[-5]. Median: 5. - Add 2. Push -2 to
lower.lower=[-5,-2]. Move 5 toupper. Nowlower=[-2],upper=[5]. Median: (2+5)/2 = 3.5. - Add 9. Push -9 to
lower. Move 9 toupper.upper=[5,9].upperbigger, move 5 back.lower=[-5,-2],upper=[9]. Median: 5. - Add 1. Push -1 to
lower. Move 5 toupper.lower=[-2,-1],upper=[5,9]. Median: (2+5)/2 = 3.5.
lower (max-heap) | upper (min-heap)
|
after 5: [ 5 ] | [ ] med = 5
after 2: [ 2 ] | [ 5 ] med = 3.5
after 9: [ 5, 2 ] | [ 9 ] med = 5
after 1: [ 2, 1 ] | [ 5, 9 ] med = 3.5
^ ^
top of lower top of upper Code
Python’s heapq is a min-heap only. To simulate a max-heap, push negated values.
import heapq
class MedianFinder:
def __init__(self):
self.lower = [] # max-heap (negated)
self.upper = [] # min-heap
def addNum(self, num):
heapq.heappush(self.lower, -num)
heapq.heappush(self.upper, -heapq.heappop(self.lower))
if len(self.upper) > len(self.lower):
heapq.heappush(self.lower, -heapq.heappop(self.upper))
def findMedian(self):
if len(self.lower) > len(self.upper):
return -self.lower[0]
return (-self.lower[0] + self.upper[0]) / 2class MedianFinder {
private PriorityQueue<Integer> lower = new PriorityQueue<>(Collections.reverseOrder());
private PriorityQueue<Integer> upper = new PriorityQueue<>();
public void addNum(int num) {
lower.offer(num);
upper.offer(lower.poll());
if (upper.size() > lower.size()) lower.offer(upper.poll());
}
public double findMedian() {
if (lower.size() > upper.size()) return lower.peek();
return (lower.peek() + upper.peek()) / 2.0;
}
}class MedianFinder {
priority_queue<int> lower; // max-heap
priority_queue<int, vector<int>, greater<int>> upper; // min-heap
public:
void addNum(int num) {
lower.push(num);
upper.push(lower.top()); lower.pop();
if (upper.size() > lower.size()) {
lower.push(upper.top()); upper.pop();
}
}
double findMedian() {
if (lower.size() > upper.size()) return lower.top();
return (lower.top() + upper.top()) / 2.0;
}
};Time: O(log n) per insert, O(1) per query. Space: O(n).
Edge Cases / Pitfalls
- Forgetting the rebalance step. If you only push without moving,
lowerandupperget out of order. - Wrong sign on the Python max-heap. Always negate on push and on read.
- Returning an int when even-count requires a float. In Python
/is fine; in other languages, watch the integer division. - Off-by-one in the size check. Pick a convention (here,
lowercarries the extra) and stick to it.
Complexity Analysis
addNum: O(log n) for the heap operations.findMedian: O(1) — peek at the heap tops.- Space: O(n) for all numbers across the two heaps.
Follow-ups
Sliding window median: same two-heap idea, but you must remove an element that may not be on top of either heap. Lazy deletion via a counter of “to-be-removed” indices is the standard fix.
Percentiles: choose the size ratio of the heaps to put the boundary at the percentile you want. For online quantile estimation at massive scale, look at t-digest or HDR Histogram.
Interview talking points
- State the brute-force baselines first to motivate the design.
- Name the two invariants out loud. They are the proof that the algorithm is correct.
- Mention the Python max-heap workaround if you are coding in Python.
- Bring up the sliding-window variant unprompted — interviewers love when you anticipate the follow-up.
This is one of those problems where the invariant is the algorithm. Once you state “lower max-heap of smaller half, upper min-heap of larger half, sizes within one,” the code almost writes itself.
Related articles
- DSA Meeting Rooms II — Heap and Sweep Line
Solve Meeting Rooms II two ways: a min-heap of end times and a sweep-line over start and end events. Learn when each shines and how to pick in interviews.
- DSA LRU Cache — Hash Map + Doubly Linked List
Design LRU Cache with a hash map and a doubly linked list to get O(1) get and put, with a clean implementation and interview tips.
- DSA Find Minimum in Rotated Sorted Array — Binary Search
Solve Find Minimum in Rotated Sorted Array in logarithmic time with a modified binary search. Learn the pivot-finding invariant, edge cases, and how to extend the pattern to related problems.
- DSA Fizz Buzz — A Clean, Extensible Solution
Solve Fizz Buzz with clean code and explore the string-concatenation pattern that avoids nested if-else. Python, Java, C++, and complexity analysis included.