Python Generators and the yield Keyword
Understand Python generators from the ground up — how yield turns a function into an iterator, lazy evaluation, generator expressions, and when generators beat lists.
What you'll learn
- What a generator is and how yield works
- How generators differ from regular functions
- Generator expressions vs list comprehensions
- Streaming large data lazily
- Composing generators into pipelines
Prerequisites
- Basic Python familiarity
Generators are one of the cleanest ideas in Python. They let you produce a sequence of values one at a time, on demand, without holding the whole sequence in memory. The mechanism is a single keyword — yield — but the implications run deep, especially when you’re working with large data or streaming sources.
The Problem Generators Solve
Imagine you want to read a 10 GB log file and count error lines. The naive approach loads everything into memory.
def load_lines(path):
    with open(path) as f:
        return f.readlines()  # returns a list of every line
On a typical machine this will exhaust RAM or push the process into swap. A generator gives you the same interface, line-by-line iteration, while holding little more than the current line in memory.
yield Turns a Function Into an Iterator
When Python sees yield inside a function, the function no longer runs to completion when called. Instead, calling it returns a generator object. Each time you advance the generator (with next() or a for loop), the function runs until the next yield, hands back a value, and pauses.
def counter():
    yield 1
    yield 2
    yield 3

gen = counter()
print(next(gen))  # 1
print(next(gen))  # 2
print(next(gen))  # 3
Once the function returns (or runs off the end), the next call to next() raises StopIteration, which is the signal for loops use to know they are done.
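The pause-and-resume lifecycle can be made visible with a small sketch. The countdown generator below is a hypothetical example, not part of the code above, and its print calls exist only to mark when the body actually runs.

```python
def countdown(n):
    # Hypothetical example generator; the prints only mark when the body runs.
    print("body started")
    while n > 0:
        yield n
        n -= 1
    print("body finished")

gen = countdown(2)   # nothing is printed yet: the body has not started
first = next(gen)    # prints "body started", then pauses at the first yield
second = next(gen)   # resumes after the yield, loops once, pauses again

try:
    next(gen)        # the body runs off the end and prints "body finished"
    exhausted = False
except StopIteration:
    exhausted = True  # the signal a for loop catches on your behalf

# A for loop performs the same next()/StopIteration steps implicitly.
collected = []
for value in countdown(3):
    collected.append(value)
```

Calling countdown(2) prints nothing, which shows that the call only creates the generator object. The body starts on the first next().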
A Practical Streaming Example
Here is a generator that yields lines from a file lazily.
def lines_of(path):
    with open(path) as f:
        for line in f:
            yield line.rstrip("\n")

errors = sum(1 for line in lines_of("server.log") if "ERROR" in line)
print(errors)
Memory usage stays roughly flat regardless of file size. This pattern, a generator function as producer and a generator expression as consumer, is everywhere in idiomatic Python.
Generator Expressions
A generator expression looks like a list comprehension but with parentheses instead of square brackets. It produces a generator object instead of a list.
squares = (x * x for x in range(1_000_000))
total = sum(squares)
If you wrote [x * x for x in range(1_000_000)] instead, Python would materialise a million-element list. The generator version computes one square at a time and lets sum consume them.
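One rough way to see the difference is to compare the size of the two objects. This is a sketch assuming CPython. The exact byte counts vary by version and platform, and sys.getsizeof measures only the container itself, not the integers a list points to.

```python
import sys

n = 1_000_000
as_list = [x * x for x in range(n)]  # every square exists in memory at once
as_gen = (x * x for x in range(n))   # nothing has been computed yet

# getsizeof reports the object itself: for the list, its array of pointers;
# for the generator, a small fixed-size object that does not grow with n.
list_bytes = sys.getsizeof(as_list)
gen_bytes = sys.getsizeof(as_gen)
print(f"list: {list_bytes:,} bytes, generator: {gen_bytes:,} bytes")

# Both produce the same values, so the totals agree.
same_total = sum(as_list) == sum(as_gen)
```

The list figure understates the real cost, because each element is a separate integer object on top of the pointer array.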
A nice shortcut: when a generator expression is the only argument to a function, you can drop the parentheses.
total = sum(x * x for x in range(1_000_000))
Composing Generators
Generators chain naturally. Each one transforms a stream of values without buffering.
def numbers():
    n = 0
    while True:
        yield n
        n += 1

def evens(stream):
    for n in stream:
        if n % 2 == 0:
            yield n

def squared(stream):
    for n in stream:
        yield n * n

pipeline = squared(evens(numbers()))
for _ in range(5):
    print(next(pipeline))
You have just built an infinite pipeline that costs almost nothing, because each value is computed only when the consumer asks for it. This is the same model as Unix pipes: small, composable transformers connected end to end.
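For taking a bounded slice of an infinite pipeline, the standard library's itertools.islice is a common alternative to calling next() in a loop. The sketch below repeats evens and squared so it runs on its own, and swaps in itertools.count() in place of the hand-written numbers() generator.

```python
from itertools import count, islice

def evens(stream):
    for n in stream:
        if n % 2 == 0:
            yield n

def squared(stream):
    for n in stream:
        yield n * n

# count() yields 0, 1, 2, ... forever, like the numbers() generator.
pipeline = squared(evens(count()))

# islice pulls exactly five items and then stops asking for more.
first_five = list(islice(pipeline, 5))
print(first_five)  # [0, 4, 16, 36, 64]

# The pipeline is still alive and continues where it left off.
next_value = next(pipeline)  # 100
```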
yield from
When one generator delegates to another, yield from is cleaner than a loop.
def flatten(nested):
    for item in nested:
        if isinstance(item, list):
            yield from flatten(item)
        else:
            yield item

print(list(flatten([1, [2, [3, 4]], 5])))  # [1, 2, 3, 4, 5]
yield from sub is roughly equivalent to for x in sub: yield x, but shorter and slightly faster.
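The "roughly" matters in at least one respect: yield from is an expression whose value is the delegated generator's return value, which a plain for loop discards. A minimal sketch, using the hypothetical names inner and outer:

```python
def inner():
    yield 1
    yield 2
    return "inner done"  # becomes the value of the `yield from` expression

def outer():
    result = yield from inner()  # re-yields 1 and 2, then captures the return value
    yield result

values = list(outer())
print(values)  # [1, 2, 'inner done']
```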
State Between Yields
Local variables in a generator persist across yield points. That makes generators a natural fit for stateful iteration.
def running_average(values):
    total = 0
    count = 0
    for v in values:
        total += v
        count += 1
        yield total / count  # total and count survive until the next resume

for avg in running_average([10, 20, 30, 40]):
    print(avg)  # 10.0, 15.0, 20.0, 25.0
Each call to next() resumes inside the loop, updates the locals, and pauses again at yield.
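Stepping the generator by hand makes the saved state visible. The sketch below uses inspect.getgeneratorstate from the standard library and peeks at the paused frame through gi_frame.f_locals. That attribute is a CPython implementation detail, used here only for illustration.

```python
import inspect

def running_average(values):
    total = 0
    count = 0
    for v in values:
        total += v
        count += 1
        yield total / count

gen = running_average([10, 20, 30])
state_before = inspect.getgeneratorstate(gen)  # "GEN_CREATED": body not started

first = next(gen)                              # 10.0
state_paused = inspect.getgeneratorstate(gen)  # "GEN_SUSPENDED": paused at yield
saved_total = gen.gi_frame.f_locals["total"]   # 10, held in the paused frame

second = next(gen)                             # 15.0, built on the saved total and count
```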
When to Use a Generator
Reach for a generator when:
- The data is large or unbounded.
- You only need to iterate once.
- You want to compose transformations cheaply.
- You are writing producer/consumer code where the consumer controls the pace.
Reach for a list when:
- You need random access by index.
- You need to iterate the same data multiple times.
- The dataset is small and you want a concrete snapshot.
Don’t Confuse Generators with Lists
A common bug: trying to iterate a generator twice.
gen = (x * x for x in range(5))
print(list(gen)) # [0, 1, 4, 9, 16]
print(list(gen)) # [] — generator is exhausted
If you need to iterate twice, convert to a list, or build the generator from a function so you can call it again.
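Both options can be sketched briefly. The squares helper is a hypothetical name introduced for illustration.

```python
def squares(n):
    # Hypothetical helper: every call returns a brand-new generator.
    return (x * x for x in range(n))

# Option 1: call the function again to get a fresh generator for each pass.
first_pass = list(squares(5))
second_pass = list(squares(5))  # not empty, because this is a new generator

# Option 2: materialise once when the data is small enough to keep around.
snapshot = list(squares(5))
total = sum(snapshot)
largest = max(snapshot)  # a second pass over the same list works
```

The first option keeps memory flat but redoes the computation on each pass. The second computes once but holds every value.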
Wrapping Up
Generators are how Python handles streams of data without losing its straightforward syntax. Master yield, generator expressions, and yield from, and you can write code that processes terabytes of data with the same shape as the toy examples in this article.