Generators and Iterators in Python
A practical guide to Python iterators and generators — the iterator protocol, yield, generator expressions, memory benefits, and how to model infinite sequences.
What you'll learn
- The iterator protocol: __iter__ and __next__
- How generator functions and yield work
- Generator expressions and how they differ from list comprehensions
- Why generators use almost no memory
- How to model infinite sequences cleanly
- When to reach for a generator and when not to
Every for loop in Python sits on top of the same small protocol. Understanding it unlocks generators — a tool for describing sequences lazily, computing each value only when it is asked for, and using almost no memory regardless of how long the sequence is. Generators are the right tool for streaming data, walking large files line by line, and modelling sequences that never end.
The iterator protocol
Python’s for loop does not magically know how to iterate over a list, a string, or a file. It uses two methods, by convention called the iterator protocol:
- __iter__(): return an iterator object
- __next__(): return the next value, or raise StopIteration when there are no more
A for x in something: loop is roughly equivalent to:
```python
it = iter(something)   # calls something.__iter__()
while True:
    try:
        x = next(it)   # calls it.__next__()
    except StopIteration:
        break
    # ... loop body ...
```
You can drive this manually with the built-ins iter and next:
```python
nums = [10, 20, 30]
it = iter(nums)
print(next(it))  # 10
print(next(it))  # 20
print(next(it))  # 30
print(next(it))  # raises StopIteration
```
The distinction between an iterable (something that can produce an iterator) and an iterator (something that holds the position and produces values via next) is real and worth keeping in your head. Lists are iterables: each call to iter() on a list gives you a fresh, independent iterator over it.
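A short sketch of that distinction: two iter() calls on the same list give independent iterators, while calling iter() on an iterator simply hands back the same object.

```python
nums = [10, 20, 30]

a = iter(nums)           # one iterator over the list...
b = iter(nums)           # ...and a second, independent one
print(next(a), next(a))  # 10 20
print(next(b))           # 10  (b tracks its own position)

# An iterator is itself iterable: iter() on it returns the same object
print(iter(a) is a)      # True
print(list(a))           # [30]  (only what a has not produced yet)
```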
Why this matters
Because Python’s for loop is built on this protocol, anything that implements it works in a for loop. You don’t need to build a list first. You don’t need to know the length ahead of time. You don’t even need the sequence to be finite. The loop just keeps calling next until StopIteration.
That gives you four important properties:
- Lazy — values are computed only when asked for
- One-pass by default — iterators are consumed as you go
- Memory-light — only the current value is held, not the whole sequence
- Composable — iterators can wrap other iterators
Generators are the easy way to write things that have all four properties.
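The built-ins map and filter already return iterators, so a minimal sketch of three of those properties needs no generator at all: nothing is computed until next() asks, one iterator wraps another, and a second pass finds nothing left.

```python
nums = [1, 2, 3, 4, 5, 6]

evens = filter(lambda n: n % 2 == 0, nums)  # lazy: nothing filtered yet
doubled = map(lambda n: n * 2, evens)       # composable: wraps evens

print(next(doubled))   # 4   (pulls just enough: 1 is skipped, 2 is doubled)
print(list(doubled))   # [8, 12]
print(list(doubled))   # []  (one-pass: already consumed)
```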
Generator functions and yield
A generator function looks like a regular function, except it uses yield instead of (or in addition to) return. Calling it does not run the body — it returns a generator object you can iterate over.
```python
def count_up_to(n):
    i = 1
    while i <= n:
        yield i
        i += 1

for value in count_up_to(5):
    print(value)
# 1
# 2
# 3
# 4
# 5
```
What happens: when you call count_up_to(5), Python sets up the generator but does not run the function. The first call to next() runs the body until it hits the first yield, which produces a value and pauses the function — with its local variables and position preserved. The next call to next() resumes from exactly where it paused.
When the function finishes (falls off the end or executes a return statement), the generator raises StopIteration, and the for loop stops.
This pause-and-resume model is the whole story of generators. Once you see it, the rest is detail.
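To watch the pause-and-resume model directly, this sketch has a hypothetical traced() generator record an event each time its body runs. Calling traced() records nothing; each next() runs the body only up to the following yield.

```python
events = []

def traced():
    events.append("body started")
    yield "first"
    events.append("resumed after first yield")
    yield "second"
    events.append("body finished")

gen = traced()
print(events)       # []  (calling traced() ran none of the body)

print(next(gen))    # first
print(events)       # ['body started']

print(next(gen))    # second
print(events)       # ['body started', 'resumed after first yield']

try:
    next(gen)       # runs the rest of the body, then falls off the end
except StopIteration:
    events.append("StopIteration raised")
print(events[-2:])  # ['body finished', 'StopIteration raised']
```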
Generator expressions
A generator expression looks like a list comprehension written with parentheses instead of square brackets:
```python
squares = (n * n for n in range(1, 6))
print(squares)        # <generator object <genexpr> at 0x...>
print(list(squares))  # [1, 4, 9, 16, 25]
```
It is to list comprehensions what yield is to a function that appends to a list: same shape, lazy execution, one item at a time. See List Comprehensions for the connection.
Generator expressions are especially compact inside another function call:
```python
total = sum(n * n for n in range(1, 1_000_001))
```
When a generator expression is the only argument to a function, the call’s own parentheses are enough and you don’t need a second pair. No million-element list is built: sum consumes the squares one at a time.
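A small sketch of where that rule stops applying: as soon as the call has another argument, the generator expression needs its own parentheses. The unparenthesized form is rejected at compile time, so the example checks it through compile() rather than writing it inline.

```python
# Sole argument: the call's parentheses do double duty
print(sum(n * n for n in range(4)))                      # 14

# Any other argument means the generator expression needs its own parentheses
print(sum((n * n for n in range(4)), 100))               # 114
print(sorted((n % 3 for n in range(6)), reverse=True))   # [2, 2, 1, 1, 0, 0]

# Leaving them off is a SyntaxError; compile() lets us show that safely
try:
    compile("sum(n * n for n in range(4), 100)", "<demo>", "eval")
    unparenthesized_ok = True
except SyntaxError:
    unparenthesized_ok = False
print(unparenthesized_ok)                                # False
```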
Memory benefits
The reason to reach for a generator is usually memory. Compare:
```python
# Builds a 10-million-element list in memory
total = sum([n * n for n in range(10_000_000)])

# Uses constant memory
total = sum(n * n for n in range(10_000_000))
```
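Both give the same answer. The first builds the whole list before sum starts, which for ten million integers costs several hundred megabytes (each int object plus a pointer to it in the list); the second holds only one square at a time, so its memory use stays small and constant. On large data this is the difference between a program that runs and one that swaps.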
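You can check the difference yourself with the standard library’s tracemalloc module. This sketch uses a smaller N than above to keep it quick, and peak_bytes is a hypothetical helper name. Exact byte counts depend on the Python version and platform, so none are shown; the list version’s peak should be larger by orders of magnitude.

```python
import tracemalloc

def peak_bytes(func):
    """Run func() under tracemalloc and return (result, peak traced bytes)."""
    tracemalloc.start()
    try:
        result = func()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak

N = 1_000_000

list_total, list_peak = peak_bytes(lambda: sum([n * n for n in range(N)]))
gen_total, gen_peak = peak_bytes(lambda: sum(n * n for n in range(N)))

print(list_total == gen_total)            # True: same answer either way
print(f"list:      {list_peak:,} bytes at peak")
print(f"generator: {gen_peak:,} bytes at peak")
```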
The same logic applies to file processing. Reading a multi-gigabyte log file as a list of lines is a non-starter. But a file object is already an iterator over lines:
```python
def long_lines(path, threshold=100):
    with open(path) as f:
        for line in f:
            if len(line) > threshold:
                yield line.rstrip("\n")

for line in long_lines("server.log"):
    print(line)
```
The whole pipeline — read, filter, print — touches one line at a time. Memory usage is independent of the file size. See Python File I/O for more on file iteration.
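To see that a file object really is its own iterator, this sketch writes a small temporary file (standing in for server.log; its contents are made up) and pulls one line by hand before letting a loop take over from exactly that point.

```python
import os
import tempfile

# Create a throwaway file with one short, one long, and one short line
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
    tmp.write("short\n" + "x" * 120 + "\nalso short\n")
    path = tmp.name

try:
    with open(path) as f:
        is_own_iter = iter(f) is f          # the file is its own iterator
        first = next(f)                     # pull one line manually
        rest = [len(line) for line in f]    # the loop resumes after 'short'
finally:
    os.remove(path)

print(is_own_iter)   # True
print(repr(first))   # 'short\n'
print(rest)          # [121, 11]  (lengths include the trailing newline)
```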
Infinite sequences
A generator does not have to end. As long as nothing forces it to stop, it can yield forever:
```python
def naturals():
    n = 1
    while True:
        yield n
        n += 1

it = naturals()
print(next(it), next(it), next(it))  # 1 2 3
```
That looks alarming but is safe, as long as you only ask for as many values as you need: list(naturals()) would never finish. Combined with itertools.islice, infinite generators are a clean way to take “the first N” of something:
```python
from itertools import islice

first_ten = list(islice(naturals(), 10))
print(first_ten)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```
A more interesting infinite generator — Fibonacci numbers:
```python
def fibonacci():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

print(list(islice(fibonacci(), 10)))
# [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```
The generator carries its own state in local variables. You did not need a class or an external counter — yield and the implicit pause keep everything tidy.
Try it yourself. Write a generator triangles() that yields triangular numbers: 1, 3, 6, 10, 15, 21, … (the nth triangular number is n(n+1)/2). Use itertools.islice to grab the first 15 and confirm the 15th is 120.
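One possible solution, if you want to check your work. Rather than evaluating n(n+1)/2 each time, this sketch keeps a running total, since each triangular number is the previous one plus n.

```python
from itertools import islice

def triangles():
    """Yield triangular numbers 1, 3, 6, 10, ... forever."""
    total = 0
    n = 0
    while True:
        n += 1
        total += n       # the nth triangular number is 1 + 2 + ... + n
        yield total

first_15 = list(islice(triangles(), 15))
print(first_15[:6])   # [1, 3, 6, 10, 15, 21]
print(first_15[-1])   # 120
```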
Generators consume themselves
An iterator is one-pass. Once you’ve walked through a generator, it is exhausted:
```python
gen = (n * n for n in range(5))
print(list(gen))  # [0, 1, 4, 9, 16]
print(list(gen))  # []  (already consumed)
```
This is by design — generators don’t hold the values; they compute them as they go. If you need to iterate twice, either call the function again (it returns a fresh generator) or build a list once:
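```python
def squares_up_to(n):
    for i in range(n):
        yield i * i

# Option 1: build a list once and reuse it
g = squares_up_to(5)
once = list(g)
print(once, once)  # [0, 1, 4, 9, 16] [0, 1, 4, 9, 16]

# Option 2: call the function again for a fresh generator
print(list(squares_up_to(5)))  # [0, 1, 4, 9, 16]
```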
This is the chief difference from a list, and it trips up nearly everyone the first time. If you find yourself surprised that “my generator went empty,” ask whether something else already consumed it.
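Two common culprits, in a short sketch: a membership test with in, which consumes values until it finds a match, and passing the same iterator to something that reads from it more than once.

```python
gen = (n * n for n in range(6))
print(4 in gen)        # True, but finding 4 consumed 0, 1 and 4
print(list(gen))       # [9, 16, 25]

# The same iterator passed twice: zip pulls alternately from one stream
it = iter(range(6))
print(list(zip(it, it)))   # [(0, 1), (2, 3), (4, 5)]
```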
Pipelines of generators
Generators compose well: each stage takes an iterable in and yields values out, and the next stage consumes those values. Memory stays constant no matter how much data flows through the pipeline:
```python
def read_lines(path):
    with open(path) as f:
        for line in f:
            yield line.rstrip("\n")

def non_blank(lines):
    for line in lines:
        if line.strip():
            yield line

def parse_int(lines):
    for line in lines:
        try:
            yield int(line)
        except ValueError:
            continue

# Each stage is lazy. Nothing happens until sum() pulls.
total = sum(parse_int(non_blank(read_lines("numbers.txt"))))
print(total)
```
Three generators stacked into a pipeline: the file is read line by line, blanks are dropped, valid integers are parsed, and sum pulls each value through all three stages. Memory usage is independent of the file size, and each line is read and processed only once.
This is the Unix-pipe model applied to Python. Once you start writing pipelines this way, you’ll see them everywhere.
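Because each stage only asks for an iterable of lines, the stages can be tried without a real file. This sketch redefines non_blank and parse_int and feeds them an io.StringIO, which iterates over its text line by line just like a file object. The sample text is invented for illustration.

```python
import io

def non_blank(lines):
    for line in lines:
        if line.strip():
            yield line

def parse_int(lines):
    for line in lines:
        try:
            yield int(line)
        except ValueError:
            continue

# In-memory stand-in for numbers.txt (made-up sample data)
fake_file = io.StringIO("10\n\n20\nnot a number\n  \n30\n")
lines = (line.rstrip("\n") for line in fake_file)

total = sum(parse_int(non_blank(lines)))
print(total)   # 60
```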
Building your own iterator class
For most use cases yield is the right tool. But it’s worth seeing how the protocol looks from the inside — what yield is hiding from you.
```python
class CountUp:
    def __init__(self, end):
        self.end = end
        self.current = 1

    def __iter__(self):
        return self

    def __next__(self):
        if self.current > self.end:
            raise StopIteration
        value = self.current
        self.current += 1
        return value

for n in CountUp(3):
    print(n)
# 1
# 2
# 3
```
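Same behaviour as the generator version of count_up_to. The class version is more verbose and keeps its state explicitly, but it can be customised in ways a plain generator can’t, such as adding extra methods, or supporting multiple passes by returning a fresh iterator from __iter__ instead of self. (CountUp returns self, so as written it is one-pass, just like a generator.) For most cases the generator is shorter and clearer; write the class when you genuinely need the extra control.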
See Classes and Objects for the underlying class syntax.
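Here is a minimal sketch of the multiple-pass variant, using a hypothetical CountUpRange class. Its __iter__ is itself a generator function, so every for loop (or list() call) gets a brand-new iterator instead of sharing one, and the class needs no __next__ at all.

```python
class CountUpRange:
    """An iterable (not an iterator) that counts from 1 to end."""

    def __init__(self, end):
        self.end = end

    def __iter__(self):
        # A generator function: each call returns a fresh, independent iterator
        i = 1
        while i <= self.end:
            yield i
            i += 1

r = CountUpRange(3)
print(list(r), list(r))   # [1, 2, 3] [1, 2, 3]
```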
Try it yourself. Write a generator chunks(iterable, size) that yields successive lists of length size taken from iterable. The last chunk may be shorter. Confirm list(chunks(range(10), 3)) is [[0,1,2],[3,4,5],[6,7,8],[9]].
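One way to solve it, as a sketch to compare against: the key is calling iter() once, so that every islice call continues from where the previous one stopped.

```python
from itertools import islice

def chunks(iterable, size):
    """Yield successive lists of up to `size` items from iterable."""
    # Checked on the first next(), not at call time, because the body is lazy
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(iterable)              # one shared iterator for all chunks
    while True:
        chunk = list(islice(it, size))
        if not chunk:                # the source is exhausted
            return
        yield chunk

print(list(chunks(range(10), 3)))   # [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
```

Python 3.12 added itertools.batched, which does the same job but yields tuples rather than lists.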
When NOT to use a generator
Generators are not always the right answer. Use a list when:
- You need to iterate over the values more than once
- You need random access by index
- You need the length up front
- The data is small enough that lazy evaluation has no payoff and the extra indirection just gets in the way
The decision is usually: “is the data large or potentially infinite, and am I going to consume it in a single pass?” If yes, generator. If no, list.
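A quick sketch of the second and third points: a list supports len() and indexing, while a generator raises TypeError for both, because it has not computed its values yet.

```python
squares_list = [n * n for n in range(5)]
squares_gen = (n * n for n in range(5))

print(len(squares_list), squares_list[2])   # 5 4

errors = []
for name, op in [("len", len), ("index", lambda seq: seq[2])]:
    try:
        op(squares_gen)
    except TypeError:
        errors.append(name)
print(errors)   # ['len', 'index']
```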
A small bonus: itertools
The standard library’s itertools module is a treasure trove of iterator utilities: chain, islice, takewhile, dropwhile, groupby, count, cycle, repeat, product, permutations, combinations. Whenever you find yourself reaching for a manual loop over an iterator, check itertools first.
```python
from itertools import count, takewhile

# Take squares from an infinite stream while they stay below 50
small = list(takewhile(lambda n: n < 50, (n * n for n in count(1))))
print(small)  # [1, 4, 9, 16, 25, 36, 49]
```
We’ll do a dedicated post on itertools later. For now, know it exists and skim its page in the docs.
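As a small taste, here are two more functions from that list, chain and cycle, with islice keeping the infinite one finite:

```python
from itertools import chain, cycle, islice

# chain: walk several iterables back to back as a single stream
print(list(chain([1, 2], (3, 4), range(5, 7))))   # [1, 2, 3, 4, 5, 6]

# cycle: repeat an iterable forever; islice takes just the first five items
print(list(islice(cycle("AB"), 5)))               # ['A', 'B', 'A', 'B', 'A']
```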
Recap
You now know:
- A for loop calls iter(...) once and then next(...) repeatedly until StopIteration
- A generator function uses yield to pause and resume, producing values one at a time
- Generator expressions are list comprehensions with () instead of []
- Generators use constant memory and let you describe infinite sequences cleanly
- Iterators are one-pass: once consumed, they’re empty
- Pipelines of generators are a powerful way to stream and transform data
- For most use cases, yield beats a hand-written iterator class
Next steps
Generators introduce the idea of a function that wraps and transforms behaviour. The next step is decorators — functions that wrap other functions to add logging, timing, caching, or access control. They look exotic at first but rest on the same foundation: functions are values that can be passed, stored, and returned.
→ Next: Python Decorators — The Beginner-Friendly Guide
Questions or feedback? Email codeloomdevv@gmail.com.