
Python Generators and the yield Keyword

Understand Python generators from the ground up — how yield turns a function into an iterator, lazy evaluation, generator expressions, and when generators beat lists.

By Codeloom · Beginner

What you'll learn

  • What a generator is and how yield works
  • How generators differ from regular functions
  • Generator expressions vs list comprehensions
  • Streaming large data lazily
  • Composing generators into pipelines

Prerequisites

  • Basic Python familiarity

Generators are one of the cleanest ideas in Python. They let you produce a sequence of values one at a time, on demand, without holding the whole sequence in memory. The mechanism is a single keyword — yield — but the implications run deep, especially when you’re working with large data or streaming sources.

The Problem Generators Solve

Imagine you want to read a 10 GB log file and count error lines. The naive approach loads everything into memory.

def load_lines(path):
    with open(path) as f:
        return f.readlines()  # returns a list of every line

On a machine with less free memory than the file is large, this will exhaust your RAM. A generator gives you the same interface, iterating line by line, while holding only about one line in memory at a time.

yield Turns a Function Into an Iterator

When Python sees yield inside a function, the function no longer runs to completion when called. Instead, calling it returns a generator object. Each time you advance the generator (with next() or a for loop), the function runs until the next yield, hands back a value, and pauses.
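The pause-and-resume behaviour is easiest to see if the generator records when each part of its body runs. Below is a minimal sketch. The `log` list and the `noisy_counter` name are illustrative only. The sketch shows that calling the function runs none of its body, and that each `next()` runs only as far as the next `yield`.

```python
log = []

def noisy_counter():
    # This append runs on the first next(), not when noisy_counter() is called.
    log.append("started")
    yield 1
    # This append runs only when the generator is resumed a second time.
    log.append("resumed")
    yield 2

gen = noisy_counter()  # creates a generator object; the body has not run yet
print(log)             # []
print(next(gen))       # runs up to the first yield, prints 1
print(log)             # ['started']
print(next(gen))       # resumes after the first yield, prints 2
print(log)             # ['started', 'resumed']
```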

def counter():
    yield 1
    yield 2
    yield 3

gen = counter()
print(next(gen))  # 1
print(next(gen))  # 2
print(next(gen))  # 3

Once the function returns (or runs off the end), the generator raises StopIteration, which is what for loops use to know they are done.
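A short sketch of that end-of-iteration behaviour, reusing the `counter()` generator from above. A fourth `next()` raises `StopIteration`. Passing a default value to `next()` returns that value instead of raising. The `while` loop is a rough, simplified picture of what a `for` loop does for you.

```python
def counter():
    yield 1
    yield 2
    yield 3

gen = counter()
values = [next(gen), next(gen), next(gen)]

# Nothing is left to produce, so another next() raises StopIteration.
try:
    next(gen)
except StopIteration:
    print("exhausted")

# next() accepts a default that is returned instead of raising.
print(next(gen, None))  # None

# Roughly what a for loop does internally: call next() until StopIteration.
it = counter()
while True:
    try:
        value = next(it)
    except StopIteration:
        break
    print(value)
```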

A Practical Streaming Example

Here is a generator that yields lines from a file lazily.

def lines_of(path):
    with open(path) as f:
        for line in f:
            yield line.rstrip("\n")

errors = sum(1 for line in lines_of("server.log") if "ERROR" in line)
print(errors)

Memory usage stays roughly flat regardless of file size, because only the current line is held at once. This pattern, a generator producer feeding a generator-expression consumer, is everywhere in idiomatic Python.
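To try the pattern without a real `server.log`, the following sketch writes a small temporary log and runs the same counting expression over it. The file name `sample.log` and its contents are made up for the demonstration.

```python
import os
import tempfile

def lines_of(path):
    # Yields one line at a time; the file is closed when the generator finishes.
    with open(path) as f:
        for line in f:
            yield line.rstrip("\n")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.log")  # hypothetical sample file
    with open(path, "w") as f:
        f.write("INFO start\nERROR disk full\nINFO retry\nERROR timeout\n")
    # The generator expression pulls lines lazily from lines_of().
    errors = sum(1 for line in lines_of(path) if "ERROR" in line)

print(errors)  # 2
```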

Generator Expressions

A generator expression looks like a list comprehension but with parentheses instead of square brackets. It produces a generator object instead of a list.

squares = (x * x for x in range(1_000_000))
total = sum(squares)

If you wrote [x * x for x in range(1_000_000)] instead, Python would materialise a million-element list. The generator version computes one square at a time and lets sum consume them.
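One rough way to see the difference is `sys.getsizeof`, which reports the size of the container object itself and not the integers it refers to. In the sketch below, the generator object stays small, while the list must store a million references. Exact byte counts depend on the Python version and platform, so the sketch does not state them.

```python
import sys

squares_gen = (x * x for x in range(1_000_000))
squares_list = [x * x for x in range(1_000_000)]

# The generator's size does not grow with the range; the list's does.
print(sys.getsizeof(squares_gen))
print(sys.getsizeof(squares_list))

# Both produce the same total; the generator is consumed by sum().
print(sum(squares_gen) == sum(squares_list))  # True
```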

A nice shortcut: when a generator expression is the only argument to a function, you can drop its own parentheses, because the call's parentheses are enough.

total = sum(x * x for x in range(1_000_000))

Composing Generators

Generators chain naturally. Each one transforms a stream of values without buffering.

def numbers():
    n = 0
    while True:
        yield n
        n += 1

def evens(stream):
    for n in stream:
        if n % 2 == 0:
            yield n

def squared(stream):
    for n in stream:
        yield n * n

pipeline = squared(evens(numbers()))
for _ in range(5):
    print(next(pipeline))

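You have just built an infinite pipeline that is cheap to run. No stage computes a value until next() asks for one, so only as much of the infinite stream as you consume is ever produced. This is the same model that powers Unix pipes: small, composable transformers connected end to end.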
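The standard library can express the same pipeline more compactly. The sketch below is an alternative to the hand-written functions, not a replacement for them. `itertools.count()` plays the role of `numbers()`, generator expressions replace `evens` and `squared`, and `itertools.islice` takes the first five results without a manual `next()` loop.

```python
from itertools import count, islice

# count() is an infinite counter: 0, 1, 2, ...
even_numbers = (n for n in count() if n % 2 == 0)
even_squares = (n * n for n in even_numbers)

# islice stops after five items, so the infinite stream is never exhausted.
first_five = list(islice(even_squares, 5))
print(first_five)  # [0, 4, 16, 36, 64]
```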

yield from

When one generator delegates to another, yield from is cleaner than a loop.

def flatten(nested):
    for item in nested:
        if isinstance(item, list):
            yield from flatten(item)
        else:
            yield item

print(list(flatten([1, [2, [3, 4]], 5])))  # [1, 2, 3, 4, 5]

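yield from sub is roughly equivalent to for x in sub: yield x, but it is shorter and typically a little faster. It also does things the plain loop does not do. It forwards send() and throw() calls to the subgenerator, and the yield from expression evaluates to the subgenerator's return value.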
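The return-value difference is easiest to see side by side. In this sketch, `inner`, `outer_with_yield_from` and `outer_with_loop` are illustrative names. The `yield from` version receives `inner()`'s return value. The manual loop re-yields the same items but has no direct access to that value.

```python
def inner():
    yield 1
    yield 2
    return "inner done"  # carried on StopIteration.value when inner finishes

def outer_with_yield_from():
    # yield from re-yields 1 and 2, then evaluates to inner()'s return value.
    result = yield from inner()
    yield result

def outer_with_loop():
    # The plain loop re-yields the items, but the return value is discarded.
    for x in inner():
        yield x

print(list(outer_with_yield_from()))  # [1, 2, 'inner done']
print(list(outer_with_loop()))        # [1, 2]
```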

State Between Yields

Local variables in a generator persist across yield points. That makes generators a natural fit for stateful iteration.

def running_average(values):
    total = 0
    count = 0
    for v in values:
        total += v
        count += 1
        yield total / count

for avg in running_average([10, 20, 30, 40]):
    print(avg)  # 10.0, 15.0, 20.0, 25.0 (one per line)

Each call to next() resumes inside the loop, updates the locals, and pauses again at yield.
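Another small stateful example is sketched below. The `deltas` function is a hypothetical helper, not from the article. It remembers the previous value in a local variable between yields and emits the difference between consecutive items. That local is the only state it carries, so it works on unbounded streams as well as lists.

```python
def deltas(values):
    # `previous` survives across yields, so each step can compare with the last value.
    previous = None
    for v in values:
        if previous is not None:
            yield v - previous
        previous = v

print(list(deltas([10, 13, 19, 20])))  # [3, 6, 1]
```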

When to Use a Generator

Reach for a generator when:

  • The data is large or unbounded.
  • You only need to iterate once.
  • You want to compose transformations cheaply.
  • You are writing producer/consumer code where the consumer controls the pace.

Reach for a list when:

  • You need random access by index.
  • You need to iterate the same data multiple times.
  • The dataset is small and you want a concrete snapshot.
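The random-access point is concrete. A list supports indexing and `len()`, but a generator supports neither and raises `TypeError` for both. In the sketch below, getting "the item at position 3" from a generator means consuming the items before it. `itertools.islice` is one way to do that.

```python
from itertools import islice

squares_list = [x * x for x in range(10)]
squares_gen = (x * x for x in range(10))

print(squares_list[3])    # 9: lists support random access
print(len(squares_list))  # 10

# Neither indexing nor len() works on a generator; both raise TypeError.
for operation in (lambda: squares_gen[3], lambda: len(squares_gen)):
    try:
        operation()
    except TypeError as exc:
        print("TypeError:", exc)

# Reaching position 3 consumes items 0..3 from the generator.
print(next(islice(squares_gen, 3, None)))  # 9
```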

Don’t Confuse Generators with Lists

A common bug: trying to iterate a generator twice.

gen = (x * x for x in range(5))
print(list(gen))  # [0, 1, 4, 9, 16]
print(list(gen))  # [] — generator is exhausted

If you need to iterate twice, convert to a list, or build the generator from a function so you can call it again.
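Here is a sketch of the "call it again" approach. The names `squares` and `Squares` are illustrative. Each call to a generator function returns a fresh generator. A related option, shown second, is a class whose `__iter__` method is itself a generator function, so every `for` loop or `list()` call over the object starts from the beginning.

```python
def squares(n):
    # Each call builds a brand-new generator, so the sequence can be replayed.
    for x in range(n):
        yield x * x

print(list(squares(5)))  # [0, 1, 4, 9, 16]
print(list(squares(5)))  # [0, 1, 4, 9, 16]

class Squares:
    # __iter__ is a generator function: each iteration gets a fresh generator.
    def __init__(self, n):
        self.n = n

    def __iter__(self):
        for x in range(self.n):
            yield x * x

sq = Squares(5)
print(list(sq))  # [0, 1, 4, 9, 16]
print(list(sq))  # [0, 1, 4, 9, 16]
```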

Wrapping Up

Generators are how Python handles streams of data without losing its straightforward syntax. Master yield, generator expressions, and yield from, and you can write code that processes terabytes of data with the same shape as the toy examples in this article.