Django QuerySets and ORM Optimization

Intermediate 11 min read

What you'll learn

✓Why QuerySets are lazy and what triggers SQL
✓How to spot and kill N+1 queries
✓select_related vs prefetch_related decisions
✓Annotating and aggregating without round-tripping
✓Production patterns for read-heavy Django apps

Prerequisites

•Basic Python
•Some exposure to Django

What and why

A Django QuerySet is a lazy description of a query. It does not hit the database until you iterate it, slice it, count it, or otherwise materialize it. That laziness is the source of both the ORM’s power and its worst performance bugs.

Optimizing the Django ORM is mostly about controlling when queries fire and how many of them go out. Get that right and a single endpoint that ran fifty queries can run two.

Mental model

Think of every QuerySet as a SQL builder you carry around. Each .filter(), .order_by(), or .annotate() appends to that builder. Hitting the database happens at iteration. Methods like count(), exists(), first(), and list(qs) are all triggers.

qs = User.objects.filter(active=True)        # no SQL
qs = qs.exclude(banned=True)                  # no SQL
qs = qs.order_by("-created_at")               # no SQL
qs = qs.select_related("profile")             # no SQL
                                            |
                     evaluation triggers ---+
                            |
                            v
               +------------+------------+
               |            |            |
            for u in qs   qs.count()  list(qs)
               |            |            |
               v            v            v
            SELECT ...   SELECT COUNT  SELECT ...

Cache: once evaluated, subsequent iterations reuse the result.

QuerySet building vs evaluation

Hands-on example

The classic N+1 query bug:

# urls.py -> views.py
def list_orders(request):
    orders = Order.objects.all()[:20]
    rows = [(o.id, o.user.email, o.product.name) for o in orders]
    return JsonResponse({"rows": rows})

This emits one query for the orders and then two more per order (one for the user, one for the product). Forty-one queries for twenty rows.

Fix with select_related for forward foreign keys:

orders = Order.objects.select_related("user", "product")[:20]

select_related performs a SQL JOIN and pulls related rows in the same query. One query, twenty rows.

For reverse foreign keys and many-to-many, you need prefetch_related. It runs a second query and stitches the results in Python.

authors = Author.objects.prefetch_related("books")
for a in authors:
    for b in a.books.all():
        ...

Two queries total, regardless of how many authors and books exist.

For aggregates, push the work into the database. Annotating beats Python loops every time.

from django.db.models import Count, Sum

users = User.objects.annotate(
    order_count=Count("orders"),
    spent=Sum("orders__total"),
).filter(order_count__gt=0)

A single SQL statement does what would otherwise be a join, a group-by, and a sum in Python.

When you only need a few columns, .only(...) or .values(...) keeps the row set narrow.

emails = User.objects.filter(active=True).values_list("email", flat=True)

That returns a flat list of strings, no model instantiation, no extra fields.

Common pitfalls

len(qs) evaluates the entire query and counts in Python. Use qs.count() for a SQL COUNT.

qs.count() == 0 is two queries when one would do. Use not qs.exists().

Chaining select_related after prefetch_related resets state in subtle ways. Apply select_related first.

Iterating a large QuerySet loads every row into memory. Use iterator(chunk_size=2000) to stream from a server-side cursor.

for order in Order.objects.iterator(chunk_size=2000):
    process(order)

Updating in a loop fires one UPDATE per row. Use qs.update(...) for bulk SQL or bulk_update for differing values.

Template rendering hides queries. A {% for user in users %} block that accesses user.profile.avatar invisibly does an N+1 if profile was not prefetched. Always check the SQL log when an unfamiliar template is slow.

Production tips

Install django-debug-toolbar in development. The SQL panel shows the query count and per-query duration. If a request runs more than ten queries, look twice.

In production, log slow queries with LOGGING plus django.db.backends at DEBUG, or enable Postgres’s pg_stat_statements and tail the worst offenders weekly.

Cache aggressively, but cache the right thing. Caching whole HTML fragments hides ORM bugs; caching the QuerySet result via cache.set("key", list(qs), 60) is explicit and easy to reason about.

Use database constraints, not Python validation, for invariants that must hold. A UniqueConstraint is cheaper and stricter than a clean() method.

Watch for ORM features that disable indexes. __icontains and __iexact typically prevent index use; consider trigram indexes or full-text search for case-insensitive lookups at scale.

For multi-tenant apps, push tenant filters into a custom Manager so developers cannot forget them. The pattern eliminates an entire class of bugs.

class TenantQuerySet(models.QuerySet):
    def for_tenant(self, tenant):
        return self.filter(tenant=tenant)

class Order(models.Model):
    objects = TenantQuerySet.as_manager()

Wrap-up

QuerySets are lazy SQL builders that fire on evaluation. Avoid N+1 with select_related for foreign keys and prefetch_related for reverse relations and M2M. Annotate in SQL instead of looping in Python, use .values() and .only() to narrow columns, and stream with iterator() for large scans. Watch query counts in dev, log them in prod, and most Django performance complaints disappear.