GraphQL N+1 and DataLoader

Intermediate 10 min read

What you'll learn

✓Why GraphQL resolvers naturally cause N+1
✓How DataLoader batches calls within a tick
✓How per-request caching avoids duplicate work
✓How to wire DataLoader into a typical schema
✓Pitfalls with auth and cache scope

Prerequisites

•Basic GraphQL schemas and resolvers

The first time a GraphQL server hits production, the database lights up. A query for 50 posts triggers 50 user lookups plus 50 comment counts. That is N+1, and it is built into how naive resolvers work. DataLoader is the standard fix. This post is about why N+1 happens and how to make it go away without rewriting your schema.

Why N+1 happens

In GraphQL, each field is resolved independently. A list field returns N items, then each item’s nested field is resolved once per item. Each of those resolvers is its own function and does its own database call. No magic batching happens.

{
  posts(limit: 50) {
    title
    author { name }
  }
}

Naively: one query for posts, then 50 queries for authors. That is 51 queries for 50 posts.
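The count is easy to reproduce without a real server. The sketch below is a stand-in for what a GraphQL executor does with that query; the in-memory `makeFakeDb`, the resolver map, and `runPostsQuery` are all hypothetical names invented for illustration.

```javascript
// Hypothetical in-memory "database" that counts every query it receives.
function makeFakeDb(postCount) {
  const db = { queries: 0 };
  db.findPosts = async (limit) => {
    db.queries += 1;
    return Array.from({ length: Math.min(limit, postCount) }, (_, i) => ({
      id: i + 1,
      title: `Post ${i + 1}`,
      authorId: i + 1,
    }));
  };
  db.findUserById = async (id) => {
    db.queries += 1;
    return { id, name: `User ${id}` };
  };
  return db;
}

// Naive resolvers: the author resolver runs once per post and issues its own query.
const naiveResolvers = {
  Query: { posts: (_, args, ctx) => ctx.db.findPosts(args.limit) },
  Post: { author: (post, _, ctx) => ctx.db.findUserById(post.authorId) },
};

// Stand-in for the executor: resolve the list, then the nested field per item.
async function runPostsQuery(ctx, limit) {
  const posts = await naiveResolvers.Query.posts(null, { limit }, ctx);
  return Promise.all(
    posts.map(async (post) => ({
      title: post.title,
      author: await naiveResolvers.Post.author(post, {}, ctx),
    }))
  );
}
```

Running `runPostsQuery({ db: makeFakeDb(50) }, 50)` leaves the query counter at 51: one for the list and one per author.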

Mental model

Without DataLoader (N+1):
posts query --> [p1..p50]
   |
   +-- author(p1) -> SELECT user WHERE id=1
   +-- author(p2) -> SELECT user WHERE id=2
   ...
   +-- author(p50) -> SELECT user WHERE id=50

With DataLoader (batched):
posts query --> [p1..p50]
   |
   +-- author(p1..p50) collected in one tick
                   -> SELECT user WHERE id IN (1..50)

N+1 vs batched loads

DataLoader collects all the keys you ask for during one event-loop tick, then calls your batch function once with all of them.
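To make the "collect within a tick" idea concrete, here is a deliberately tiny batcher. It is not the library's implementation: `makeTinyLoader` is a hypothetical name, it has no cache and no per-key error handling, and it assumes Node's `process.nextTick` as the scheduling hook.

```javascript
// Minimal batching sketch (not the real library): no caching, no per-key errors.
function makeTinyLoader(batchFn) {
  let queue = []; // pending { key, resolve, reject } entries for the current tick

  function dispatch() {
    const batch = queue;
    queue = [];
    // Call the batch function once with every key collected so far.
    batchFn(batch.map((entry) => entry.key)).then(
      (values) => batch.forEach((entry, i) => entry.resolve(values[i])),
      (err) => batch.forEach((entry) => entry.reject(err))
    );
  }

  return {
    load(key) {
      return new Promise((resolve, reject) => {
        // The first key of a tick schedules one dispatch; later keys just join the queue.
        if (queue.length === 0) process.nextTick(dispatch);
        queue.push({ key, resolve, reject });
      });
    },
  };
}
```

With this sketch, three `load()` calls made in the same tick reach the batch function as one array of three keys, while awaiting each call before making the next produces three batches of one key each.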

Hands-on: a typical DataLoader

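const DataLoader = require('dataloader');

function makeUserLoader(db) {
  // The batch function receives every id requested during the current tick.
  return new DataLoader(async (ids) => {
    // One query for the whole batch instead of one per id.
    const rows = await db.users.findMany({ where: { id: { in: ids } } });
    // Rows come back in arbitrary order, so index them by id...
    const byId = new Map(rows.map(r => [r.id, r]));
    // ...and return one value per requested id, in the same order (null if missing).
    return ids.map(id => byId.get(id) ?? null);
  });
}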

The contract is strict: the batch function takes keys, returns an array of the same length, in the same order. Missing keys become null or an Error. Skip the ordering and you have silent data corruption.
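Because every batch function needs the same reordering step, it can help to pull it into a small helper. The following is one possible shape, not part of DataLoader itself; `alignToKeys` and its `onMissing` parameter are hypothetical names.

```javascript
// Hypothetical helper: reorder database rows to match the keys the loader passed in.
// `keyOf` extracts the key from a row; `onMissing` decides what a missing key becomes.
function alignToKeys(keys, rows, keyOf, onMissing = () => null) {
  const byKey = new Map(rows.map((row) => [keyOf(row), row]));
  return keys.map((key) => (byKey.has(key) ? byKey.get(key) : onMissing(key)));
}
```

A batch function can then end with `return alignToKeys(ids, rows, (r) => r.id)`, or pass `(id) => new Error('User ' + id + ' not found')` as the last argument when a missing row should reject that one `load()` call instead of resolving to null.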

Wiring it into a schema

Loaders belong on the per-request context. Create them fresh for each request so the cache does not leak data between users.

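const { ApolloServer } = require('@apollo/server');
const { startStandaloneServer } = require('@apollo/server/standalone');

// typeDefs and db are assumed to be defined elsewhere in the application.
const server = new ApolloServer({
  typeDefs,
  resolvers: {
    Post: {
      author: (post, _, ctx) => ctx.loaders.user.load(post.authorId),
    },
  },
});

// CommonJS has no top-level await, so start the server inside an async function.
async function main() {
  await startStandaloneServer(server, {
    // Runs once per request: every request gets fresh loaders and a fresh cache.
    context: async () => ({
      loaders: { user: makeUserLoader(db) },
    }),
  });
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});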

Now the 50-post query fires one batched user lookup. From 51 queries to 2.

Beyond simple by-id loaders

DataLoader works for anything keyable: counts, joins, even per-key paginated lists.

const commentCountLoader = new DataLoader(async (postIds) => {
  const rows = await db.$queryRaw`
    SELECT post_id, COUNT(*)::int AS n
    FROM comments
    WHERE post_id = ANY(${postIds})
    GROUP BY post_id`;
  const byId = new Map(rows.map(r => [r.post_id, r.n]));
  return postIds.map(id => byId.get(id) ?? 0);
});

For loaders that take composite keys (e.g., “comments for post X with status Y”), key by a stable string and parse it inside the batch function.
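A sketch of that approach follows. The `"<postId>:<status>"` key format, the row shape `{ post_id, status, body }`, and the `fetchComments` data-access function are all assumptions made for the example.

```javascript
// Assumed key format: "<postId>:<status>", e.g. "42:approved".
const commentKey = (postId, status) => `${postId}:${status}`;

function parseCommentKey(key) {
  const [postId, status] = key.split(':');
  return { postId: Number(postId), status };
}

// Batch function sketch. `fetchComments` is a hypothetical function that takes a
// list of post ids and returns rows shaped like { post_id, status, body }.
async function batchCommentsByPostAndStatus(keys, fetchComments) {
  // Deduplicate post ids so the underlying query stays small.
  const postIds = [...new Set(keys.map((k) => parseCommentKey(k).postId))];
  const rows = await fetchComments(postIds);

  // Group rows under the same string key the resolver used.
  const grouped = new Map(keys.map((k) => [k, []]));
  for (const row of rows) {
    const bucket = grouped.get(commentKey(row.post_id, row.status));
    if (bucket) bucket.push(row); // rows for statuses nobody asked for are dropped
  }
  // One array per key, in key order; keys with no rows get an empty array.
  return keys.map((k) => grouped.get(k));
}
```

A loader would wrap it as `new DataLoader((keys) => batchCommentsByPostAndStatus(keys, fetchComments))`, and a resolver would call `load(commentKey(post.id, 'approved'))`. If you would rather pass object keys, DataLoader's `cacheKeyFn` option can derive the string for cache lookups instead.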

The cache is per-request

DataLoader caches load(key) results within its lifetime. Because the loader is created per request, the cache is per-request too. That is good: it avoids stale or cross-tenant data. It also means you do not get cache hits across requests; for that, put Redis or an in-process cache underneath the loader, inside the batch function.
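One way to layer a longer-lived cache under the loader is to consult it inside the batch function and query only for the misses. The sketch below assumes a synchronous Map-like store (Redis would need async get and set calls) and a hypothetical `fetchUsersByIds` function; it has no TTL or invalidation, which a real cache would need.

```javascript
// Sketch of a batch function that checks a shared cache before hitting the database.
// `sharedCache` is any Map-like store; `fetchUsersByIds` is a hypothetical function
// that returns rows with an `id` field.
function makeCachedUserBatchFn(sharedCache, fetchUsersByIds) {
  return async function batchUsers(ids) {
    // Only ids the shared cache has never seen go to the database.
    const misses = ids.filter((id) => !sharedCache.has(id));
    if (misses.length > 0) {
      const rows = await fetchUsersByIds(misses);
      for (const row of rows) sharedCache.set(row.id, row);
    }
    // Same length and order as `ids`; unknown ids resolve to null.
    return ids.map((id) => sharedCache.get(id) ?? null);
  };
}
```

The per-request loader still sits on top, so the shared store only ever sees one lookup per key per request. Because that store is shared between users, access checks have to happen after the load rather than rely on the cache being private.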

Common pitfalls

•Returning an array of a different length from the batch function. DataLoader rejects every load in that batch with an error. Returning the right length in the wrong order is worse: results are matched to keys by index, so the data is silently misaligned.
•Sharing a loader across requests. The cache becomes global, and you will leak data between users.
•Loading the same key with different “shapes” (e.g., user with vs. without email). Use a separate loader per query shape.
•Awaiting between loads. DataLoader batches only what is queued within one tick, so awaiting each load() in sequence yields one batch per key. Collect the promises and await them together with Promise.all, or use loadMany.
•Throwing inside the batch function for one bad key. A throw rejects the whole batch; return an Error at that index instead, so the other keys still resolve.
•Ignoring authorization. DataLoader does not know about access checks. Authorize the result before the resolver returns it.
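For the authorization pitfall, one option is to keep the loader ignorant of access rules and apply them in the resolver, after the load. The rule below (email visible only to the user themselves or to admins), the `ctx.viewer` field, and the function names are assumptions for illustration.

```javascript
// Hypothetical rule: email is visible only to the user themselves or to admins.
function canSeeEmail(viewer, user) {
  return Boolean(viewer) && (viewer.role === 'admin' || viewer.id === user.id);
}

// Resolver sketch: load through the request's loader, then filter per viewer.
async function resolveAuthor(post, _args, ctx) {
  const user = await ctx.loaders.user.load(post.authorId);
  if (user == null) return null;
  if (canSeeEmail(ctx.viewer, user)) return user;
  // Drop the restricted field instead of returning the cached object as-is.
  const { email, ...publicFields } = user;
  return publicFields;
}
```

The loader still fetches and caches the full row once, and each resolver decides what the current viewer is allowed to see.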

Practical tips

•Write one factory that builds every loader for a request (taking the database handle and the current user), so context construction is one line.
•For lists per parent (“comments of post X”), key by postId and have the batch return arrays. Document the contract clearly.
•Use Prisma’s findMany({ where: { id: { in: ids } } }) or raw SQL with ANY($1) for the batched lookup; either way the whole batch is a single round trip.
•Add metrics: batch size distribution tells you whether DataLoader is helping or whether resolvers are forced into sequential awaits.
•Pair DataLoader with query complexity limits. Batching helps databases; it does not stop a client from asking for a million nodes.
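The metrics tip needs very little code: wrap the batch function before handing it to the loader. In this sketch `withBatchMetrics` and the `recordBatchSize` callback are hypothetical names; in practice the callback would feed whatever histogram your metrics client provides.

```javascript
// Wrap any batch function so each dispatch reports how many keys it received.
// `recordBatchSize(name, size)` is a hypothetical callback supplied by the caller.
function withBatchMetrics(name, batchFn, recordBatchSize) {
  return async function instrumentedBatch(keys) {
    recordBatchSize(name, keys.length);
    return batchFn(keys);
  };
}
```

A loader built from `withBatchMetrics('user', batchUsers, record)` that mostly reports batches of size 1 is a sign that resolvers are awaiting loads one at a time.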

Wrap-up

GraphQL’s per-field resolver model is what makes the API expressive, and it is also what makes N+1 inevitable. DataLoader solves the problem with two ideas: collect within a tick, return in order. Wire one loader per data source, scope to the request, and most of your “GraphQL is slow” complaints will go away.