Skip to content
C Codeloom
System Design

System Design: Search Autocomplete System

Design a fast search autocomplete that returns suggestions in tens of milliseconds, with personalization, ranking, and freshness updates.

·3 min read · By Codeloom
Intermediate 10 min read

What you'll learn

  • Why autocomplete is hard at scale
  • Tries vs inverted indexes
  • Ranking by popularity and recency
  • Personalization layers
  • Keeping suggestions fresh

Prerequisites

  • Familiar with HTTP and databases

What and Why

Autocomplete is the first thing a user sees when they tap a search box, and it shapes what they search for. It must respond in under 100 ms so suggestions feel like part of typing.

Beyond speed, suggestions need to be relevant. A grocery app should not suggest the same query a news site would. Ranking is half the design.

Mental Model

Two reads dominate: “what queries start with this prefix” and “how popular is each candidate.” The first is a structural lookup; the second is a numeric ranking. Most designs separate them.

A trie or a precomputed prefix map answers the first. A score per query, refreshed on a schedule, answers the second.

Architecture

A precomputed index maps each prefix (up to N characters) to its top K candidate queries already ranked. Lookups become a single hash get. A streaming pipeline updates scores from search logs.

Client -> Edge API -> Prefix Cache (Redis)
                        |
                    hit -> top K queries
                        |
                    miss -> Index Service -> rebuild prefix
                                                  |
                                              Background job
                                                  |
                                         updates from logs
Autocomplete request path

The score combines popularity, recency, and personalization. A simple model: score = log(count) * recency_decay + user_affinity. Compute the global piece offline; layer personalization at query time.

Tries are elegant but memory-heavy at scale. A precomputed top-K map per prefix trades memory for simpler reads. For very long prefixes you fall back to a search index like Elasticsearch.

Trade-offs

Latency budgets are tight. You usually cannot afford a database hit, so the entire hot path lives in memory. That means the index must fit in RAM across the fleet.

Personalization at scale is expensive. Most products apply only a thin per-user re-rank on top of global suggestions, instead of building a personalized index per user.

Spelling correction adds value but doubles complexity. Many teams ship pure prefix matching first and add fuzzy matching later.

Practical Tips

Cap the prefix length at the index. Prefixes beyond 6-8 characters are rare and can fall back to a generic search call.

Throttle on the client: only send a request when the user pauses typing for 50 ms. This cuts request volume by 5-10x without hurting feel.

Cache aggressively at the edge. Many users type the same first letters; an edge cache with a one-minute TTL is enough to absorb most traffic.

Watch for offensive or low-quality suggestions. Maintain a blocklist and a manual override path. This is product work, not infrastructure, but it gates launch.

Wrap-up

Autocomplete is a small surface area with deep engineering. Precompute prefixes, keep them in memory, and rank with a simple but tuned score. Add personalization and freshness only where they pay off. The result is a feature users barely notice, which is exactly what good autocomplete should be.