AI Recommendation Systems Overview

Beginner 10 min read

What you'll learn

✓ What recommendation systems actually do
✓ How collaborative and content-based methods differ
✓ The two-stage candidate-then-rank pattern
✓ Common trade-offs around freshness and diversity
✓ Practical tips for shipping a first version

Prerequisites

• Basic ML familiarity

Recommendation systems quietly decide a huge fraction of what people read, watch, listen to, and buy online. They are also one of the oldest and most studied applications of machine learning. This post walks through the main ideas, the common architecture, and the trade-offs that show up once you put one in production.

What and Why

A recommendation system predicts what a user is likely to want next from a large catalog of items. The signal can be explicit, like a star rating, or implicit, like a click or a watch-time fraction. The goal is usually to maximize some downstream outcome: engagement, retention, revenue, or a healthier mix of long-term metrics.

Why bother? Catalogs grow faster than users can browse. A music app has tens of millions of tracks. A marketplace has hundreds of millions of listings. Without ranking, the user sees noise. A good recommender turns the catalog into a personal stream.

Mental Model

Most production systems use a two-stage funnel. The first stage is candidate generation: pull a few hundred plausible items from the whole catalog using cheap methods. The second stage is ranking: score those candidates with a heavier model that uses richer features.

Collaborative filtering learns from co-occurrence. If users who liked A also liked B, then A and B are related. Content-based filtering uses item features directly, comparing embeddings of text, images, or audio. Hybrid systems combine both and add user and context features like time of day or device.
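To make the co-occurrence idea concrete, here is a minimal sketch of item-to-item collaborative filtering by pair counting. The function name cooccurrence_neighbors and the toy likes data are made up for illustration. Real systems usually normalize the counts or learn embeddings instead of using raw counts.

```python
from collections import Counter, defaultdict
from itertools import combinations

def cooccurrence_neighbors(user_likes, top_n=3):
    """For each item, list the items most often liked by the same users."""
    counts = defaultdict(Counter)
    for items in user_likes.values():
        # Count every unordered pair of distinct items liked by one user.
        for a, b in combinations(sorted(set(items)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return {item: [other for other, _ in c.most_common(top_n)]
            for item, c in counts.items()}

# Hypothetical toy data: user id -> liked item ids.
likes = {
    "u1": ["A", "B", "C"],
    "u2": ["A", "B"],
    "u3": ["B", "C"],
    "u4": ["A", "B", "D"],
}
neighbors = cooccurrence_neighbors(likes)
print(neighbors["A"])  # B comes first: it co-occurs with A for three users
```

Raw counts favor popular items, which is one reason production systems tend to reweight them.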

Hands-on Example

Imagine a video platform. A user finishes a cooking video. The candidate generator queries an approximate nearest neighbor index over video embeddings and pulls the top 500 similar videos. It also pulls 200 from a collaborative filtering model and 100 trending videos. That is 800 items in total, and because the sources overlap, deduplication leaves around 700 candidates.
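The merge-and-deduplicate step might look roughly like the sketch below. The source names and video ids are placeholders, and merge_candidates is a hypothetical helper, not part of any library. It keeps the first source that proposed each item, which is handy later for debugging.

```python
def merge_candidates(sources):
    """Merge candidate lists, keeping the first occurrence of each item id.

    `sources` maps a source name to an ordered list of item ids.
    Returns a dict of item id -> name of the source that proposed it first.
    """
    merged = {}
    for source_name, item_ids in sources.items():
        for item_id in item_ids:
            # setdefault leaves the entry unchanged if the item was already seen.
            merged.setdefault(item_id, source_name)
    return merged

# Hypothetical candidate sources with some overlap.
sources = {
    "ann": ["v1", "v2", "v3"],
    "collaborative": ["v2", "v4"],
    "trending": ["v4", "v5"],
}
candidates = merge_candidates(sources)
print(len(candidates))  # 5 unique candidates out of 7 proposed
```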

The ranker is a gradient boosted tree or a neural network that takes user features, item features, and interaction features and predicts the probability of a watch-through. The top twenty are returned to the client, with a diversity rule that prevents three cooking videos in a row.
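One simple way to express a rule like "no three cooking videos in a row" is a greedy re-rank over the scored list. The sketch below assumes each item has a single category. The function name, the category_of mapping, and the max_run parameter are illustrative choices, not a standard API.

```python
def rerank_with_diversity(ranked, category_of, k=20, max_run=2):
    """Greedy re-rank of `ranked` (best first).

    At each slot, take the highest-ranked remaining item that would not
    create more than `max_run` consecutive items of the same category.
    """
    result = []
    remaining = list(ranked)
    while remaining and len(result) < k:
        for i, item in enumerate(remaining):
            tail = result[-max_run:]
            run_full = len(tail) == max_run and all(
                category_of[t] == category_of[item] for t in tail
            )
            if not run_full:
                result.append(remaining.pop(i))
                break
        else:
            # Every remaining item would break the rule, so stop early.
            break
    return result

# Hypothetical items, already sorted by ranker score.
category_of = {"c1": "cooking", "c2": "cooking", "c3": "cooking",
               "t1": "travel", "c4": "cooking"}
print(rerank_with_diversity(["c1", "c2", "c3", "t1", "c4"], category_of))
```

The travel video is pulled ahead of the third cooking video, at the cost of a slightly lower-scoring item appearing earlier.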

user request
   |
   v
[candidate generation]
- ANN over embeddings
- collaborative filtering
- trending pool
   |
   v
 ~700 candidates
   |
   v
[ranking model]
features: user, item, context
   |
   v
top 20 + diversity rules
   |
   v
 client

Two-stage recommendation funnel

A simple toy version in Python looks like this:

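import numpy as np

def cosine(a, b):
    # Cosine similarity; defined as 0.0 when either vector has zero norm.
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def recommend(user_vec, item_vecs, k=10):
    # Score every item against the user vector, then return the indices
    # of the k highest-scoring items, best first.
    scores = np.array([cosine(user_vec, v) for v in item_vecs])
    return scores.argsort()[-k:][::-1]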

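That is the heart of content-based recommendation, provided the user vector lives in the same embedding space as the items, for example as the average of the embeddings of items the user has engaged with. Production code adds batching, caching, and feature stores, but the idea is the same.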

Trade-offs

Collaborative filtering needs interaction data, so it struggles with new users and new items. This is the cold start problem. Content-based methods handle cold start better but can feel narrow, recommending more of what the user already saw.

Optimizing for short-term clicks is easy and dangerous. It tends to push clickbait and reduce long-term satisfaction. Most mature systems blend several objectives and add explicit diversity, novelty, or fairness constraints.
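As a rough illustration of blending objectives, the sketch below combines several predicted probabilities into one score with a weighted sum. The objective names and weights are invented for this example. In practice the weights are tuned through experiments rather than set by hand.

```python
def blended_score(predictions, weights):
    """Weighted sum of per-objective predictions for a single item."""
    return sum(weights[name] * predictions[name] for name in weights)

# Hypothetical objectives and weights, for illustration only.
weights = {"p_click": 0.3, "p_watch_through": 0.6, "p_report": -1.0}

# A clickbait-style item: high click probability, low watch-through.
clickbait = {"p_click": 0.9, "p_watch_through": 0.1, "p_report": 0.05}
# A solid item: fewer clicks, but people finish it.
solid = {"p_click": 0.5, "p_watch_through": 0.6, "p_report": 0.0}

print(blended_score(clickbait, weights) < blended_score(solid, weights))
```

With these made-up numbers the solid item outranks the clickbait item even though it has the lower click probability.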

Latency budgets are tight. The full pipeline often has to run in under one hundred milliseconds. That pushes heavy work offline into nightly batch jobs and forces the online ranker to be small.

Practical Tips

Start with a strong non-personalized baseline, such as the most popular items within each user segment (for example, per country or per device type). Many sophisticated systems barely beat it, so it tells you whether personalization is worth the complexity.
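A popularity-within-segment baseline can be only a few lines. The sketch below assumes interactions arrive as (segment, item id) pairs and uses country as the segment. Both are assumptions made for the example.

```python
from collections import Counter, defaultdict

def popularity_by_segment(interactions, k=10):
    """Return the k most-interacted items for each segment, most popular first."""
    counts = defaultdict(Counter)
    for segment, item_id in interactions:
        counts[segment][item_id] += 1
    return {segment: [item for item, _ in c.most_common(k)]
            for segment, c in counts.items()}

# Hypothetical interaction log: (segment, item id).
interactions = [
    ("US", "v1"), ("US", "v1"), ("US", "v2"),
    ("DE", "v3"), ("DE", "v3"), ("DE", "v1"),
]
top = popularity_by_segment(interactions, k=2)
print(top["US"], top["DE"])
```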

Log the candidates, the scores, and the final ranking. Without this you cannot debug bad recommendations or run counterfactual evaluations. Treat your logging schema as a first-class artifact.
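One possible shape for such a log record is sketched below as a dataclass serialized to JSON. The field names are suggestions, not a standard schema. Adapt them to whatever your pipeline already tracks.

```python
import json
import time
from dataclasses import asdict, dataclass, field

@dataclass
class RecommendationLog:
    """One record per request; all field names here are illustrative."""
    request_id: str
    user_id: str
    model_version: str
    candidates: dict      # item id -> source that proposed it
    scores: dict          # item id -> ranker score
    final_ranking: list   # item ids in the order shown to the user
    timestamp: float = field(default_factory=time.time)

    def to_json(self):
        # sort_keys keeps the output stable, which makes diffs easier to read.
        return json.dumps(asdict(self), sort_keys=True)

record = RecommendationLog(
    request_id="req-1",
    user_id="user-42",
    model_version="ranker-v0",
    candidates={"v1": "ann", "v2": "trending"},
    scores={"v1": 0.8, "v2": 0.3},
    final_ranking=["v1", "v2"],
)
print(record.to_json())
```

Recording the model version alongside the scores makes it possible to tell later which ranker produced a given list.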

Evaluate offline with care. Offline metrics like recall at K are useful but only loosely correlated with online outcomes. Plan for A/B tests early and pick guardrail metrics that catch regressions in retention.
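For reference, recall at K is the fraction of a user's relevant items that appear in the top K recommendations. A minimal sketch follows. Returning 0.0 when there are no relevant items is a convention chosen here, and some teams skip such users instead.

```python
def recall_at_k(recommended, relevant, k):
    """Fraction of `relevant` items found in the first k of `recommended`."""
    if not relevant:
        return 0.0  # convention chosen for this sketch
    hits = len(set(recommended[:k]) & set(relevant))
    return hits / len(relevant)

# Hypothetical ranking and held-out relevant items for one user.
recommended = ["a", "b", "c", "d"]
relevant = {"b", "d", "e"}
print(recall_at_k(recommended, relevant, k=2))  # 1 of 3 relevant items found
print(recall_at_k(recommended, relevant, k=4))  # 2 of 3 relevant items found
```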

Watch for feedback loops. The model recommends what users click, users click what the model recommends, and over time the system narrows. Inject exploration through random items or epsilon-greedy slots.
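Epsilon-greedy slots can be sketched as follows: each slot normally takes the next-best ranked item, but with probability epsilon it takes a random item from an exploration pool instead. The function name, the pool, and the default epsilon of 0.1 are assumptions for illustration.

```python
import random

def epsilon_greedy_slate(ranked, explore_pool, n_slots=20, epsilon=0.1, rng=None):
    """Fill slots from `ranked` (best first), exploring with probability epsilon.

    An exploration slot gets a uniformly random unused item from `explore_pool`.
    """
    rng = rng or random.Random()
    slate, used = [], set()
    ranked_iter = iter(ranked)
    while len(slate) < n_slots:
        pool = [item for item in explore_pool if item not in used]
        if pool and rng.random() < epsilon:
            choice = rng.choice(pool)
        else:
            # Exploit: next ranked item that is not already on the slate.
            choice = next((item for item in ranked_iter if item not in used), None)
            if choice is None:
                break  # ranked list exhausted
        slate.append(choice)
        used.add(choice)
    return slate

ranked = ["v1", "v2", "v3", "v4", "v5"]
pool = ["x1", "x2", "x3"]
print(epsilon_greedy_slate(ranked, pool, n_slots=5, epsilon=0.2, rng=random.Random(0)))
```

It helps to log which slots were exploratory so that their clicks can be analyzed separately.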

Wrap-up

Recommendation systems are a deceptively simple idea built on top of layered, careful engineering. The two-stage funnel, the mix of collaborative and content signals, and the constant tension between short-term and long-term objectives are the patterns to internalize. Start small, log everything, and let A/B tests drive the roadmap. The shiny model matters less than the discipline around it.