Skip to content
C Codeloom
SQL

SQL Window Functions: A Practical Tutorial

Learn how window functions work in SQL, when to use ROW_NUMBER, RANK, LAG, and SUM OVER, and how they differ from GROUP BY.

·5 min read · By Codeloom
Intermediate 10 min read

What you'll learn

  • How window functions differ from GROUP BY
  • How OVER, PARTITION BY, and ORDER BY interact
  • When to use ROW_NUMBER, RANK, and DENSE_RANK
  • How LAG and LEAD let you compare rows
  • How to write running totals and moving averages

Prerequisites

  • Basic SELECT, JOIN, and GROUP BY in any SQL database

What and Why

GROUP BY collapses many rows into one. Window functions do the opposite: they compute an aggregate or rank for each row while keeping every row visible. That single property unlocks running totals, moving averages, ranking within groups, period-over-period comparisons, and de-duplication patterns that are awkward or impossible with plain aggregation.

If you have ever written a subquery just to attach a MAX or COUNT back to the original row, a window function will replace it with one cleaner clause.

Mental Model

A window function still scans rows, but instead of grouping them it builds a “window” of related rows for each row and computes a value over that window. The OVER clause defines the window.

GROUP BY: many rows -> one row per group
 [r1 r2 r3]  -> [agg1]
 [r4 r5]     -> [agg2]

Window:   many rows -> many rows, agg attached
 r1 -> agg(window_of_r1)
 r2 -> agg(window_of_r2)
 r3 -> agg(window_of_r3)
 r4 -> agg(window_of_r4)
GROUP BY vs window

The OVER clause has three optional parts:

  • PARTITION BY splits rows into independent groups (like GROUP BY, but per-row).
  • ORDER BY orders rows inside each partition; required by ranking and offset functions.
  • A frame clause (ROWS BETWEEN ... or RANGE BETWEEN ...) limits which rows in the partition contribute to the aggregate.

Hands-on Example

Take a small sales table:

CREATE TABLE sales (
  id        SERIAL PRIMARY KEY,
  region    TEXT,
  sold_at   DATE,
  amount    NUMERIC
);

INSERT INTO sales (region, sold_at, amount) VALUES
  ('east', '2026-01-05', 100),
  ('east', '2026-01-06', 150),
  ('east', '2026-01-09', 80),
  ('west', '2026-01-05', 200),
  ('west', '2026-01-07', 120),
  ('west', '2026-01-12', 90);

Attach the region total to every row without collapsing:

SELECT region, sold_at, amount,
       SUM(amount) OVER (PARTITION BY region) AS region_total
FROM sales
ORDER BY region, sold_at;

Each row keeps its detail and gains the regional total. Now a running total within each region:

SELECT region, sold_at, amount,
       SUM(amount) OVER (
         PARTITION BY region
         ORDER BY sold_at
         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS running_total
FROM sales;

Rank each sale within its region by amount:

SELECT region, sold_at, amount,
       ROW_NUMBER() OVER (PARTITION BY region ORDER BY amount DESC) AS rn,
       RANK()       OVER (PARTITION BY region ORDER BY amount DESC) AS rk,
       DENSE_RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS drk
FROM sales;

ROW_NUMBER is always unique. RANK leaves gaps when ties occur (1, 2, 2, 4). DENSE_RANK does not (1, 2, 2, 3). Pick based on whether ties should reserve slots.

Period-over-period comparisons with LAG:

SELECT region, sold_at, amount,
       LAG(amount) OVER (PARTITION BY region ORDER BY sold_at) AS prev_amount,
       amount - LAG(amount) OVER (PARTITION BY region ORDER BY sold_at) AS delta
FROM sales;

LEAD is the symmetric peek-ahead. A 3-row moving average:

SELECT region, sold_at, amount,
       AVG(amount) OVER (
         PARTITION BY region
         ORDER BY sold_at
         ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
       ) AS ma3
FROM sales;

A common de-duplication pattern: keep the latest row per key:

WITH ranked AS (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY region ORDER BY sold_at DESC) AS rn
  FROM sales
)
SELECT region, sold_at, amount
FROM ranked
WHERE rn = 1;

Common Pitfalls

Filtering on a window result in WHERE. The window is computed after WHERE, so WHERE ROW_NUMBER() OVER (...) = 1 is a syntax error. Wrap it in a CTE or subquery and filter outside, as above.

Forgetting ORDER BY inside OVER for LAG, LEAD, ROW_NUMBER, or a running total. Without it, the database is free to return rows in any order and your result becomes nondeterministic.

Using RANGE when you wanted ROWS. With RANGE, ties on the ORDER BY column collapse into the same frame, so a running total over rows with duplicate dates will add them all at once.

Assuming PARTITION BY is free. It builds an implicit sort per partition; on huge tables an appropriate index ((region, sold_at)) makes the difference between snappy and unusable.

Mixing window functions with GROUP BY carelessly. Window functions see the post-aggregation rows. Use a subquery if you want to window before grouping.

Practical Tips

Define named windows when you reuse the same OVER clause:

SELECT region, sold_at, amount,
       SUM(amount) OVER w AS running,
       AVG(amount) OVER w AS avg_so_far
FROM sales
WINDOW w AS (PARTITION BY region ORDER BY sold_at);

For top-N-per-group, ROW_NUMBER plus a CTE is faster and clearer than IN with a correlated subquery.

FIRST_VALUE and LAST_VALUE return values from the edges of the frame. With LAST_VALUE, remember the default frame ends at the current row; you usually want ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING.

PERCENT_RANK, CUME_DIST, and NTILE are great for percentile and bucket analytics without leaving SQL.

Postgres, SQL Server, Oracle, modern MySQL (8+), SQLite (3.25+), and BigQuery all support the syntax shown here. Reading the docs for your exact engine pays off for frame edge cases.

Wrap-up

Window functions add one expressive idea to SQL: compute an aggregate over a related set of rows without collapsing them. With PARTITION BY, ORDER BY, and a frame you can write running totals, rankings, period comparisons, and de-duplication queries that are clearer and faster than the subquery acrobatics they replace.