Skip to content
C Codeloom
SQL

SQL Aggregate Functions and GROUP BY in Depth

How aggregate functions interact with GROUP BY, HAVING, and window functions, with practical patterns and pitfalls every backend engineer should know.

·4 min read · By Codeloom
Intermediate 10 min read

What you'll learn

  • How GROUP BY collapses rows
  • SUM, COUNT, AVG, and friends
  • HAVING vs WHERE
  • GROUPING SETS and ROLLUP
  • Window functions as a smarter alternative

Prerequisites

  • Familiar with terminals and YAML

What and Why

GROUP BY collapses many rows into one per group and lets aggregate functions summarize them. It is how you turn a table of events into a dashboard of metrics. Anybody who writes reports, KPIs, billing rollups, or A/B test results lives in this corner of SQL.

The reason to study it carefully is that small mistakes - filtering at the wrong stage, mixing aggregated and non-aggregated columns, double-counting joined rows - produce wrong numbers that look right.

Mental Model

FROM      -> tables and joins
WHERE     -> filter rows BEFORE grouping
GROUP BY  -> partition rows into groups
HAVING    -> filter groups AFTER aggregation
SELECT    -> pick columns and aggregates
ORDER BY  -> sort the result
LIMIT     -> trim
Logical query order

Two practical rules from this order:

  1. Any non-aggregated column in SELECT must appear in GROUP BY. Otherwise the database does not know which value to pick.
  2. WHERE filters rows; HAVING filters groups. Use WHERE when you can - it is cheaper.

Hands-on Example

A sales table:

CREATE TABLE sales (
  id          bigint PRIMARY KEY,
  region      text,
  product     text,
  sold_at     date,
  quantity    int,
  unit_price  numeric(10,2)
);

INSERT INTO sales VALUES
  (1,'us','widget','2026-06-01',3, 9.99),
  (2,'us','widget','2026-06-02',1, 9.99),
  (3,'us','gizmo' ,'2026-06-02',2,19.99),
  (4,'eu','widget','2026-06-03',5, 9.99),
  (5,'eu','gizmo' ,'2026-06-04',1,19.99);

Total revenue by region:

SELECT region,
       SUM(quantity * unit_price) AS revenue,
       COUNT(*) AS orders,
       AVG(quantity) AS avg_qty
FROM sales
GROUP BY region;

Only regions over a threshold:

SELECT region, SUM(quantity * unit_price) AS revenue
FROM sales
WHERE sold_at >= '2026-06-01'
GROUP BY region
HAVING SUM(quantity * unit_price) > 50;

Daily revenue with running total via a window function:

SELECT sold_at,
       SUM(quantity * unit_price) AS daily,
       SUM(SUM(quantity * unit_price)) OVER (ORDER BY sold_at) AS cumulative
FROM sales
GROUP BY sold_at
ORDER BY sold_at;

Multiple aggregation levels in one query with GROUPING SETS:

SELECT region, product,
       SUM(quantity * unit_price) AS revenue
FROM sales
GROUP BY GROUPING SETS ((region, product), (region), ());
-- Returns per (region,product), per region, and grand total in one shot

Distinct counting:

SELECT region, COUNT(DISTINCT product) AS distinct_products
FROM sales
GROUP BY region;

Common Pitfalls

  • Selecting non-aggregated columns not in GROUP BY. Postgres and standard SQL reject it; MySQL historically returned arbitrary values. Be explicit.
  • COUNT(column) vs COUNT(*). COUNT(column) skips NULLs. Reach for COUNT(*) for “number of rows”.
  • AVG over NULLs. AVG ignores NULLs entirely. If you want to treat NULL as zero, use AVG(COALESCE(x, 0)).
  • HAVING without aggregation. Some engines allow it, but it usually means you meant WHERE. Filtering rows is cheaper there.
  • Aggregating after a one-to-many join. Joining sales to a products table can multiply revenue if a product matches multiple rows. Aggregate first in a CTE, then join.

Production Tips

  • Reach for window functions when you need both row-level detail and aggregates. They avoid the lossy collapse of GROUP BY.
  • Use CTEs to stage aggregations. A clean pipeline of small named steps is easier to review and easier for the optimizer.
  • Add partial or expression indexes for hot grouping keys. Indexing on (region, sold_at) makes daily-by-region queries much faster.
  • For very large datasets, push aggregation into materialized views refreshed on a schedule. Dashboards stay snappy without thrashing the source tables.
  • When working with floats and money, always store money as numeric/decimal, not float. Aggregates over floats accumulate rounding error.

Wrap-up

GROUP BY and the aggregate family are the workhorses of analytical SQL. Remember the logical order - WHERE, then group, then HAVING - keep selected columns either aggregated or in the grouping clause, and reach for window functions and GROUPING SETS when one collapse is not enough. With those habits, your reports compute correctly the first time and remain correct as the data grows.