SQL Aggregate Functions and GROUP BY in Depth
How aggregate functions interact with GROUP BY, HAVING, and window functions, with practical patterns and pitfalls every backend engineer should know.
What you'll learn
- ✓How GROUP BY collapses rows
- ✓SUM, COUNT, AVG, and friends
- ✓HAVING vs WHERE
- ✓GROUPING SETS and ROLLUP
- ✓Window functions as a smarter alternative
Prerequisites
- •Familiar with terminals and YAML
What and Why
GROUP BY collapses many rows into one per group and lets aggregate functions summarize them. It is how you turn a table of events into a dashboard of metrics. Anybody who writes reports, KPIs, billing rollups, or A/B test results lives in this corner of SQL.
The reason to study it carefully is that small mistakes - filtering at the wrong stage, mixing aggregated and non-aggregated columns, double-counting joined rows - produce wrong numbers that look right.
Mental Model
FROM -> tables and joins
WHERE -> filter rows BEFORE grouping
GROUP BY -> partition rows into groups
HAVING -> filter groups AFTER aggregation
SELECT -> pick columns and aggregates
ORDER BY -> sort the result
LIMIT -> trim Two practical rules from this order:
- Any non-aggregated column in
SELECTmust appear inGROUP BY. Otherwise the database does not know which value to pick. WHEREfilters rows;HAVINGfilters groups. UseWHEREwhen you can - it is cheaper.
Hands-on Example
A sales table:
CREATE TABLE sales (
id bigint PRIMARY KEY,
region text,
product text,
sold_at date,
quantity int,
unit_price numeric(10,2)
);
INSERT INTO sales VALUES
(1,'us','widget','2026-06-01',3, 9.99),
(2,'us','widget','2026-06-02',1, 9.99),
(3,'us','gizmo' ,'2026-06-02',2,19.99),
(4,'eu','widget','2026-06-03',5, 9.99),
(5,'eu','gizmo' ,'2026-06-04',1,19.99);
Total revenue by region:
SELECT region,
SUM(quantity * unit_price) AS revenue,
COUNT(*) AS orders,
AVG(quantity) AS avg_qty
FROM sales
GROUP BY region;
Only regions over a threshold:
SELECT region, SUM(quantity * unit_price) AS revenue
FROM sales
WHERE sold_at >= '2026-06-01'
GROUP BY region
HAVING SUM(quantity * unit_price) > 50;
Daily revenue with running total via a window function:
SELECT sold_at,
SUM(quantity * unit_price) AS daily,
SUM(SUM(quantity * unit_price)) OVER (ORDER BY sold_at) AS cumulative
FROM sales
GROUP BY sold_at
ORDER BY sold_at;
Multiple aggregation levels in one query with GROUPING SETS:
SELECT region, product,
SUM(quantity * unit_price) AS revenue
FROM sales
GROUP BY GROUPING SETS ((region, product), (region), ());
-- Returns per (region,product), per region, and grand total in one shot
Distinct counting:
SELECT region, COUNT(DISTINCT product) AS distinct_products
FROM sales
GROUP BY region;
Common Pitfalls
- Selecting non-aggregated columns not in
GROUP BY. Postgres and standard SQL reject it; MySQL historically returned arbitrary values. Be explicit. COUNT(column)vsCOUNT(*).COUNT(column)skips NULLs. Reach forCOUNT(*)for “number of rows”.- AVG over NULLs.
AVGignores NULLs entirely. If you want to treat NULL as zero, useAVG(COALESCE(x, 0)). - HAVING without aggregation. Some engines allow it, but it usually means you meant
WHERE. Filtering rows is cheaper there. - Aggregating after a one-to-many join. Joining sales to a
productstable can multiply revenue if a product matches multiple rows. Aggregate first in a CTE, then join.
Production Tips
- Reach for window functions when you need both row-level detail and aggregates. They avoid the lossy collapse of
GROUP BY. - Use CTEs to stage aggregations. A clean pipeline of small named steps is easier to review and easier for the optimizer.
- Add partial or expression indexes for hot grouping keys. Indexing on
(region, sold_at)makes daily-by-region queries much faster. - For very large datasets, push aggregation into materialized views refreshed on a schedule. Dashboards stay snappy without thrashing the source tables.
- When working with floats and money, always store money as numeric/decimal, not
float. Aggregates over floats accumulate rounding error.
Wrap-up
GROUP BY and the aggregate family are the workhorses of analytical SQL. Remember the logical order - WHERE, then group, then HAVING - keep selected columns either aggregated or in the grouping clause, and reach for window functions and GROUPING SETS when one collapse is not enough. With those habits, your reports compute correctly the first time and remain correct as the data grows.
Related articles
- SQL SQL Materialized Views: A Practical Tutorial
Learn how materialized views work in PostgreSQL, when to use them over regular views, and how to refresh them efficiently without blocking readers.
- SQL SQL Window Functions: A Practical Tutorial
Learn how window functions work in SQL, when to use ROW_NUMBER, RANK, LAG, and SUM OVER, and how they differ from GROUP BY.
- SQL CTE vs Subquery vs Temp Table: Picking the Right SQL Tool
Compare common table expressions, subqueries, and temporary tables in SQL. Learn when each shines, the performance trade-offs, and concrete examples.
- SQL SQL EXPLAIN ANALYZE Deep Dive
How to read EXPLAIN ANALYZE output, interpret cost estimates, spot bad plans, and use it to drive real performance improvements in production databases.