SQL Aggregations: COUNT, SUM, AVG, GROUP BY, HAVING
Summarise data the right way — COUNT, SUM, AVG, MIN, MAX, GROUP BY across one or many columns, and the WHERE vs HAVING distinction that catches everyone the first time.
What you'll learn
- ✓The core aggregate functions: COUNT, SUM, AVG, MIN, MAX
- ✓How GROUP BY rolls rows up into groups
- ✓When to use WHERE vs HAVING
- ✓Grouping by more than one column
- ✓How NULL interacts with aggregates
- ✓Useful helpers: ROUND, CAST, COALESCE
Prerequisites
- •Comfort with SELECT and WHERE — see SQL SELECT Basics
- •Familiarity with JOINs — see SQL JOINs
GROUP BY is where SQL stops feeling like a list and starts feeling like a report. Once you can summarise data — count it, sum it, average it — you can answer almost every “how many” or “how much” question a business asks. This post covers the aggregate functions, GROUP BY, and the WHERE-vs-HAVING distinction every analyst eventually internalises.
The running dataset
We’ll keep using the users, orders, products schema. Quick reminder:
-- users(id, name, country, age)
-- products(id, name, price)
-- orders(id, user_id, product_id, quantity, total, status)
Seven orders across five users, plus Edsger from the JOINs post who has none.
Aggregate functions
An aggregate function takes many rows and returns one value. The five you use most:
| Function | Returns |
|---|---|
COUNT(*) | Number of rows in the group |
COUNT(col) | Number of non-NULL values in col |
SUM(col) | Sum of non-NULL values |
AVG(col) | Average of non-NULL values |
MIN(col), MAX(col) | Smallest / largest value |
Used without GROUP BY, they collapse the whole table into a single row.
SELECT COUNT(*) AS total_orders,
SUM(total) AS revenue,
AVG(total) AS avg_order,
MIN(total) AS smallest,
MAX(total) AS largest
FROM orders;
Result:
total_orders | revenue | avg_order | smallest | largest
-------------+---------+-----------+----------+--------
7 | 289.50 | 41.36 | 6.00 | 98.00
One row, five summary numbers. Notice the column aliases — AS total_orders, AS revenue — without them the result columns are unhelpfully named count, sum, etc.
COUNT(*) vs COUNT(col)
This is a subtle but important distinction.
COUNT(*)counts every row, including rows where every column isNULL.COUNT(col)counts rows wherecolis notNULL.
SELECT
COUNT(*) AS total_users,
COUNT(country) AS users_with_country,
COUNT(age) AS users_with_age
FROM users;
If a user has NULL country, COUNT(country) skips them. This is exactly how you measure data completeness.
COUNT(DISTINCT col) counts unique values:
SELECT COUNT(DISTINCT country) AS distinct_countries FROM users;
-- 4 (UK, FI, US, NL)
GROUP BY
GROUP BY splits the rows into buckets and applies the aggregate to each bucket. The simplest case — orders per user:
SELECT user_id, COUNT(*) AS order_count, SUM(total) AS spent
FROM orders
GROUP BY user_id
ORDER BY user_id;
Result:
user_id | order_count | spent
--------+-------------+------
1 | 2 | 35.00
2 | 1 | 49.00
3 | 1 | 89.00
4 | 2 | 18.50
5 | 1 | 98.00
Every column in the SELECT list must either appear in GROUP BY or be inside an aggregate. The reason is mathematical: if a column isn’t part of the grouping key and isn’t aggregated, the database wouldn’t know which of the many possible values to return for the group.
-- ERROR: name is neither grouped nor aggregated
SELECT user_id, name, COUNT(*)
FROM users u JOIN orders o ON o.user_id = u.id
GROUP BY user_id;
The fix is to add name to GROUP BY (safe when grouping by a primary key like u.id):
SELECT u.id, u.name, COUNT(*) AS order_count
FROM users u
JOIN orders o ON o.user_id = u.id
GROUP BY u.id, u.name
ORDER BY order_count DESC;
Grouping by multiple columns
Add columns to GROUP BY to slice along multiple dimensions. “Orders per country per status”:
SELECT u.country, o.status, COUNT(*) AS orders, SUM(o.total) AS revenue
FROM users u
JOIN orders o ON o.user_id = u.id
GROUP BY u.country, o.status
ORDER BY u.country, o.status;
Result:
country | status | orders | revenue
--------+----------+--------+--------
FI | refunded | 1 | 49.00
UK | paid | 3 | 133.00
US | paid | 2 | 95.00
US | pending | 1 | 12.50
Read this as a small report — every (country, status) combination that has at least one row is one line of output. Combinations with zero orders simply don’t appear.
HAVING
WHERE filters rows before grouping. HAVING filters groups after grouping. You can’t put an aggregate in WHERE; you can in HAVING.
-- Users who have spent more than $50
SELECT u.name, SUM(o.total) AS spent
FROM users u
JOIN orders o ON o.user_id = u.id
GROUP BY u.id, u.name
HAVING SUM(o.total) > 50
ORDER BY spent DESC;
Compare to a query that combines both:
-- Paid orders only, grouped by user, where the resulting total > 50
SELECT u.name, SUM(o.total) AS spent
FROM users u
JOIN orders o ON o.user_id = u.id
WHERE o.status = 'paid'
GROUP BY u.id, u.name
HAVING SUM(o.total) > 50;
The flow is:
FROM/JOINproduces the working set.WHEREdrops rows you don’t care about.GROUP BYbuckets the rest.HAVINGdrops whole groups.SELECTprojects the final columns.ORDER BYsorts.LIMITchops the result.
Hold that pipeline in your head and the order-of-clauses rule (SELECT … FROM … WHERE … GROUP BY … HAVING … ORDER BY … LIMIT) stops being arbitrary.
Try it yourself. Using the seed data from CREATE TABLE and INSERT:
- Average
totalperstatus. - Number of distinct users per country.
- Products that appear in more than one order. (Hint:
GROUP BY product_id HAVING COUNT(*) > 1.) - Countries whose average paid-order total exceeds $40.
If you can write all four without peeking, you’ve absorbed the pattern.
NULL and aggregates
SUM, AVG, MIN, and MAX all ignore NULL values in their input.
-- ages: 36, 55, 41, 29, 62, NULL (a hypothetical user)
SELECT AVG(age) FROM users;
-- Averages over 5 non-null ages, not 6
COUNT(*) counts the row even when every column is NULL. COUNT(col) skips NULLs. This asymmetry is a constant source of bugs — when in doubt, write a small test query and confirm the count you expect.
SUM of zero matching rows returns NULL, not 0. Use COALESCE to substitute:
SELECT u.name, COALESCE(SUM(o.total), 0) AS spent
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
GROUP BY u.id, u.name
ORDER BY u.id;
Edsger now shows spent = 0 instead of NULL.
ROUND, CAST, and friendly numbers
Aggregate results often need a little polish before they go in a report.
ROUND
SELECT u.country, ROUND(AVG(o.total), 2) AS avg_order
FROM users u
JOIN orders o ON o.user_id = u.id
GROUP BY u.country;
ROUND(value, decimals) rounds to a given number of decimal places. Most dialects also support CEIL, FLOOR, and TRUNC.
CAST
CAST converts between types. Useful when you divide two integers and want a decimal result:
-- Integer division: returns 0 if cnt < total
SELECT SUM(CASE WHEN status = 'paid' THEN 1 ELSE 0 END) / COUNT(*) AS paid_ratio
FROM orders;
-- Float division: returns something like 0.71
SELECT CAST(SUM(CASE WHEN status = 'paid' THEN 1 ELSE 0 END) AS REAL) / COUNT(*) AS paid_ratio
FROM orders;
A common shorthand multiplies by 1.0:
SELECT 1.0 * SUM(CASE WHEN status = 'paid' THEN 1 ELSE 0 END) / COUNT(*) AS paid_ratio
FROM orders;
Filtered aggregates
Modern Postgres and SQLite support FILTER:
SELECT
COUNT(*) FILTER (WHERE status = 'paid') AS paid,
COUNT(*) FILTER (WHERE status = 'refunded') AS refunded
FROM orders;
A clean way to compute several conditional counts in one pass. MySQL uses SUM(CASE WHEN … THEN 1 ELSE 0 END) for the same effect.
A worked example
A monthly-style summary across the whole dataset:
SELECT
u.country,
COUNT(DISTINCT u.id) AS users,
COUNT(o.id) AS total_orders,
COUNT(o.id) FILTER (WHERE o.status = 'paid') AS paid_orders,
COUNT(o.id) FILTER (WHERE o.status = 'refunded') AS refunded_orders,
COALESCE(SUM(o.total) FILTER (WHERE o.status = 'paid'), 0) AS paid_revenue,
ROUND(COALESCE(AVG(o.total) FILTER (WHERE o.status = 'paid'), 0), 2) AS avg_paid_total
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
GROUP BY u.country
ORDER BY paid_revenue DESC;
Every country shows up because of the LEFT JOIN (Edsger’s NL country still appears even though he has no orders). Aggregates use FILTER to slice by status. COALESCE keeps the NULLs out of the final report.
This is a real shape of query you’ll write a thousand times. Learn the parts and you can crank them out from memory.
Try it yourself. Write a query that returns, for each product, total_sold (sum of quantity) and revenue (sum of total) — but only for paid orders. Sort by revenue descending. Then add a HAVING clause to include only products with more than 2 paid units sold.
Common beginner mistakes
- Putting a non-aggregated, non-grouped column in
SELECTand being surprised by an error or by MySQL’s permissive (and incorrect) result. - Filtering on an aggregate inside
WHEREinstead ofHAVING. - Forgetting that
COUNT(col)skipsNULLandCOUNT(*)doesn’t. - Comparing
SUMof zero rows to0instead of usingCOALESCE. - Integer division returning
0because both operands are integers —CASTone toREALor multiply by1.0.
Recap
You now know:
COUNT,SUM,AVG,MIN,MAXsummarise rowsGROUP BYbuckets rows; every selected column must be grouped or aggregatedWHEREfilters rows before grouping;HAVINGfilters groups after- Aggregates ignore
NULLs —COALESCEmakes the result reportable ROUND,CAST, andFILTERpolish the output
This is the toolkit for every “how many” and “how much” question you’ll ever be asked.
Next steps
Now that you can summarise, the next leap is composing queries: subqueries, derived tables, and Common Table Expressions (CTEs). These let you treat the result of one query as the input to another, which is how you write the complicated reports without losing your mind.
Next: Subqueries and CTEs in SQL
Related reading: SQL JOINs, SQL SELECT Basics.
Questions or feedback? Email codeloomdevv@gmail.com.