SQL Aggregations: COUNT, SUM, AVG, GROUP BY, HAVING

Intermediate 12 min read

What you'll learn

✓The core aggregate functions: COUNT, SUM, AVG, MIN, MAX
✓How GROUP BY rolls rows up into groups
✓When to use WHERE vs HAVING
✓Grouping by more than one column
✓How NULL interacts with aggregates
✓Useful helpers: ROUND, CAST, COALESCE

Prerequisites

•Comfort with SELECT and WHERE — see SQL SELECT Basics
•Familiarity with JOINs — see SQL JOINs

GROUP BY is where SQL stops feeling like a list and starts feeling like a report. Once you can summarise data — count it, sum it, average it — you can answer almost every “how many” or “how much” question a business asks. This post covers the aggregate functions, GROUP BY, and the WHERE-vs-HAVING distinction every analyst eventually internalises.

The running dataset

We’ll keep using the users, orders, products schema. Quick reminder:

-- users(id, name, country, age)
-- products(id, name, price)
-- orders(id, user_id, product_id, quantity, total, status)

Seven orders across five users, plus Edsger from the JOINs post who has none.

Aggregate functions

An aggregate function takes many rows and returns one value. The five you use most:

Function	Returns
`COUNT(*)`	Number of rows in the group
`COUNT(col)`	Number of non-NULL values in `col`
`SUM(col)`	Sum of non-NULL values
`AVG(col)`	Average of non-NULL values
`MIN(col)`, `MAX(col)`	Smallest / largest value

Used without GROUP BY, they collapse the whole table into a single row.

SELECT COUNT(*) AS total_orders,
       SUM(total) AS revenue,
       AVG(total) AS avg_order,
       MIN(total) AS smallest,
       MAX(total) AS largest
FROM orders;

Result:

total_orders | revenue | avg_order | smallest | largest
-------------+---------+-----------+----------+--------
           7 |  289.50 |     41.36 |     6.00 |   98.00

One row, five summary numbers. Notice the column aliases — AS total_orders, AS revenue — without them the result columns are unhelpfully named count, sum, etc.

COUNT(*) vs COUNT(col)

This is a subtle but important distinction.

COUNT(*) counts every row, including rows where every column is NULL.
COUNT(col) counts rows where col is not NULL.

SELECT
  COUNT(*)       AS total_users,
  COUNT(country) AS users_with_country,
  COUNT(age)     AS users_with_age
FROM users;

If a user has NULL country, COUNT(country) skips them. This is exactly how you measure data completeness.

COUNT(DISTINCT col) counts unique values:

SELECT COUNT(DISTINCT country) AS distinct_countries FROM users;
-- 4 (UK, FI, US, NL)

GROUP BY

GROUP BY splits the rows into buckets and applies the aggregate to each bucket. The simplest case — orders per user:

SELECT user_id, COUNT(*) AS order_count, SUM(total) AS spent
FROM orders
GROUP BY user_id
ORDER BY user_id;

Result:

user_id | order_count | spent
--------+-------------+------
      1 |           2 | 35.00
      2 |           1 | 49.00
      3 |           1 | 89.00
      4 |           2 | 18.50
      5 |           1 | 98.00

Every column in the SELECT list must either appear in GROUP BY or be inside an aggregate. The reason is mathematical: if a column isn’t part of the grouping key and isn’t aggregated, the database wouldn’t know which of the many possible values to return for the group.

-- ERROR: name is neither grouped nor aggregated
SELECT user_id, name, COUNT(*)
FROM users u JOIN orders o ON o.user_id = u.id
GROUP BY user_id;

The fix is to add name to GROUP BY (safe when grouping by a primary key like u.id):

SELECT u.id, u.name, COUNT(*) AS order_count
FROM users u
JOIN orders o ON o.user_id = u.id
GROUP BY u.id, u.name
ORDER BY order_count DESC;

Grouping by multiple columns

Add columns to GROUP BY to slice along multiple dimensions. “Orders per country per status”:

SELECT u.country, o.status, COUNT(*) AS orders, SUM(o.total) AS revenue
FROM users u
JOIN orders o ON o.user_id = u.id
GROUP BY u.country, o.status
ORDER BY u.country, o.status;

Result:

country | status   | orders | revenue
--------+----------+--------+--------
FI      | refunded |      1 |   49.00
UK      | paid     |      3 |  133.00
US      | paid     |      2 |   95.00
US      | pending  |      1 |   12.50

Read this as a small report — every (country, status) combination that has at least one row is one line of output. Combinations with zero orders simply don’t appear.

HAVING

WHERE filters rows before grouping. HAVING filters groups after grouping. You can’t put an aggregate in WHERE; you can in HAVING.

-- Users who have spent more than $50
SELECT u.name, SUM(o.total) AS spent
FROM users u
JOIN orders o ON o.user_id = u.id
GROUP BY u.id, u.name
HAVING SUM(o.total) > 50
ORDER BY spent DESC;

Compare to a query that combines both:

-- Paid orders only, grouped by user, where the resulting total > 50
SELECT u.name, SUM(o.total) AS spent
FROM users u
JOIN orders o ON o.user_id = u.id
WHERE o.status = 'paid'
GROUP BY u.id, u.name
HAVING SUM(o.total) > 50;

The flow is:

FROM/JOIN produces the working set.
WHERE drops rows you don’t care about.
GROUP BY buckets the rest.
HAVING drops whole groups.
SELECT projects the final columns.
ORDER BY sorts.
LIMIT chops the result.

Hold that pipeline in your head and the order-of-clauses rule (SELECT … FROM … WHERE … GROUP BY … HAVING … ORDER BY … LIMIT) stops being arbitrary.

Try it yourself. Using the seed data from CREATE TABLE and INSERT:

Average total per status.
Number of distinct users per country.
Products that appear in more than one order. (Hint: GROUP BY product_id HAVING COUNT(*) > 1.)
Countries whose average paid-order total exceeds $40.

If you can write all four without peeking, you’ve absorbed the pattern.

NULL and aggregates

SUM, AVG, MIN, and MAX all ignore NULL values in their input.

-- ages: 36, 55, 41, 29, 62, NULL (a hypothetical user)
SELECT AVG(age) FROM users;
-- Averages over 5 non-null ages, not 6

COUNT(*) counts the row even when every column is NULL. COUNT(col) skips NULLs. This asymmetry is a constant source of bugs — when in doubt, write a small test query and confirm the count you expect.

SUM of zero matching rows returns NULL, not 0. Use COALESCE to substitute:

SELECT u.name, COALESCE(SUM(o.total), 0) AS spent
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
GROUP BY u.id, u.name
ORDER BY u.id;

Edsger now shows spent = 0 instead of NULL.

ROUND, CAST, and friendly numbers

Aggregate results often need a little polish before they go in a report.

ROUND

SELECT u.country, ROUND(AVG(o.total), 2) AS avg_order
FROM users u
JOIN orders o ON o.user_id = u.id
GROUP BY u.country;

ROUND(value, decimals) rounds to a given number of decimal places. Most dialects also support CEIL, FLOOR, and TRUNC.

CAST

CAST converts between types. Useful when you divide two integers and want a decimal result:

-- Integer division: returns 0 if cnt < total
SELECT SUM(CASE WHEN status = 'paid' THEN 1 ELSE 0 END) / COUNT(*) AS paid_ratio
FROM orders;

-- Float division: returns something like 0.71
SELECT CAST(SUM(CASE WHEN status = 'paid' THEN 1 ELSE 0 END) AS REAL) / COUNT(*) AS paid_ratio
FROM orders;

A common shorthand multiplies by 1.0:

SELECT 1.0 * SUM(CASE WHEN status = 'paid' THEN 1 ELSE 0 END) / COUNT(*) AS paid_ratio
FROM orders;

Filtered aggregates

Modern Postgres and SQLite support FILTER:

SELECT
  COUNT(*) FILTER (WHERE status = 'paid') AS paid,
  COUNT(*) FILTER (WHERE status = 'refunded') AS refunded
FROM orders;

A clean way to compute several conditional counts in one pass. MySQL uses SUM(CASE WHEN … THEN 1 ELSE 0 END) for the same effect.

A worked example

A monthly-style summary across the whole dataset:

SELECT
  u.country,
  COUNT(DISTINCT u.id) AS users,
  COUNT(o.id)          AS total_orders,
  COUNT(o.id) FILTER (WHERE o.status = 'paid')     AS paid_orders,
  COUNT(o.id) FILTER (WHERE o.status = 'refunded') AS refunded_orders,
  COALESCE(SUM(o.total) FILTER (WHERE o.status = 'paid'), 0) AS paid_revenue,
  ROUND(COALESCE(AVG(o.total) FILTER (WHERE o.status = 'paid'), 0), 2) AS avg_paid_total
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
GROUP BY u.country
ORDER BY paid_revenue DESC;

Every country shows up because of the LEFT JOIN (Edsger’s NL country still appears even though he has no orders). Aggregates use FILTER to slice by status. COALESCE keeps the NULLs out of the final report.

This is a real shape of query you’ll write a thousand times. Learn the parts and you can crank them out from memory.

Try it yourself. Write a query that returns, for each product, total_sold (sum of quantity) and revenue (sum of total) — but only for paid orders. Sort by revenue descending. Then add a HAVING clause to include only products with more than 2 paid units sold.

Common beginner mistakes

Putting a non-aggregated, non-grouped column in SELECT and being surprised by an error or by MySQL’s permissive (and incorrect) result.
Filtering on an aggregate inside WHERE instead of HAVING.
Forgetting that COUNT(col) skips NULL and COUNT(*) doesn’t.
Comparing SUM of zero rows to 0 instead of using COALESCE.
Integer division returning 0 because both operands are integers — CAST one to REAL or multiply by 1.0.

Recap

You now know:

COUNT, SUM, AVG, MIN, MAX summarise rows
GROUP BY buckets rows; every selected column must be grouped or aggregated
WHERE filters rows before grouping; HAVING filters groups after
Aggregates ignore NULLs — COALESCE makes the result reportable
ROUND, CAST, and FILTER polish the output

This is the toolkit for every “how many” and “how much” question you’ll ever be asked.

Next steps

Now that you can summarise, the next leap is composing queries: subqueries, derived tables, and Common Table Expressions (CTEs). These let you treat the result of one query as the input to another, which is how you write the complicated reports without losing your mind.

Next: Subqueries and CTEs in SQL

Related reading: SQL JOINs, SQL SELECT Basics.

Questions or feedback? Email codeloomdevv@gmail.com.