Database Normalization: 1NF, 2NF, 3NF Made Practical

Intermediate 10 min read

What you'll learn

✓What each normal form actually means without textbook jargon
✓How to spot 1NF, 2NF, and 3NF violations in real schemas
✓Refactoring a denormalized orders table step by step
✓When denormalization is the right answer
✓How normalization interacts with joins and indexes

Prerequisites

•Comfort reading SQL: [SQL SELECT basics](/blog/sql-select-basics)
•Familiar with multi-table queries: [SQL joins](/blog/sql-joins)

Normalization is the process of organizing a database so each fact is stored once and in the right place. The classic normal forms (1NF, 2NF, 3NF) are not abstract theory; they are a checklist that catches the most common schema mistakes before they show up as inconsistent data three months into production. We will go through them with a single running example: an orders system.

A messy starting point

Imagine a small shop stores everything about an order in one wide row.

CREATE TABLE orders_flat (
  order_id        INT PRIMARY KEY,
  order_date      DATE,
  customer_name   TEXT,
  customer_email  TEXT,
  customer_phone  TEXT,
  shipping_city   TEXT,
  items           TEXT,         -- "SKU-1:2, SKU-2:1"
  item_prices     TEXT,         -- "9.99, 19.99"
  total           NUMERIC
);

Everything you might want to display about an order is in one place. It is also a maintenance disaster waiting to happen. We will fix it form by form.

First normal form: atomic columns and no repeating groups

A table is in 1NF when each column holds a single, atomic value and there are no repeating groups. The items and item_prices columns in orders_flat violate this: they pack multiple items into a single text value.

Why it matters in practice: you cannot index a SKU inside that string, you cannot query “how many of SKU-1 did we sell”, and you cannot add a quantity-per-item column without rewriting parsers.

The fix is to split the repeating group into its own table.

CREATE TABLE orders (
  order_id   INT PRIMARY KEY,
  order_date DATE,
  customer_name  TEXT,
  customer_email TEXT,
  customer_phone TEXT,
  shipping_city  TEXT,
  total          NUMERIC
);

CREATE TABLE order_items (
  order_id INT REFERENCES orders(order_id),
  sku      TEXT,
  quantity INT,
  price    NUMERIC,
  PRIMARY KEY (order_id, sku)
);

Now each item is one row. You can GROUP BY sku, you can index sku, and adding a column like discount does not require touching application parsing code.

Second normal form: no partial dependencies on a composite key

A table is in 2NF if it is in 1NF and every non-key column depends on the whole primary key, not just part of it. This only matters when the primary key is composite.

Look at order_items. Its primary key is (order_id, sku). The quantity and price columns depend on the full key, which is correct. But suppose we extend the table to include the product name.

CREATE TABLE order_items_bad (
  order_id     INT,
  sku          TEXT,
  product_name TEXT,
  quantity     INT,
  price        NUMERIC,
  PRIMARY KEY (order_id, sku)
);

product_name depends only on sku, not on the combination of order_id and sku. The same product name will be repeated in every order that includes that SKU. If the name changes, you have to update every row.

The fix is to extract a products table.

CREATE TABLE products (
  sku          TEXT PRIMARY KEY,
  product_name TEXT,
  description  TEXT
);

CREATE TABLE order_items (
  order_id INT REFERENCES orders(order_id),
  sku      TEXT REFERENCES products(sku),
  quantity INT,
  price    NUMERIC,
  PRIMARY KEY (order_id, sku)
);

Now product information lives in one place. Updating a product name is one row, not many.

Third normal form: no transitive dependencies

A table is in 3NF if it is in 2NF and no non-key column depends on another non-key column. In other words, every non-key column should depend directly on the primary key, not on something else that depends on the key.

The orders table from earlier violates this. The customer fields all depend on the customer, not on the order. If a customer updates their phone number, we would have to update every order they ever placed.

CREATE TABLE customers (
  customer_id INT PRIMARY KEY,
  name        TEXT,
  email       TEXT UNIQUE,
  phone       TEXT
);

CREATE TABLE orders (
  order_id      INT PRIMARY KEY,
  order_date    DATE,
  customer_id   INT REFERENCES customers(customer_id),
  shipping_city TEXT,
  total         NUMERIC
);

Customer details now belong to the customer. The order keeps a foreign key. A phone number change is one UPDATE, not thousands.

A subtle case: should shipping_city go into orders or into a separate addresses table? It depends. If the city is the customer’s current city and could change retroactively, normalize it out. If it is the city the order shipped to and must never change even when the customer moves, leave it on the order. Normalization is about logical dependencies, not about pushing every field into its own table.

The final shape

After three small refactors we have a clean schema.

CREATE TABLE customers (
  customer_id INT PRIMARY KEY,
  name        TEXT,
  email       TEXT UNIQUE,
  phone       TEXT
);

CREATE TABLE products (
  sku          TEXT PRIMARY KEY,
  product_name TEXT,
  description  TEXT
);

CREATE TABLE orders (
  order_id      INT PRIMARY KEY,
  order_date    DATE,
  customer_id   INT REFERENCES customers(customer_id),
  shipping_city TEXT,
  total         NUMERIC
);

CREATE TABLE order_items (
  order_id INT REFERENCES orders(order_id),
  sku      TEXT REFERENCES products(sku),
  quantity INT,
  price    NUMERIC,
  PRIMARY KEY (order_id, sku)
);

Each fact lives in one place. Updates touch one row. The schema describes the real-world relationships honestly.

Querying a normalized schema

Reads now require joins. That is a fair trade for write safety and storage efficiency.

SELECT
  o.order_id,
  o.order_date,
  c.name AS customer,
  SUM(oi.quantity * oi.price) AS computed_total
FROM orders o
JOIN customers c ON c.customer_id = o.customer_id
JOIN order_items oi ON oi.order_id = o.order_id
GROUP BY o.order_id, o.order_date, c.name
ORDER BY o.order_date DESC;

This is the kind of query SQL joins is meant to make natural. As traffic grows, index the join columns and the columns you filter on; the patterns in SQL indexes and performance keep these queries fast.

When to denormalize

Normalization is a starting point, not a religion. There are good reasons to flatten data again.

Read-heavy analytics. Aggregating across millions of rows is faster on a wide, denormalized table because the database avoids joins and reads less.
Snapshot data. Invoices and receipts must record the prices, names, and addresses as they were at the time of sale. Embedding those values in the order is correct.
Hot paths. If a single high-traffic query keeps joining the same three tables, materializing a denormalized view of the result can drop latency at the cost of refresh logic.

Denormalize deliberately, with a written reason. The default should still be normalized, because the cost of recovering from inconsistent data is much higher than the cost of writing a join.

Wrap up

The normal forms are checklists, not rituals. 1NF says no repeating groups in a single column. 2NF says no half-dependencies on a composite key. 3NF says no fact about a non-key column hiding in another row’s data. Apply them to your schema, then push back where the real world says a snapshot or a read-optimized view earns its keep. The combination of a normalized core with a few intentional denormalizations is what production databases actually look like, and now you know how to build one.