Docker Images, Layers, and Volumes Explained
A practical deep dive into how Docker images are stored as layers, how the build cache works, and the real difference between bind mounts and named volumes.
What you'll learn
- How Docker images are stored as a stack of read-only layers
- How the build cache decides what to rebuild and what to reuse
- The difference between tags and image IDs
- Bind mounts vs named volumes vs tmpfs, and when to use each
- How to clean up unused images, containers, and volumes safely
Prerequisites
- You have built an image with a Dockerfile (see Dockerfile Basics)
- You have used Docker Compose (see Compose Basics)
By now you can run containers, build images, and orchestrate small stacks with Compose. This final post explains the mechanics underneath — how images are stored, why the build cache behaves the way it does, and how data actually lives in or alongside a container. These are the ideas that turn Docker from “a tool I follow tutorials with” into something you can reason about.
Images are stacks of layers
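When docker build processes a Dockerfile, each instruction adds an entry to the image’s history. Instructions that change the filesystem (RUN, COPY, ADD) produce a layer: a content-addressed snapshot of the files that instruction added, changed, or removed. Metadata-only instructions such as CMD record configuration without adding filesystem content. The final image is the ordered stack of those layers plus that configuration.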
Take a simple example:
# layer 1: the base image, itself made of several layers
FROM node:20-alpine
# layer 2: sets the working directory, creating /app if it does not exist
WORKDIR /app
# layer 3: adds the package manifests
COPY package*.json ./
# layer 4: adds node_modules/
RUN npm install
# layer 5: adds the rest of the source
COPY . .
# layer 6: metadata only
CMD ["node", "server.js"]
Each layer is identified by a SHA-256 digest of its contents. Two layers with byte-identical contents have the same digest, and Docker stores each unique digest only once on disk. If ten of your images all start FROM the same node:20-alpine image, the Node base layers exist on disk exactly once.
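To make the content-addressing idea concrete, here is a minimal sketch that hashes short strings instead of real image layers. It assumes GNU coreutils' sha256sum is installed (on macOS, shasum -a 256 is the usual substitute). The digest helper and the sample strings are made up for illustration; Docker computes its digests over layer archives, not over short strings like these.

```shell
# Hypothetical helper: name a byte stream by the SHA-256 of its content.
digest() { sha256sum | cut -d' ' -f1; }

# Two "layers" with byte-identical content, and one that differs.
a=$(printf 'identical layer bytes' | digest)
b=$(printf 'identical layer bytes' | digest)
c=$(printf 'different layer bytes' | digest)

# Identical content yields the same name, so a store keyed by digest keeps one copy.
[ "$a" = "$b" ] && echo "same digest: stored once"
[ "$a" != "$c" ] && echo "different digest: stored separately"
```

The same property is what lets many images share one copy of a common base.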
You can see the layers of any image with:
docker history docker-demo:1.0
The output lists each layer from top to bottom, with its instruction and size. Layers without filesystem changes show 0B.
How the build cache works
Docker’s build cache is the reason an incremental docker build finishes in seconds instead of minutes. The rule is simple:
For each instruction, Docker checks whether the inputs to that instruction have changed since the last build. If they have not, it reuses the cached layer. If they have, it rebuilds that layer and every layer after it.
What counts as “inputs” depends on the instruction:
- For RUN, the literal command string. Changing apt install curl to apt install curl jq invalidates the cache.
- For COPY and ADD, the contents of the files being copied. If package.json is unchanged byte-for-byte, the layer is reused.
- For FROM, the resolved base image. If the tag now resolves to a newer image, every downstream layer is invalidated.
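The "rebuild this layer and every layer after it" rule falls out naturally if each step's cache key depends on the key of the step before it. The following is a toy model only, not how BuildKit actually computes its keys: cache_key is a hypothetical helper, the manifest-digest strings stand in for a real hash of the copied files, and sha256sum from GNU coreutils is assumed.

```shell
# Toy cache key: hash of the parent key, the instruction text,
# and a digest of any files the instruction copies in.
cache_key() {
  printf '%s\n%s\n%s\n' "$1" "$2" "$3" | sha256sum | cut -d' ' -f1
}

# First build.
base=$(cache_key "" "FROM node:20-alpine" "")
k1=$(cache_key "$base" "COPY package*.json ./" "manifest-digest-v1")
k2=$(cache_key "$k1" "RUN npm install" "")

# Second build: only the manifests changed, the RUN string is identical.
k1_new=$(cache_key "$base" "COPY package*.json ./" "manifest-digest-v2")
k2_new=$(cache_key "$k1_new" "RUN npm install" "")

# The COPY key changed, and the RUN key changed with it because it chains on its parent.
[ "$k1" != "$k1_new" ] && echo "COPY layer: cache miss"
[ "$k2" != "$k2_new" ] && echo "RUN npm install layer: cache miss"
```

Note that the RUN instruction text never changed. Its key changed only because its parent did, which is why a miss propagates to every later step.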
This is exactly why the ordering trick from the Dockerfile post matters:
# rebuilds only when the manifests change
COPY package*.json ./
# reused as long as the layer above is unchanged
RUN npm install
# rebuilds whenever any source file changes
COPY . .
Source code changes constantly; package.json does not. By copying manifests first, the expensive npm install layer stays in the cache through most edits. Inverting the order would make every code change trigger a full reinstall — sometimes a 30-second penalty per edit.
Try it yourself. Take the docker-demo project from the Dockerfile post. Run docker build -t demo:a ., then again without changes — the output should say CACHED for every line. Now edit a comment in server.js and rebuild. Notice that only the layers from COPY . . onward rebuild; npm install is reused. Finally, edit package.json (bump the version) and rebuild — now npm install rebuilds too.
Tags vs image IDs
Every image has a permanent image ID — a SHA-256 hash like sha256:9f3c2a.... A tag like docker-demo:1.0 is just a mutable pointer to one of those IDs. You can move tags around at will:
docker tag docker-demo:1.0 docker-demo:latest
docker tag docker-demo:1.0 ghcr.io/me/docker-demo:1.0
This does not duplicate the image data. It only adds new pointers to the same content.
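One way to convince yourself that two tags are pointers to the same image is to compare the IDs they resolve to. A minimal sketch, assuming both references exist locally; same_image is a hypothetical helper name, not a Docker command.

```shell
# Succeeds (exit 0) when both references resolve to the same image ID,
# exits 1 when they differ, and 2 when either reference cannot be inspected.
same_image() {
  id_a=$(docker image inspect --format '{{.Id}}' "$1") || return 2
  id_b=$(docker image inspect --format '{{.Id}}' "$2") || return 2
  [ "$id_a" = "$id_b" ]
}

# Example usage (uncomment once the images exist locally):
# same_image docker-demo:1.0 docker-demo:latest && echo "both tags point at one image"
```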
The implications are worth absorbing:
- latest is not a special version. It is just the default tag: docker pull nginx is equivalent to docker pull nginx:latest. The image behind latest changes over time.
- Pulling :latest for a base image in production is usually a bad idea. Pin specific versions (node:20.11-alpine) so your builds are reproducible.
- For unambiguous identity, you can pull by digest: docker pull nginx@sha256:.... The image referenced is guaranteed to be exactly that content.
Where data goes
A container has its own filesystem, but anything written to it is destroyed when the container is removed. That is a feature, not a bug — containers are meant to be replaceable. But real applications need to keep some data. Docker offers three mechanisms.
Named volumes
docker volume create app-data
docker run -d -v app-data:/var/lib/postgresql/data postgres:16
A named volume is a storage area managed by Docker, living somewhere inside Docker’s own data directory (you usually do not care where). It survives container removal, can be mounted into several containers at once, and is easier to back up or migrate than a host path because Docker tracks it for you.
This is the right choice for databases and other stateful services. The Compose example from the previous post used a named volume for Postgres exactly this way.
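A common way to back up a named volume is to mount it read-only into a throwaway container and archive it into a host directory. The sketch below assumes the alpine image is acceptable as a helper and that the destination is an absolute host path. backup_volume is a hypothetical helper name, not a built-in Docker command.

```shell
# Usage: backup_volume <volume-name> <absolute-host-dir>
# Writes <absolute-host-dir>/<volume-name>.tar.gz
backup_volume() {
  docker run --rm \
    -v "$1":/data:ro \
    -v "$2":/backup \
    alpine tar czf "/backup/$1.tar.gz" -C /data .
}

# Example usage (uncomment to run):
# backup_volume app-data "$(pwd)/backups"
```

For a database, stop the container or use the database's own dump tool first; archiving files that are being written can produce an inconsistent backup.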
List and inspect volumes:
docker volume ls
docker volume inspect app-data
docker volume rm app-data # only if no container uses it
Bind mounts
docker run -d -v "$(pwd)/src":/app/src node:20-alpine
A bind mount maps a specific path on the host into the container. Changes in either place are visible in the other immediately. This is the standard pattern for development workflows — edit code on the host, see it picked up inside the container.
Bind mounts are powerful but tightly coupled to the host’s filesystem layout. Avoid them for production data: paths differ between machines, permissions can be surprising, and Docker does not manage the directory for you. Reserve them for code-syncing during development.
The modern syntax uses --mount, which is more explicit and recommended over -v:
docker run -d \
--mount type=bind,source="$(pwd)"/src,target=/app/src \
node:20-alpine
Both forms still work; --mount is harder to misread.
tmpfs mounts
docker run -d --tmpfs /tmp:size=64m nginx
A tmpfs mount is an in-memory filesystem, available on Linux hosts, that is discarded when the container stops. Nothing is written to the container’s writable layer or to a host path. Use it for caches and scratch space where speed matters and persistence does not.
A quick comparison
| Mechanism | Stored where | Survives docker rm? | Best for |
|---|---|---|---|
| Named volume | Managed by Docker | Yes | Database storage, anything stateful |
| Bind mount | Host path you choose | Yes (it is on the host) | Local dev — live code sync |
| tmpfs | RAM | No | Caches, secrets that must not hit disk |
Try it yourself. Run docker run -d --name pg1 -v pgdata:/var/lib/postgresql/data -e POSTGRES_PASSWORD=secret postgres:16. Give Postgres a few seconds to finish starting, then connect with docker exec -it pg1 psql -U postgres -c "CREATE TABLE t (id int); INSERT INTO t VALUES (1);". Now docker stop pg1 && docker rm pg1. Start a fresh container with the same volume: docker run -d --name pg2 -v pgdata:/var/lib/postgresql/data -e POSTGRES_PASSWORD=secret postgres:16. Connect to pg2 and run SELECT * FROM t; and your data is still there, because the volume survived.
Cleaning up
Containers, images, and volumes accumulate quietly. After a few weeks of experimentation, you might have many gigabytes of stopped containers and unused images on disk.
The safe everyday cleanup commands:
# Remove stopped containers
docker container prune
# Remove dangling images (untagged and not used by any container)
docker image prune
# Remove every image not used by at least one container (more aggressive)
docker image prune -a
# Remove unused networks
docker network prune
# Show disk usage by category
docker system df
A broader command removes stopped containers, unused networks, dangling images, and unused build cache in one go:
docker system prune
And the nuclear option, which also removes unused volumes:
docker system prune --volumes
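Be cautious with --volumes. What it covers depends on your Docker version: releases from 23.0 onward limit it to anonymous volumes, while older releases also delete unused named volumes. On those older releases, a Postgres database sitting in a volume that no running container references right now will be deleted. Read the confirmation prompt before pressing y.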
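Before pruning, it can help to list the volumes that no container currently references, since those are the candidates for removal. A minimal sketch; unused_volumes is a hypothetical helper name wrapping the standard dangling filter of docker volume ls.

```shell
# Print the names of volumes not referenced by any container (running or stopped).
unused_volumes() {
  docker volume ls --quiet --filter dangling=true
}

# Example usage (uncomment to run):
# unused_volumes
```

If a name you care about shows up in that list, back it up or attach it to a container before running any prune that touches volumes.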
A few useful inspection commands
These pay off the more you use Docker.
# Show every detail of a container as JSON
docker inspect <container>
# Show resource usage of running containers, live
docker stats
# Show the diff between a container's filesystem and its image
docker diff <container>
# Find which image a tag currently points to
docker image inspect nginx:latest --format '{{.Id}}'
docker diff is particularly illuminating. It lists every file added, changed, or deleted in the container relative to its image. Run it against an old container of your own and you can see exactly which writes your app made.
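Each line of docker diff output starts with A (added), C (changed), or D (deleted), followed by the path. For a container with many writes, a per-type count can be easier to read than the full list. A small sketch, assuming awk is available; diff_summary is a hypothetical helper name.

```shell
# Summarise docker diff output for one container as counts per change type.
diff_summary() {
  docker diff "$1" | awk '
    { count[$1]++ }
    END { printf "added=%d changed=%d deleted=%d\n", count["A"], count["C"], count["D"] }'
}

# Example usage (uncomment and substitute a real container name):
# diff_summary my-container
```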
Production-shaped habits
A few habits that will save pain later:
- Pin everything. Base images by tag (node:20.11-alpine), application images by version (docker-demo:1.4.2), Compose image: lines explicit.
- Treat containers as cattle. Anything you would not want to lose should be in a named volume or external storage, not in the container’s writable layer.
- Keep images small. Smaller images push faster, pull faster, and have a smaller attack surface. Alpine and -slim base images, .dockerignore, and combining related RUN steps all help.
- Use multi-stage builds for compiled languages. Build artifacts in one stage, copy them into a slim runtime stage. The final image contains only what the app needs to run, not the compiler toolchain.
- Prune regularly. A weekly docker system df followed by targeted prunes prevents your disk from quietly filling up.
Recap
You now know:
- Images are stacks of content-addressed layers that Docker stores once and shares across images
- The build cache reuses a layer when its instruction and inputs are unchanged, and rebuilds every later layer as soon as one input changes
- A tag is a mutable pointer to an immutable image ID; latest is just the default tag, not a special version
- Named volumes are for persistent data; bind mounts are for local development; tmpfs is for in-memory scratch space
- Regular docker system df and selective prune commands keep disk usage under control
- Pinning versions, keeping images small, and using multi-stage builds are the habits that make Docker pleasant in the long run
Next steps
This post closes the beginner Docker series. With these five posts you can now: explain containers to a teammate, install Docker cleanly, write a real Dockerfile, orchestrate a multi-service app with Compose, and reason about how images and storage actually work.
From here, two directions are worth exploring. The first is deeper Dockerfile craft — multi-stage builds, BuildKit features, and slimming images down to a few megabytes. The second is container orchestration — moving from single-host Compose to Kubernetes or a managed platform like Fly.io or Cloud Run. Pick whichever is closer to the work you want to ship.
Questions or feedback? Email codeloomdevv@gmail.com.