Codeloom

Docker Images, Layers, and Volumes Explained

A practical deep dive into how Docker images are stored as layers, how the build cache works, and the real difference between bind mounts and named volumes.

By Yash Kesharwani · Beginner · 10 min read

What you'll learn

  • How Docker images are stored as a stack of read-only layers
  • How the build cache decides what to rebuild and what to reuse
  • The difference between tags and image IDs
  • Bind mounts vs named volumes vs tmpfs — when to use each
  • How to clean up unused images, containers, and volumes safely

Prerequisites

By now you can run containers, build images, and orchestrate small stacks with Compose. This final post explains the mechanics underneath — how images are stored, why the build cache behaves the way it does, and how data actually lives in or alongside a container. These are the ideas that turn Docker from “a tool I follow tutorials with” into something you can reason about.

Images are stacks of layers

When docker build processes a Dockerfile, each instruction becomes a step in the image’s history. Steps that change files (RUN, COPY, and ADD) produce a layer: a small, content-addressed snapshot of the filesystem changes that step caused. Steps such as CMD and ENV only record metadata in the image’s configuration. The final image is the ordered stack of those layers plus that configuration.

Take a simple example:

# Base image: itself a stack of many layers
FROM node:20-alpine
# Sets the working directory, creating /app if it does not exist
WORKDIR /app
# New layer: adds package.json (and package-lock.json, if present)
COPY package*.json ./
# New layer: adds node_modules/
RUN npm install
# New layer: adds the rest of the source
COPY . .
# Metadata only: records the default command, no new layer
CMD ["node", "server.js"]

Each layer is identified by a SHA-256 hash of its contents. Two layers with identical contents produce identical hashes, and Docker stores each unique hash only once on disk. If ten of your images all start FROM the same pulled node:20-alpine image, the Node base layers exist on disk exactly once.
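Content addressing is easier to see in a toy Python model. This is not Docker's actual storage driver. Each layer's bytes are stored under their SHA-256 digest, so two images that share base layers end up referencing the same stored blobs. The LayerStore class and the placeholder layer contents are hypothetical. Real layers are tar archives of filesystem changes.

```python
import hashlib


class LayerStore:
    """Toy content-addressed store: each layer is kept under its SHA-256 digest."""

    def __init__(self):
        self.blobs = {}  # digest -> layer bytes

    def put(self, content: bytes) -> str:
        digest = "sha256:" + hashlib.sha256(content).hexdigest()
        # Identical bytes always hash to the same digest, so they are stored once.
        self.blobs.setdefault(digest, content)
        return digest


store = LayerStore()

# Hypothetical placeholder contents; real layers are tar archives of file changes.
base = [b"alpine rootfs", b"node 20 runtime"]
app_a = [b"app A manifests", b"app A node_modules"]
app_b = [b"app B manifests", b"app B node_modules"]

image_a = [store.put(layer) for layer in base + app_a]
image_b = [store.put(layer) for layer in base + app_b]

print(len(image_a) + len(image_b), "layer references")  # 8 layer references
print(len(store.blobs), "blobs stored")                  # 6 blobs stored
```

The two images hold eight layer references in total, but only six blobs are stored, because both images share the two base layers.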

You can see the layers of any image with:

docker history docker-demo:1.0

The output lists one row per step, newest first, so the base image’s layers appear at the bottom. Each row shows the instruction that created it and its size. Metadata-only steps such as CMD show 0B.

How the build cache works

Docker’s build cache is the reason an incremental docker build finishes in seconds instead of minutes. The rule is simple:

For each instruction, Docker checks whether the inputs to that instruction have changed since the last build. If they have not, it reuses the cached layer. If they have, it rebuilds that layer and every layer after it.

What counts as “inputs” depends on the instruction:

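  • For RUN, the literal command string. Changing apt install curl to apt install curl jq invalidates the cache. Docker does not re-run the command to check whether its result would differ, so an unchanged RUN line stays cached even if the packages it downloads have been updated upstream.
  • For COPY and ADD, the contents of the files being copied. If package.json is unchanged byte-for-byte, the layer is reused.
  • For FROM, the resolved base image. If the tag now resolves to a different image (for example, after docker pull fetches an update), every downstream layer is rebuilt.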
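The cache rule can be sketched as a chain of keys. This toy Python model illustrates the idea and does not reproduce BuildKit's real key format. Each step's key hashes three things: the previous step's key, the instruction string, and the contents of any files the step copies. Because the parent key is folded into every key, one changed input changes the key of every step after it. The build function, the file contents, and the simplified step list are all hypothetical, and FROM resolution is not modeled.

```python
import hashlib


def sha(text: str) -> str:
    return hashlib.sha256(text.encode()).hexdigest()


def build(steps, files, cache):
    """Return 'CACHED' or 'BUILD' for each step, recording new keys in `cache`.

    Each step is (instruction, paths_it_copies). Its key hashes the parent
    step's key, the instruction string, and the contents of the copied files,
    so a change in any step changes the key of every step after it.
    """
    statuses = []
    parent_key = ""
    for instruction, paths in steps:
        inputs = "".join(sha(files[p]) for p in sorted(paths))
        key = sha(parent_key + instruction + inputs)
        statuses.append("CACHED" if key in cache else "BUILD")
        cache.add(key)
        parent_key = key
    return statuses


# Hypothetical project files (path -> contents).
files = {"package.json": '{"version": "1.0.0"}', "server.js": "// v1"}
steps = [
    ("FROM node:20-alpine", []),
    ("COPY package*.json ./", ["package.json"]),
    ("RUN npm install", []),
    ("COPY . .", ["package.json", "server.js"]),
]

cache = set()
first = build(steps, files, cache)      # empty cache: every step runs
again = build(steps, files, cache)      # nothing changed: every step is cached
files["server.js"] = "// v2"
after_src = build(steps, files, cache)  # source edit: only COPY . . reruns
files["package.json"] = '{"version": "1.0.1"}'
after_pkg = build(steps, files, cache)  # manifest edit: npm install reruns too
for result in (first, again, after_src, after_pkg):
    print(result)
```

The four calls mirror the "Try it yourself" exercise in this section: a clean build, a no-op rebuild, a source edit, and a package.json edit.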

This is exactly why the ordering trick from the Dockerfile post matters:

# Rebuilds only when the manifests change
COPY package*.json ./
# Reused as long as the step above is cached
RUN npm install
# Rebuilds whenever any source file changes
COPY . .

Source code changes constantly; package.json does not. By copying manifests first, the expensive npm install layer stays in the cache through most edits. Putting COPY . . before RUN npm install would instead make every code change invalidate the install step, which can cost 30 seconds per edit.
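The ordering argument comes down to one question: which step is the first to read a changed file? Here is a minimal sketch that assumes each step is described by a hypothetical set of files it reads. RUN npm install reads none directly and depends only on the layers before it.

```python
def steps_to_rerun(steps, changed):
    """Return the instructions that rerun after the `changed` files are edited.

    Toy version of the cache rule: the first step that reads a changed file
    misses the cache, and every step after it is rebuilt as well.
    """
    for i, (instruction, reads) in enumerate(steps):
        if reads & changed:
            return [name for name, _ in steps[i:]]
    return []


source = {"server.js", "routes.js"}  # hypothetical source files
manifests = {"package.json", "package-lock.json"}

manifests_first = [
    ("COPY package*.json ./", manifests),
    ("RUN npm install", set()),
    ("COPY . .", manifests | source),
]
inverted = [
    ("COPY . .", manifests | source),
    ("RUN npm install", set()),
]

ordered = steps_to_rerun(manifests_first, {"server.js"})
flipped = steps_to_rerun(inverted, {"server.js"})
print(ordered)  # ['COPY . .']
print(flipped)  # ['COPY . .', 'RUN npm install']
```

With manifests first, a source edit only reruns the final copy. With the inverted order, the same edit drags npm install along with it.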

Try it yourself. Take the docker-demo project from the Dockerfile post. Run docker build -t demo:a ., then run it again without changes: the output should say CACHED for every step after FROM. Now edit a comment in server.js and rebuild. Notice that only the steps from COPY . . onward rebuild; npm install is reused. Finally, edit package.json (bump the version) and rebuild; now npm install rebuilds too.

Tags vs image IDs

Every image has a permanent image ID — a SHA-256 hash like sha256:9f3c2a.... A tag like docker-demo:1.0 is just a mutable pointer to one of those IDs. You can move tags around at will:

docker tag docker-demo:1.0 docker-demo:latest
docker tag docker-demo:1.0 ghcr.io/me/docker-demo:1.0

This does not duplicate the image data. It only adds new pointers to the same content.
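Tags as pointers can be modeled with two dictionaries: one maps image IDs to content, the other maps tag names to IDs. This is a toy sketch. The helper names build_image and tag are hypothetical, and real image IDs are digests of the image configuration rather than of arbitrary bytes.

```python
import hashlib

images = {}  # image ID -> image content (each stored once)
tags = {}    # tag name -> image ID (just a pointer)


def build_image(content: bytes, tag_name: str) -> str:
    image_id = "sha256:" + hashlib.sha256(content).hexdigest()
    images.setdefault(image_id, content)
    tags[tag_name] = image_id
    return image_id


def tag(source: str, target: str) -> None:
    """Like `docker tag`: point `target` at the same ID; no data is copied."""
    tags[target] = tags[source]


v1 = build_image(b"docker-demo build 1", "docker-demo:1.0")
tag("docker-demo:1.0", "docker-demo:latest")
tag("docker-demo:1.0", "ghcr.io/me/docker-demo:1.0")
print(len(tags), "tags ->", len(images), "image")  # 3 tags -> 1 image

# A later build can move `latest` to a new ID; 1.0 still points at the old one.
v2 = build_image(b"docker-demo build 2", "docker-demo:latest")
print(tags["docker-demo:1.0"] == v1, tags["docker-demo:latest"] == v2)  # True True
```

Tagging twice adds two names but no new image data. Building again under latest moves only that one pointer.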

The implications are worth absorbing:

  • latest is not a special version — it is just the default tag. docker pull nginx is equivalent to docker pull nginx:latest. The image behind latest changes over time.
  • Pulling :latest for a base image in production is usually a bad idea. Pin specific versions (node:20.11-alpine) so your builds are predictable. Even a specific tag can be re-pushed by its publisher, though, which is where digests come in.
  • For exact, immutable identity, you can pull by digest: docker pull nginx@sha256:.... The image referenced is guaranteed to be exactly that content.
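A toy registry shows the difference between a movable tag and a digest. The ToyRegistry class and the placeholder manifest bytes are hypothetical. Re-pushing a tag silently changes what that tag resolves to. A digest reference, by contrast, names exact bytes, and anyone can verify them by hashing them again.

```python
import hashlib


def digest_of(manifest: bytes) -> str:
    return "sha256:" + hashlib.sha256(manifest).hexdigest()


class ToyRegistry:
    """Hypothetical registry: content is keyed by digest, tags are movable names."""

    def __init__(self):
        self.blobs = {}  # digest -> manifest bytes
        self.tags = {}   # "name:tag" -> digest

    def push(self, tag: str, manifest: bytes) -> str:
        digest = digest_of(manifest)
        self.blobs[digest] = manifest
        self.tags[tag] = digest  # re-pushing a tag silently moves it
        return digest

    def pull(self, ref: str) -> bytes:
        if "@" in ref:
            digest = ref.split("@", 1)[1]
            manifest = self.blobs[digest]
            # A digest reference is self-verifying: the bytes must hash back to it.
            if digest_of(manifest) != digest:
                raise ValueError("content does not match digest")
            return manifest
        return self.blobs[self.tags[ref]]


reg = ToyRegistry()
first = reg.push("nginx:latest", b"nginx manifest, build 1")
reg.push("nginx:latest", b"nginx manifest, build 2")

print(reg.pull("nginx:latest"))    # b'nginx manifest, build 2'
print(reg.pull("nginx@" + first))  # b'nginx manifest, build 1'
```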

Where data goes

A container has its own filesystem, but anything written to it is destroyed when the container is removed. That is a feature, not a bug — containers are meant to be replaceable. But real applications need to keep some data. Docker offers three mechanisms.

Named volumes

docker volume create app-data
docker run -d -v app-data:/var/lib/postgresql/data -e POSTGRES_PASSWORD=secret postgres:16

A named volume is a storage area managed by Docker, living somewhere inside Docker’s own data directory (you usually do not care where). It survives container removal and can be mounted into many containers. It is also easier to back up or migrate than a bind mount, for example by mounting it into a throwaway container and archiving its contents with tar.

This is the right choice for databases and other stateful services. The Compose example from the previous post used a named volume for Postgres exactly this way.

List and inspect volumes:

docker volume ls
docker volume inspect app-data
docker volume rm app-data        # fails if any container, even a stopped one, still uses it

Bind mounts

docker run -d -v "$(pwd)"/src:/app/src node:20-alpine

A bind mount maps a specific path on the host into the container. Changes in either place are visible in the other immediately. This is the standard pattern for development workflows — edit code on the host, see it picked up inside the container.

Bind mounts are powerful but tightly coupled to the host’s filesystem layout. Avoid them for production data. Paths differ between machines, permissions can be surprising, and Docker neither manages nor tracks the data, so backups are entirely up to you. Reserve them for code-syncing during development.

The modern syntax uses --mount, which is more explicit and recommended over -v:

docker run -d \
  --mount type=bind,source="$(pwd)"/src,target=/app/src \
  node:20-alpine

Both forms still work; --mount is harder to misread.

tmpfs mounts

docker run -d --tmpfs /tmp:size=64m nginx

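A tmpfs mount is an in-memory filesystem that lives only as long as the container does. Its contents are never written to the container’s writable layer or to a volume, although the host kernel can still swap that memory out to disk under pressure. Use it for caches and scratch space where speed matters and persistence does not.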

A quick comparison

| Mechanism | Stored where | Survives docker rm? | Best for |
| --- | --- | --- | --- |
| Named volume | Managed by Docker | Yes | Database storage, anything stateful |
| Bind mount | Host path you choose | Yes (it is on the host) | Local dev, live code sync |
| tmpfs | RAM | No | Caches, secrets that must not hit disk |

Try it yourself. Run docker run -d --name pg1 -v pgdata:/var/lib/postgresql/data -e POSTGRES_PASSWORD=secret postgres:16. Give the database a few seconds to initialize. Then create a table and insert a row with docker exec -it pg1 psql -U postgres -c "CREATE TABLE t (id int); INSERT INTO t VALUES (1);". Now run docker stop pg1 && docker rm pg1. Start a fresh container with the same volume: docker run -d --name pg2 -v pgdata:/var/lib/postgresql/data -e POSTGRES_PASSWORD=secret postgres:16. Query it with docker exec -it pg2 psql -U postgres -c "SELECT * FROM t;". Your row is still there, because the volume survived the container.

Cleaning up

Containers, images, and volumes accumulate quietly. After a few weeks of experimentation, you might have many gigabytes of stopped containers and unused images on disk.

The safe everyday cleanup commands:

# Remove stopped containers
docker container prune

# Remove dangling images (untagged and not used by any container)
docker image prune

# Remove all images not used by any container, tagged or not
docker image prune -a

# Remove unused networks
docker network prune

# Show disk usage by category
docker system df

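A more aggressive command cleans several categories at once: stopped containers, unused networks, dangling images, and unused build cache. Add -a to also remove every unused image, not just the dangling ones.

docker system prune

And the nuclear option, which also removes volumes that no container is using. On Docker Engine 23.0 and later, volume pruning only removes anonymous volumes by default. Named volumes are removed only when you explicitly run docker volume prune --all.

docker system prune --volumes

Be cautious with --volumes and with docker volume prune --all. If you have a Postgres database sitting in a volume that no running container references right now, these commands can delete your data. That covers an anonymous volume, or a named one on an older engine or with --all. Read the confirmation prompt before pressing y.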
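Why does the warning say *running* container? Here is a toy model. It assumes stopped containers are pruned before volumes, and that the volume is in scope for pruning (for example, an anonymous volume). Once the stopped container is gone, nothing references its volume, so the volume counts as unused. The function name and data shapes are hypothetical.

```python
def prune_with_volumes(containers, volumes):
    """Toy model: drop stopped containers, then delete volumes nothing mounts."""
    survivors = {name: c for name, c in containers.items() if c["running"]}
    in_use = {vol for c in survivors.values() for vol in c["volumes"]}
    kept = [v for v in volumes if v in in_use]
    deleted = [v for v in volumes if v not in in_use]
    return survivors, kept, deleted


# Hypothetical state: pg1 is stopped but still "owns" the database volume.
containers = {
    "pg1": {"running": False, "volumes": ["pgdata"]},
    "web": {"running": True, "volumes": ["web-cache"]},
}
volumes = ["pgdata", "web-cache", "old-scratch"]

survivors, kept, deleted = prune_with_volumes(containers, volumes)
print("containers left:", sorted(survivors))  # ['web']
print("volumes kept:", kept)                  # ['web-cache']
print("volumes deleted:", deleted)            # ['pgdata', 'old-scratch']
```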

A few useful inspection commands

These pay off the more you use Docker.

# Show every detail of a container as JSON
docker inspect <container>

# Show resource usage of running containers, live
docker stats

# Show the diff between a container's filesystem and its image
docker diff <container>

# Find which image a tag currently points to
docker image inspect nginx:latest --format '{{.Id}}'

docker diff is particularly illuminating. It lists every file added, changed, or deleted in the container relative to its image. Run it against an old container of your own and you can see exactly which writes your app made.
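Each line of docker diff output starts with A (added), C (changed), or D (deleted). A toy Python version compares two path-to-contents maps. It is only a sketch: unlike the real command, it does not also report the parent directories of changed files. The paths and contents are hypothetical.

```python
def diff(image_fs, container_fs):
    """Toy `docker diff`: compare path -> contents maps, return 'A/C/D path' lines."""
    changes = []
    for path in sorted(image_fs.keys() | container_fs.keys()):
        if path not in image_fs:
            changes.append(f"A {path}")  # exists only in the container
        elif path not in container_fs:
            changes.append(f"D {path}")  # removed by the container
        elif image_fs[path] != container_fs[path]:
            changes.append(f"C {path}")  # contents differ
    return changes


image_fs = {"/app/server.js": "v1", "/etc/motd": "hello", "/tmp/seed": "x"}
container_fs = {"/app/server.js": "v1", "/etc/motd": "patched", "/app/app.log": "started"}

for line in diff(image_fs, container_fs):
    print(line)
# A /app/app.log
# C /etc/motd
# D /tmp/seed
```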

Production-shaped habits

A few habits that will save pain later:

  • Pin everything. Base images by tag (node:20.11-alpine), application images by version (docker-demo:1.4.2), and an explicit tag on every image: line in your Compose files.
  • Treat containers as cattle. Anything you would not want to lose should be in a named volume or external storage, not in the container’s writable layer.
  • Keep images small. Smaller images push faster, pull faster, and have a smaller attack surface. Alpine and -slim base images, .dockerignore, and combining related RUN steps all help.
  • Use multi-stage builds for compiled languages. Build artifacts in one stage, copy them into a slim runtime stage. The final image contains only what the app needs to run, not the compiler toolchain.
  • Prune regularly. A weekly docker system df followed by targeted prunes prevents your disk from quietly filling up.

Recap

You now know:

  • Images are stacks of content-addressed layers that Docker stores once and shares across images
  • The build cache reuses a layer when its instruction and inputs are unchanged, and rebuilds every later step as soon as one input changes
  • A tag is a mutable pointer to an immutable image ID; latest is just the default tag, not a special version
  • Named volumes are for persistent data; bind mounts are for local development; tmpfs is for in-memory scratch space
  • Regular docker system df and selective prune commands keep disk usage under control
  • Pinning versions, keeping images small, and using multi-stage builds are the habits that make Docker pleasant in the long run

Next steps

This post closes the beginner Docker series. With these five posts you can now: explain containers to a teammate, install Docker cleanly, write a real Dockerfile, orchestrate a multi-service app with Compose, and reason about how images and storage actually work.

From here, two directions are worth exploring. The first is deeper Dockerfile craft — multi-stage builds, BuildKit features, and slimming images down to a few megabytes. The second is container orchestration — moving from single-host Compose to Kubernetes or a managed platform like Fly.io or Cloud Run. Pick whichever is closer to the work you want to ship.

Questions or feedback? Email codeloomdevv@gmail.com.