Linux cgroups Explained: How Containers Get Their Limits
A practical introduction to Linux control groups. Learn what cgroups do, how v1 and v2 differ, and how Docker and Kubernetes use them to cap CPU and memory.
What you'll learn
- What cgroups are and what they control
- The difference between v1 and v2
- How memory and CPU limits actually work
- How Docker and Kubernetes use cgroups
- How to inspect cgroups on a live system
Prerequisites
- Basic Linux command line
If namespaces give containers their separate view of the world, cgroups are what enforce their share of it. Every CPU limit you set on a Docker container, every memory cap in a Kubernetes pod, every “noisy neighbor” protection on a multi-tenant host comes down to cgroups doing their job in the kernel.
What and Why
Control groups, usually called cgroups, are a kernel feature that organizes processes into hierarchies and applies resource constraints and accounting to each group in the hierarchy. You can cap CPU, memory, block IO, PIDs, and several other resources per group, and you get per-group statistics for free.
Without cgroups, one runaway process can starve the whole machine. With them, you can ensure that the database container keeps its share of CPU under contention and the analytics job never exceeds 4GB of RAM, regardless of what else is running.
Mental Model
A cgroup is a directory under /sys/fs/cgroup. Inside that directory are control files you write to (memory.max, cpu.max, cpu.weight) and stat files you read from (memory.current, cpu.stat). Processes belong to a cgroup; writing a PID into cgroup.procs moves it there.
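The directory model above can be sketched in a few lines of shell. This is a minimal illustration, not a hardened script: the function name make_cgroup, the group name demo-group, and the CGROOT variable are assumptions made for the example. CGROOT is overridable so the sketch can be dry-run against a scratch directory; on a real v2 host it needs root, and the memory controller must be enabled in the parent's cgroup.subtree_control.

```shell
# Root of the cgroup v2 tree; override CGROOT to dry-run against a scratch dir.
CGROOT="${CGROOT:-/sys/fs/cgroup}"

# make_cgroup NAME MEM_BYTES PID  (hypothetical helper for illustration)
make_cgroup() {
    local dir="$CGROOT/$1"
    mkdir -p "$dir" || return 1                  # creating the directory creates the cgroup
    echo "$2" > "$dir/memory.max" || return 1    # control file: memory ceiling in bytes
    echo "$3" > "$dir/cgroup.procs" || return 1  # writing a PID moves that process here
    echo "$dir"                                  # print the cgroup's path
}
```

On a real host, running make_cgroup demo-group 268435456 $$ as root would place the current shell under a 256 MiB ceiling; removing the group afterwards is an rmdir once it has no processes left.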
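cgroups are hierarchical. A limit set on a parent bounds everything beneath it: a child can set its own tighter limit, and even if it writes a larger value, the parent's limit still applies to the combined usage of the subtree. This is how Kubernetes nests pod cgroups inside QoS-class cgroups inside the node-level cgroup.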
There are two versions: v1 (one hierarchy per controller, legacy) and v2 (one unified hierarchy, the future). Most modern distros default to v2.
Hands-on Example
Look at a running container’s cgroup:
docker run -d --name demo --memory 256m --cpus 1.5 nginx:1.27
cat /proc/$(docker inspect -f '{{.State.Pid}}' demo)/cgroup
On cgroup v2 you will see one line pointing under /sys/fs/cgroup/system.slice/docker-<id>.scope/. Inside that directory:
id=$(docker inspect -f '{{.Id}}' demo) # full container ID, so only this container matches
cgdir=/sys/fs/cgroup/system.slice/docker-$id.scope
cat $cgdir/memory.max # 268435456 (256 MiB)
cat $cgdir/cpu.max # 150000 100000 (1.5 CPU)
cat $cgdir/memory.current # live usage
The resulting layout looks roughly like this:

    /sys/fs/cgroup/                    (root)
      cgroup.controllers, cgroup.subtree_control, ...
      user.slice/
        user-1000.slice/
          session-3.scope/             <- your login shell
      system.slice/
        docker.service/
        docker-abc123.scope/           <- the container
          memory.max = 256MiB
          cpu.max = 1.5 CPU
          cgroup.procs:
            12345
            12346
            12350

When the container's usage hits memory.max and the kernel cannot reclaim enough memory, the OOM killer fires inside that cgroup. The host stays healthy; the container loses a process, usually its biggest.
cpu.max works as a quota over a period. 150000 100000 means 150ms of CPU per 100ms wall time, across all CPUs combined — effectively 1.5 cores.
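The quota-over-period arithmetic is easy to check by hand. The helper below is a small sketch (the function name cpu_max_to_cores is made up for this example) that turns the contents of a cpu.max file into an effective core count; it also handles the literal max, which cpu.max uses to mean no quota.

```shell
# cpu_max_to_cores "QUOTA PERIOD" -> effective cores (hypothetical helper)
cpu_max_to_cores() {
    # cpu.max holds "<quota> <period>" in microseconds, or "max <period>" for no limit
    printf '%s\n' "$1" | awk '{
        if ($1 == "max") print "unlimited"
        else printf "%.2f\n", $1 / $2
    }'
}

cpu_max_to_cores "150000 100000"   # prints 1.50
```

Against a live container you would pass the file contents, for example cpu_max_to_cores "$(cat $cgdir/cpu.max)".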
Common Pitfalls
Confusing limits with reservations. cpu.max is a ceiling. cpu.weight is a relative share that only matters under contention. Kubernetes requests map to weights, limits map to quotas — get them backwards and your scheduling decisions go sideways.
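To make the request-versus-limit mapping concrete, here is a sketch of the arithmetic. It assumes the conversions that Kubernetes and container runtimes have long used: a CPU request in millicores becomes v1-style shares (1024 per core, minimum 2), which are then mapped linearly onto the v2 cpu.weight range of 1 to 10000; a CPU limit becomes a quota over a 100ms period. The function names are invented for this example, and newer runtime versions may use a different shares-to-weight formula, so treat the exact weights as illustrative.

```shell
# request_to_weight MILLICORES -> cpu.weight (assumes the linear conversion)
request_to_weight() {
    local shares=$(( $1 * 1024 / 1000 ))       # v1-style cpu.shares
    if [ "$shares" -lt 2 ]; then shares=2; fi  # shares have a floor of 2
    echo $(( 1 + (shares - 2) * 9999 / 262142 ))
}

# limit_to_cpu_max MILLICORES [PERIOD_USEC] -> contents of cpu.max
limit_to_cpu_max() {
    local period=${2:-100000}                  # assumed default period: 100ms
    echo "$(( $1 * period / 1000 )) $period"
}

request_to_weight 1000   # prints 39
limit_to_cpu_max 1500    # prints 150000 100000
```

The asymmetry is the point: the weight only decides who wins when CPUs are contended, while the quota is enforced even on an otherwise idle machine.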
OOM kills that look like crashes. When memory.max is hit, the kernel kills a process inside the cgroup, often without a userspace message. Look at dmesg or journalctl -k and you will see “Memory cgroup out of memory.” Always check kernel logs when a container disappears.
CPU throttling silently degrading latency. A container with cpu.max set can burn through its quota and be throttled for the remainder of the period, adding tens of milliseconds of latency. cat cpu.stat shows nr_throttled and throttled_usec. If those are nonzero and growing, raise the limit or reshape the workload.
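A quick way to judge whether throttling matters is the share of enforcement periods in which the cgroup was throttled. The sketch below computes that from a cpu.stat file using its nr_periods and nr_throttled counters; the function name throttle_pct is an assumption for this example, and what counts as too high depends on the workload.

```shell
# throttle_pct CPU_STAT_FILE -> percentage of periods that were throttled
throttle_pct() {
    awk '
        $1 == "nr_periods"   { periods = $2 }
        $1 == "nr_throttled" { throttled = $2 }
        END {
            if (periods > 0) printf "%.1f\n", 100 * throttled / periods
            else print "0.0"   # no quota set, or no periods elapsed yet
        }
    ' "$1"
}
```

The counters are cumulative since the cgroup was created, so for a live view sample the file twice and compare the deltas rather than the absolute values.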
Mixing v1 and v2. Older runtimes assume v1, newer ones prefer v2. On a host with the hybrid layout, the wrong runtime can silently ignore limits. Check stat -fc %T /sys/fs/cgroup — cgroup2fs means pure v2.
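The check above can be wrapped in a small helper for scripts. This sketch (the function name cgroup_mode is invented here) classifies the filesystem type reported by stat; it relies on the usual layout where v1 and hybrid hosts mount a tmpfs at /sys/fs/cgroup, so a tmpfs result cannot tell those two apart.

```shell
# cgroup_mode FSTYPE -> which cgroup layout the host is likely using
cgroup_mode() {
    case "$1" in
        cgroup2fs) echo "v2" ;;            # unified hierarchy mounted directly
        tmpfs)     echo "v1 or hybrid" ;;  # per-controller mounts under a tmpfs
        *)         echo "unknown" ;;
    esac
}
```

Typical use is cgroup_mode "$(stat -fc %T /sys/fs/cgroup)".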
Forgetting that PID limits exist. Fork bombs are still a thing. pids.max caps the number of tasks in a cgroup and is cheap insurance.
Practical Tips
Read the *.pressure files (cpu.pressure, memory.pressure, io.pressure). Pressure Stall Information (PSI), exposed per cgroup in v2, reports the share of time tasks in the cgroup were stalled waiting on CPU, memory, or IO. It is the most useful signal for “is this container starved?”
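Pressure files are plain text with a "some" line and a "full" line, each carrying avg10, avg60, and avg300 percentages plus a cumulative total. The following sketch pulls out the 10-second average; the function name psi_avg10 is an assumption for this example.

```shell
# psi_avg10 PRESSURE_FILE [some|full] -> 10-second stall percentage
psi_avg10() {
    # Lines look like: some avg10=0.00 avg60=0.00 avg300=0.00 total=0
    awk -v kind="${2:-some}" '
        $1 == kind { sub(/^avg10=/, "", $2); print $2 }
    ' "$1"
}
```

For example, psi_avg10 $cgdir/memory.pressure full gives the percentage of the last ten seconds in which every non-idle task in the cgroup was stalled on memory at once.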
When running cgroup-aware tools inside a container, expose /sys/fs/cgroup read-only. Modern JVMs, Node, and Go runtimes read their limits from there to size thread pools and GC heaps.
Use systemd-run --scope -p MemoryMax=512M your-command for ad-hoc isolation outside Docker. It is the easiest way to play with cgroups without a runtime in the way.
When debugging OOM kills, check memory.events. It counts oom, oom_kill, and high events — far more reliable than tailing dmesg.
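Because memory.events holds cumulative counters, a monitoring script usually cares about the change since its last look. Here is a minimal sketch of that idea; both function names are invented for this example.

```shell
# memory_event EVENTS_FILE KEY -> current value of one counter (e.g. oom_kill)
memory_event() {
    awk -v key="$2" '$1 == key { print $2 }' "$1"
}

# oom_kills_since EVENTS_FILE PREVIOUS_COUNT -> new OOM kills since the last sample
oom_kills_since() {
    local now
    now=$(memory_event "$1" oom_kill)
    echo $(( now - $2 ))
}
```

A nonzero result from oom_kills_since $cgdir/memory.events "$last" means a process in that cgroup was killed between the two samples.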
For multi-tenant hosts, set limits at the slice level (system.slice, user.slice). Everything beneath a slice is bounded by its limit, which protects the host from any single tenant.
Wrap-up
cgroups are the boring, essential plumbing that makes containers a credible isolation primitive. Once you can find a container’s cgroup directory, read its limits, and check its pressure stats, the magic of Docker and Kubernetes becomes legible — they are mostly tools that write the right values into the right files. The next time a pod gets OOM-killed or a service mysteriously slows under load, you will know exactly where to look.