Linux strace Tutorial: See What Your Process Is Really Doing
Use strace to inspect every system call a process makes. Learn the essential flags, how to filter noise, and the patterns for debugging hangs, slowness, and crashes.
What you'll learn
- What strace shows and how it gets the data
- The essential flags you will use daily
- How to filter to the syscalls that matter
- Patterns for hangs, slowness, and missing files
- When strace is the wrong tool
Prerequisites
- Comfortable on a Linux shell
When a program misbehaves and the logs say nothing, strace is often the fastest way to figure out what is actually happening. It records every system call a process makes — every file opened, every byte read, every network connection attempted — and prints them in real time.
What and Why
User-space programs cannot do much on their own. To open a file, talk to the network, allocate memory beyond a certain point, or even check the time precisely, they must ask the kernel through a system call. strace uses the kernel’s ptrace interface to intercept those calls and show you arguments, return values, and timing.
This is invaluable when a process is silent. A web server returning 500s with no log, a script that “just hangs,” or a binary that exits zero but did nothing — strace tells you what it really tried to do.
Mental Model
Picture a wall between your program and the kernel. Every interesting thing the program does — read, write, connect, openat, futex, mmap — crosses that wall. strace stands at the wall and writes down each crossing.
Output looks like this:
openat(AT_FDCWD, "/etc/passwd", O_RDONLY) = 3
read(3, "root:x:0:0:root:/root:/bin/bash\n"..., 4096) = 2487
close(3) = 0
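Each line has three parts: the call name, its arguments in parentheses, and the return value after the equals sign. A failed call returns -1 followed by the errno name and its description, for example = -1 ENOENT (No such file or directory).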
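Because the return value always sits at the end of the line, failed calls are easy to pick out mechanically. The sketch below is one way to do it with awk, assuming plain strace output with no PID prefix. The helper name failed_calls and the fourth sample line are made up for illustration.

```shell
# Sample trace text; the last line is a hypothetical failing call.
sample_trace='openat(AT_FDCWD, "/etc/passwd", O_RDONLY) = 3
read(3, "root:x:0:0:root:/root:/bin/bash\n"..., 4096) = 2487
close(3) = 0
openat(AT_FDCWD, "/etc/missing.conf", O_RDONLY) = -1 ENOENT (No such file or directory)'

# Print "syscall errno" for every call on stdin that returned -1.
failed_calls() {
    awk '/ = -1 E[A-Z0-9]+/ {
        name = $0
        sub(/\(.*/, "", name)             # keep only the text before the first paren
        match($0, / = -1 E[A-Z0-9]+/)     # locate the " = -1 ERRNO" part
        print name, substr($0, RSTART + 6, RLENGTH - 6)
    }'
}

printf '%s\n' "$sample_trace" | failed_calls
```

On the sample above this prints a single line naming openat and ENOENT. In real use you would pipe strace's stderr into the helper instead of the sample text.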
Hands-on Example
Trace a fresh process:
strace -f -e trace=openat,read,connect curl https://example.com
-f follows child processes and threads (depending on how it was built, curl may resolve DNS in a helper thread). -e trace=... filters to the syscalls you care about.
Attach to a running process:
strace -p 12345 -f -tt -T
-tt prefixes each line with a wall-clock timestamp including microseconds, and -T shows the time spent inside each call. To get a summary instead of the firehose:
strace -c -p 12345
# press Ctrl-C after a while
You get a table of syscall counts and total time per call.
Find a missing file:
strace -f -e openat prog 2>&1 | grep ENOENT
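A program that probes many search paths can produce dozens of ENOENT lines, most of them repeats. As a rough sketch, the filter below reduces them to a sorted list of distinct paths. It assumes the path is the first quoted argument on the line (true for openat), and missing_paths is a hypothetical helper name.

```shell
# Print each distinct path that a traced call failed to find (ENOENT).
# Reads strace output on stdin.
missing_paths() {
    grep 'ENOENT' | sed -n 's/^[^"]*"\([^"]*\)".*/\1/p' | sort -u
}

# Demo on made-up trace lines; with a real program you would run something like
#   strace -f -e trace=openat prog 2>&1 | missing_paths
printf '%s\n' \
    'openat(AT_FDCWD, "/usr/lib/app/plugin.so", O_RDONLY) = -1 ENOENT (No such file or directory)' \
    'openat(AT_FDCWD, "/etc/app.conf", O_RDONLY) = -1 ENOENT (No such file or directory)' \
    'openat(AT_FDCWD, "/etc/app.conf", O_RDONLY) = -1 ENOENT (No such file or directory)' \
    'openat(AT_FDCWD, "/etc/passwd", O_RDONLY) = 3' |
    missing_paths
```

Not every ENOENT is a bug: dynamic loaders and interpreters routinely try several locations before finding a file, so look for the path that never succeeds anywhere.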
Find what a hung process is waiting on:
strace -p <pid>
# If the last line is an unfinished read(3, ..., inspect FD 3:
ls -l /proc/<pid>/fd/3
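The same lookup can be wrapped in a small helper so it is easy to repeat for several descriptors. This is a minimal sketch that assumes a Linux /proc filesystem and permission to read the target process's fd directory. fd_target is a hypothetical name, and the demo uses the current shell as the "hung" process.

```shell
# Print what file descriptor $2 of process $1 points to.
fd_target() {
    readlink "/proc/$1/fd/$2"
}

# Demo on the current shell: open /dev/null on FD 9, look it up, close it.
exec 9< /dev/null
fd_target "$$" 9
exec 9<&-
```

For sockets and pipes the result is a label such as socket:[inode] or pipe:[inode] rather than a path, which still tells you the process is waiting on a peer rather than on a file.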
Summarize where time goes:
strace -c -p <pid>
# Press Ctrl-C to print the table. It is sorted by total time by default; -S calls sorts by call count instead.
Trace network activity only:
strace -e trace=%network -f prog
Save output to file:
strace -f -o trace.log prog
A real example: a CLI tool fails with “permission denied” but you cannot tell on which file. strace -e openat tool 2>&1 | grep EACCES shows the exact path the kernel refused.
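Once a trace is saved to a file, it can be summarized offline without touching the process again. The sketch below rebuilds a rough per-syscall count from a log written with -f -o, where each line starts with a PID column. The helper name syscall_counts and the sample log contents are illustrative assumptions, and lines that do not begin with a call name (signals, resumed calls, exit markers) are simply skipped.

```shell
# Count calls per syscall name in a saved trace, most frequent first.
syscall_counts() {
    awk '{
        line = $0
        sub(/^[0-9]+ +/, "", line)            # drop the leading PID column if present
        if (match(line, /^[a-z_0-9]+\(/))     # line starts with "name("
            count[substr(line, 1, RLENGTH - 1)]++
    }
    END { for (name in count) print count[name], name }' "$1" | sort -k1,1nr -k2
}

# Demo on a small made-up log.
log=$(mktemp)
cat > "$log" <<'EOF'
1234 openat(AT_FDCWD, "/etc/passwd", O_RDONLY) = 3
1234 read(3, "root:x:0:0"..., 4096) = 2487
1235 read(4, "", 4096) = 0
1234 close(3) = 0
EOF
syscall_counts "$log"
rm -f "$log"
```

Unlike strace -c this gives counts only, not time per call, but it works on a log someone else captured.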
Common Pitfalls
Massive slowdown. strace adds significant overhead, sometimes 10x or more, because every syscall stops the process and switches to the tracer and back. Do not strace a busy production process without thinking. -c prints less, but it still intercepts every call and is not a sampling mode, so keep the tracing window short or reach for bpftrace/perf instead.
Forgetting -f. Without it, you only see the process you started or attached to. Shells, build tools, and many language runtimes spawn child processes or threads, so the calls you care about may happen where you are not looking.
Drowning in output. Plain strace on a Python or Node process is a wall of futex and mmap. Filter with -e trace= to the family you suspect (%file, %network, %process, %signal) or to specific calls.
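Misreading “hanging.” When strace stops printing, look at the last line. If it is an unfinished call (usually read, poll, epoll_wait, or futex), the process is blocked in the kernel, and /proc/<pid>/fd shows what those FDs point to. If the last call completed and nothing follows, the process is making no syscalls at all: it is busy or stuck in user space, for example in a tight Python loop, and strace cannot show you where.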
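One way to cross-check what the last strace line suggests is the process state the kernel reports in /proc/<pid>/status. The sketch below prints just the one-letter state code. proc_state is a hypothetical helper name, and the demo inspects the current shell rather than a real hung process.

```shell
# Print the one-letter state of process $1:
#   R = running or runnable, S = interruptible sleep,
#   D = uninterruptible sleep (often disk or NFS I/O), Z = zombie.
proc_state() {
    awk '/^State:/ { print $2 }' "/proc/$1/status"
}

proc_state "$$"
```

A silent process that stays in R while using CPU is probably spinning in user space, whereas S or D means it is waiting inside the kernel, which is the case strace can explain.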
Trying to strace inside a stripped-down container. The container may lack ptrace capability. Run with --cap-add=SYS_PTRACE or, on Kubernetes, use an ephemeral debug container.
Practical Tips
Pair strace with /proc/<pid>/. The file descriptor numbers in the output are meaningless without context. ls -l /proc/<pid>/fd/ turns FD 7 into the actual socket or file path.
When debugging slowness, use -T -ttt. -T appends the seconds spent in each call in angle brackets; -ttt prefixes each line with an absolute timestamp in seconds since the epoch, with microseconds. Sort the lines by the -T value to find the slow calls.
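Sorting by the -T value takes a little text processing, because the duration sits at the end of the line. The following sketch extracts it, sorts numerically, and keeps the top N. slowest_calls is a hypothetical helper name and the sample lines are invented for the demo.

```shell
# Print the N slowest calls from a trace recorded with -T (default N = 10).
# Each -T line ends with the elapsed seconds in angle brackets, e.g. <0.000042>.
slowest_calls() {
    awk 'match($0, /<[0-9]+\.[0-9]+>$/) {
        # Put the duration (without brackets) first so sort can use it.
        print substr($0, RSTART + 1, RLENGTH - 2), $0
    }' "$1" | sort -rn | head -n "${2:-10}"
}

# Demo on a small made-up trace.
trace_file=$(mktemp)
cat > "$trace_file" <<'EOF'
read(3, "abc", 4096) = 3 <0.000021>
poll([{fd=5, events=POLLIN}], 1, 5000) = 0 (Timeout) <5.004310>
write(1, "ok\n", 3) = 3 <0.000105>
EOF
slowest_calls "$trace_file" 2
rm -f "$trace_file"
```

Here the poll that timed out after five seconds comes out on top. A long duration on poll, epoll_wait, or futex usually means waiting rather than working, so read it together with what the process was waiting for.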
For repeated debugging of the same program, save with -o trace.log and grep later. A 100MB trace is easier to grep than to watch scroll by.
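Use strace -e trace=%file (a syscall group) instead of listing every file-related call. Groups include %file, %network, %process, %signal, %memory. Because strace maintains the group membership, a filter written this way keeps matching as new syscalls are added to a family, provided your strace version knows about them.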
For production-grade visibility without the overhead, learn bpftrace next. strace teaches you the syscall vocabulary; bpftrace lets you ask the same questions without stopping the world.
Wrap-up
strace is a microscope for the boundary between your process and the kernel. The first time it tells you exactly which file your program failed to open, or that the process is waiting on a socket nobody is writing to, you stop guessing and start fixing. Learn -f, -e trace=, -c, and -p, and you will reach for strace before you reach for a debugger more often than you expect.