Linux strace Tutorial: See What Your Process Is Really Doing

Intermediate 9 min read

What you'll learn

✓ What strace shows and how it gets the data
✓ The essential flags you will use daily
✓ How to filter to the syscalls that matter
✓ Patterns for hangs, slowness, and missing files
✓ When strace is the wrong tool

Prerequisites

• Comfortable on a Linux shell

When a program misbehaves and the logs say nothing, strace is often the fastest way to figure out what is actually happening. It records every system call a process makes — every file opened, every byte read, every network connection attempted — and prints them in real time.

What and Why

User-space programs cannot do much on their own. To open a file, talk to the network, allocate memory beyond a certain point, or start another process, they must ask the kernel through a system call. strace uses the kernel’s ptrace interface to intercept those calls and show you arguments, return values, and timing.

This is invaluable when a process is silent. A web server returning 500s with no log, a script that “just hangs,” or a binary that exits zero but did nothing — strace tells you what it really tried to do.

Mental Model

Picture a wall between your program and the kernel. Every interesting thing the program does — read, write, connect, openat, futex, mmap — crosses that wall. strace stands at the wall and writes down each crossing.

Output looks like this:

openat(AT_FDCWD, "/etc/passwd", O_RDONLY) = 3
read(3, "root:x:0:0:root:/root:/bin/bash\n"..., 4096) = 2487
close(3) = 0

Each line has three parts: the call name, its arguments, and the return value after the = sign. A failed call returns -1 followed by the error name and description, for example -1 ENOENT (No such file or directory).
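
Because failures always follow that "= -1 ERRNO" shape, a saved trace is easy to tally. The sketch below counts failed calls by syscall and error name. The two failing lines and their paths are invented for illustration, and the pattern assumes plain output with no pid or timestamp prefix.

```shell
#!/bin/sh
# Illustrative trace lines in the format shown above. The two failing
# lines (and their paths) are made up for this sketch.
trace_log=$(mktemp)
cat > "$trace_log" <<'EOF'
openat(AT_FDCWD, "/etc/passwd", O_RDONLY) = 3
read(3, "root:x:0:0:root:/root:/bin/bash\n"..., 4096) = 2487
close(3) = 0
openat(AT_FDCWD, "/etc/example.conf", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/opt/example/app.conf", O_RDONLY) = -1 ENOENT (No such file or directory)
EOF

# A failed call ends in "= -1 ERRNO (description)".
# Print "<syscall> <ERRNO>" for each failure, then count duplicates.
failures=$(sed -n 's/^\([a-z_0-9]*\)(.*) = -1 \([A-Z0-9]*\) .*/\1 \2/p' "$trace_log" \
    | sort | uniq -c)
echo "$failures"

rm -f "$trace_log"
```

On a real trace, a single errno dominating the tally is usually the first thing worth chasing.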

Hands-on Example

Trace a fresh process:

strace -f -e trace=openat,read,connect curl https://example.com

-f follows child processes and threads (common curl builds resolve DNS in a helper thread). -e trace=... filters to the syscalls you care about.

Attach to a running process:

strace -p 12345 -f -tt -T

-tt prefixes each line with a wall-clock timestamp (microsecond precision), and -T appends the time spent inside each call. To get a summary instead of the firehose:

strace -c -p 12345
# press Ctrl-C after a while

You get a table of syscall counts and total time per call.
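
The same summary works for a short-lived command, where there is no Ctrl-C step: strace prints the table when the command exits. A minimal sketch, assuming strace is installed and ptrace is permitted; "ls /" is an arbitrary stand-in for your program, and -o sends the table to a file so it does not mix with the program's own output.

```shell
#!/bin/sh
# Sketch: count syscalls for a short-lived command and save the table.
# "ls /" is a placeholder for the program you actually want to inspect.
summary_file=$(mktemp)

if ! command -v strace >/dev/null 2>&1; then
    result="strace is not installed"
elif strace -c -f -o "$summary_file" ls / >/dev/null 2>&1; then
    result="ok"
    # Show the top of the table: header, separator, and the first rows.
    head -n 12 "$summary_file"
else
    result="trace failed (ptrace may be restricted here)"
fi
echo "$result"

rm -f "$summary_file"
```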

Common strace recipes

Find a missing file:
strace -f -e openat prog 2>&1 | grep ENOENT

Find what a hung process is waiting on:
strace -p <pid>
# stuck in read(3, ...)? Inspect FD 3:
ls -l /proc/<pid>/fd/3

Summarize where time goes:
strace -c -p <pid>
# read the table sorted by total time (strace's default order)

Trace network activity only:
strace -e trace=network -f prog

Save output to file:
strace -f -o trace.log prog

A real example: a CLI tool fails with “permission denied” but you cannot tell on which file. strace -e openat tool 2>&1 | grep EACCES shows the exact path the kernel refused.

Common Pitfalls

Massive slowdown. strace adds significant overhead, sometimes 10x or more, because every syscall stops the process and switches to the tracer and back. Do not strace a busy production process without thinking. -c cuts the output but not the per-syscall stops, so keep attach windows short or reach for bpftrace/perf instead.

Forgetting -f. Without it, you only see the process strace started or attached to. Most shell pipelines and language runtimes fork or spawn threads constantly, and -f follows both, so without it you will miss everything important.

Drowning in output. Plain strace on a Python or Node process is a wall of futex and mmap. Always filter with -e trace= to the family you suspect: file, network, process, signal, or specific calls.

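Misreading “hanging.” If strace stops printing, look at the last line. An unfinished call (usually read, poll, epoll_wait, or futex, printed without a return value) means the process is blocked in the kernel waiting on that call. A completed call followed by silence means the process is busy in user space, for example spinning in a Python loop, and making no syscalls at all. For a blocked call, check /proc/<pid>/fd to see what its FDs point to.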
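
One way to cross-check a silent process is its scheduler state in /proc: S or D means it is asleep inside the kernel, while R means it is running or runnable, which points at user-space work. A minimal sketch; proc_state is a hypothetical helper name, and the background sleep only stands in for a hung process.

```shell
#!/bin/sh
# Sketch: read a process's scheduler state from /proc.
# proc_state is a hypothetical helper name.
proc_state() {
    # The line looks like "State:  S (sleeping)"; print only the letter.
    awk '/^State:/ {print $2}' "/proc/$1/status"
}

# Demo: a background sleep spends its life blocked in a syscall.
sleep 5 &
demo_pid=$!
sleep 1   # give the child time to start and block

state=$(proc_state "$demo_pid")
echo "pid $demo_pid state: $state"

# Clean up the demo process.
kill "$demo_pid" 2>/dev/null
wait "$demo_pid" 2>/dev/null || true
```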

Trying to strace inside a stripped-down container. The container may lack the ptrace capability, or strace itself may not be installed in the image. With Docker, run the container with --cap-add=SYS_PTRACE; on Kubernetes, use an ephemeral debug container.
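
Before blaming strace, it can help to check what the environment allows. The sketch below is a rough check, not a guarantee: it reads the effective capability mask from /proc and, where the Yama security module is present, its ptrace_scope setting. has_cap_sys_ptrace is a hypothetical helper name, and a seccomp profile can still block ptrace even when both look permissive.

```shell
#!/bin/sh
# Sketch: rough check of whether this environment is likely to allow ptrace.
# has_cap_sys_ptrace is a hypothetical helper name.
has_cap_sys_ptrace() {
    # CapEff is a hex bitmask of effective capabilities.
    # CAP_SYS_PTRACE is capability number 19, so test bit 19.
    cap_eff=$(awk '/^CapEff:/ {print $2}' /proc/self/status 2>/dev/null)
    if [ -z "$cap_eff" ]; then
        echo unknown
    elif [ $(( (0x$cap_eff >> 19) & 1 )) -eq 1 ]; then
        echo yes
    else
        echo no
    fi
}

ptrace_cap=$(has_cap_sys_ptrace)
echo "CAP_SYS_PTRACE effective: $ptrace_cap"

# Yama ptrace_scope: 0 = classic rules, 1 = descendants only,
# 2 = admin only, 3 = no attaching at all. The file may not exist.
yama_scope=$(cat /proc/sys/kernel/yama/ptrace_scope 2>/dev/null || echo "not present")
echo "yama ptrace_scope: $yama_scope"
```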

Practical Tips

Pair strace with /proc/<pid>/. The file descriptor numbers in the output are meaningless without context. ls -l /proc/<pid>/fd/ turns FD 7 into the actual socket or file path.
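
Each entry under /proc/<pid>/fd is a symlink, so readlink is enough to resolve FD numbers in a script. A minimal sketch; list_fds is a hypothetical helper name, and the demo opens a scratch file on FD 9 of the current shell purely so there is something known to resolve.

```shell
#!/bin/sh
# Sketch: print what each open file descriptor of a process points to.
# list_fds is a hypothetical helper name; it defaults to the current shell.
list_fds() {
    pid=${1:-$$}
    for fd in /proc/"$pid"/fd/*; do
        # Each entry is a symlink to a path, or a label such as socket:[12345].
        target=$(readlink "$fd") || continue
        printf '%s -> %s\n' "${fd##*/}" "$target"
    done
}

# Demo: open a scratch file on FD 9 in this shell, then resolve it.
scratch=$(mktemp)
exec 9<"$scratch"
fd9_target=$(readlink "/proc/$$/fd/9")
list_fds "$$"

# Close FD 9 and remove the scratch file.
exec 9<&-
rm -f "$scratch"
```

Reading another user's process this way generally needs the same privileges as attaching strace to it.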

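When debugging slowness, use -T -ttt. -T appends the seconds spent in each call in angle brackets at the end of the line, and -ttt prefixes each line with an absolute timestamp (seconds since the epoch, with microseconds). Sort the saved trace by the -T value to find the slow calls.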
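
Sorting on the -T value takes one pipeline once the trace is in a file. The sketch below assumes each line ends with the <seconds> field that -T appends; the sample lines reuse the calls shown earlier, with invented durations.

```shell
#!/bin/sh
# Illustrative "strace -T" lines: the durations in <...> are made up.
trace_log=$(mktemp)
cat > "$trace_log" <<'EOF'
openat(AT_FDCWD, "/etc/passwd", O_RDONLY) = 3 <0.000041>
read(3, "root:x:0:0:root:/root:/bin/bash\n"..., 4096) = 2487 <0.250310>
close(3) = 0 <0.000012>
EOF

# Move the trailing <seconds> value to the front of each line, sort
# numerically with the largest first, and keep the slowest calls.
slowest=$(sed -n 's/^\(.*\) <\([0-9.]*\)>$/\2 \1/p' "$trace_log" \
    | sort -rn | head -n 2)
echo "$slowest"

rm -f "$trace_log"
```

Lines without a trailing duration, such as signal notices, are skipped by the sed pattern rather than sorted as zero.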

For repeated debugging of the same program, save with -o trace.log and grep later. A 100MB trace is easier to grep than to watch scroll by.

Use strace -e trace=%file (a syscall group) instead of listing every file-related call. Groups include %file, %network, %process, %signal, and %memory. Because a group names a category rather than individual calls, it keeps working as newer kernels and strace releases add syscalls.

For production-grade visibility without the overhead, learn bpftrace next. strace teaches you the syscall vocabulary; bpftrace lets you ask the same questions without stopping the world.

Wrap-up

strace is a microscope for the boundary between your process and the kernel. The first time it tells you exactly which file your program failed to open, or that the process is waiting on a socket nobody is writing to, you stop guessing and start fixing. Learn -f, -e trace=, -c, and -p, and you will reach for strace before you reach for a debugger more often than you expect.