Linux Process Management Explained: ps, top, kill, and Beyond

Learn how Linux processes work, how to inspect them with ps and top, and how to control them with signals so your servers stay responsive.

5 min read · By Codeloom
Beginner 9 min read

What you'll learn

  • What a Linux process really is
  • How PIDs and parent-child relationships work
  • How to list and filter processes with ps
  • How to monitor live load with top and htop
  • How to send signals safely with kill

Prerequisites

  • A terminal you can run commands in

What and Why

A process is a running instance of a program. Almost every command you type in a shell becomes a process (builtins such as cd run inside the shell itself): it gets a numeric Process ID (PID), an owner, a working directory, open files, and its own virtual address space. The kernel schedules processes onto CPUs, suspends them when they wait for I/O, and releases their resources when they exit.

Knowing how to inspect and control processes is the difference between guessing why a server is slow and actually fixing it. When a Node app pegs a CPU, when a stuck rsync blocks a deploy, or when a zombie process clutters your tree, you reach for the tools below.

Mental Model

Linux starts a single user-space process at boot called init (today usually systemd), which gets PID 1. Every other user-space process is a descendant of PID 1; kernel threads, which ps shows in square brackets, descend from kthreadd (PID 2) instead. When process A starts process B, A is the parent and B is the child. Children inherit environment variables, open file descriptors, and the current directory.

systemd (PID 1)
|-- sshd
|     '-- bash (your login shell)
|           '-- vim
|-- nginx
|     |-- nginx worker
|     '-- nginx worker
'-- cron
      '-- backup.sh
Typical process tree on a server
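
To watch inheritance happen, here is a minimal sketch. DEMO_VAR and the temporary report file are made-up names for this demo only. The child shell prints its own PID, its parent's PID, and the variable and directory it inherited; the output redirection works because the child also inherits the open file descriptor.

```shell
# DEMO_VAR is a hypothetical variable name used only for this demo.
export DEMO_VAR="set by parent"
cd /

report=$(mktemp)
# Single quotes keep $$, $PPID and $DEMO_VAR from being expanded by the
# parent shell, so the values printed are the ones the child sees.
sh -c 'echo "child=$$ parent=$PPID var=$DEMO_VAR cwd=$(pwd)"' > "$report"

echo "this shell: $$"
cat "$report"
```

When you run this in an interactive terminal, the parent= value should match the PID printed for this shell.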

When a child exits, the kernel keeps a tiny record (the exit status) until the parent reads it with wait(). If the parent never reads it, the child becomes a zombie: dead but still in the process table. If the parent dies first, the orphaned child is adopted by PID 1 (or by a nearer ancestor registered as a subreaper), which then reaps it.
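
You can create a zombie on purpose to see what one looks like. The sketch below assumes the procps version of ps (for --ppid) and relies on a shell trick: the subshell starts a child and then replaces itself with a program that never calls wait().

```shell
# The subshell starts a short-lived child, then uses exec to turn itself
# into `sleep 30`, a parent that never calls wait() on that child.
( sleep 1 & exec sleep 30 ) &
parent=$!

sleep 2   # by now the inner `sleep 1` has exited

# --ppid selects processes by parent PID; stat= prints the state with no header.
zombie_state=$(ps -o stat= --ppid "$parent" | tr -d ' ')
echo "child of $parent is in state: $zombie_state"

# Ending the parent lets the zombie be adopted and reaped.
kill "$parent"
wait "$parent" 2>/dev/null || true
```

The state should start with Z, and the process shows up as "[sleep] <defunct>" in a full ps listing until its parent goes away.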

Hands-on Example

Open two terminals. In the first, start a long-running process:

sleep 300 &

The shell prints something like [1] 48211. The number in brackets is the shell's job number; 48211 is the PID (yours will differ). Now inspect it:

ps -p 48211 -o pid,ppid,user,stat,cmd

STAT is the state code: R running or runnable, S sleeping (interruptible), D uninterruptible sleep (often disk I/O), Z zombie, T stopped. A + suffix means the process is in the foreground process group of its terminal.

List every process on the machine with a forest view:

ps -ef --forest | less
ps auxf | head -40

Both work; aux is the BSD style and -ef is the System V style. Filter by name:

pgrep -a sshd
pgrep -fl node
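
pgrep and ps combine well: pgrep finds the PIDs, ps shows the columns you care about. A small sketch, assuming procps-ng pgrep (for -c and -d); the sleep duration 31337 is an arbitrary value chosen so the pattern is unlikely to match anything else on your machine.

```shell
# 31337 is an arbitrary duration picked to make the command line distinctive.
sleep 31337 &
sleep 31337 &
sleep 0.2   # let both children finish exec before searching

# -f matches the full command line; ^ anchors the pattern to its start so
# shells or editors that merely mention the string are not counted.
count=$(pgrep -c -f '^sleep 31337')
echo "matching processes: $count"

# -d, joins the PIDs with commas, the list format that ps -p accepts.
ps -o pid,ppid,etime,cmd -p "$(pgrep -d, -f '^sleep 31337')"

# pkill takes the same matching options as pgrep.
pkill -f '^sleep 31337'
```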

For a live view, use top (part of procps, preinstalled on most distributions) or htop (nicer, install separately). In top, press P (uppercase) to sort by CPU, M to sort by memory, 1 to toggle per-core stats, and k to kill a process by PID.

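Now send a signal to your sleep. TERM and KILL are alternatives here: once one of them succeeds, the PID is gone and the other reports "No such process". The last line shows HUP against a daemon instead:

kill -TERM 48211   # polite request to terminate
kill -KILL 48211   # forced; cannot be caught or ignored
kill -HUP "$(pgrep -o nginx)"  # -o picks the oldest match, normally the nginx master; reload config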

SIGTERM (15) asks nicely. SIGKILL (9) is the hammer: the kernel terminates the process without giving it a chance to flush buffers. Always try TERM first. SIGHUP (1) is conventionally used to ask daemons to reload their configuration.
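
The "TERM first, KILL only if needed" habit is easy to script. The helper below is a hypothetical sketch, not a standard tool: stop_gracefully and its sg_ variables are names invented here, and the default timeout of 5 seconds is an arbitrary choice.

```shell
# stop_gracefully PID [TIMEOUT_SECONDS]
# Hypothetical helper: send TERM, poll until the process is gone,
# and fall back to KILL only if it is still alive after the timeout.
stop_gracefully() {
    sg_pid=$1
    sg_timeout=${2:-5}

    kill -TERM "$sg_pid" 2>/dev/null || return 0   # nothing to stop

    sg_waited=0
    while [ "$sg_waited" -lt "$sg_timeout" ]; do
        # kill -0 sends no signal; it only tests whether the PID still exists.
        kill -0 "$sg_pid" 2>/dev/null || return 0
        sleep 1
        sg_waited=$((sg_waited + 1))
    done
    kill -0 "$sg_pid" 2>/dev/null || return 0

    echo "PID $sg_pid ignored TERM for ${sg_timeout}s, sending KILL" >&2
    kill -KILL "$sg_pid" 2>/dev/null
}

# Usage: stop a background sleep, allowing it up to 3 seconds to exit.
sleep 300 &
target=$!
stop_gracefully "$target" 3
```

Polling with kill -0 keeps the helper portable across shells; the trade-off is that it can take up to a second longer than strictly necessary to return.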

To send a signal to every matching process:

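pkill -TERM -f "node server.js"   # -f matches against the full command line
killall -USR1 nginx               # matches by process name; nginx reopens its log files on USR1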

You can also pause and resume processes from a terminal: Ctrl+Z sends SIGTSTP (stop), bg resumes in the background, fg brings it back, and jobs lists jobs in the current shell.
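
The same pause and resume works from a script with signals. This sketch uses SIGSTOP, the uncatchable sibling of the SIGTSTP that Ctrl+Z sends, because it behaves the same with or without a terminal; SIGCONT is what bg and fg send under the hood.

```shell
sleep 300 &
pid=$!

kill -STOP "$pid"
sleep 0.2   # give the kernel a moment to update the state
state_stopped=$(ps -o stat= -p "$pid" | tr -d ' ')

kill -CONT "$pid"
sleep 0.2
state_resumed=$(ps -o stat= -p "$pid" | tr -d ' ')

echo "after STOP: $state_stopped, after CONT: $state_resumed"

kill "$pid"   # clean up the demo process
```

The first state should start with T (stopped) and the second with S (sleeping again).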

Common Pitfalls

Reaching for kill -9 first. It robs the process of any chance to clean up: files may be left half-written, lock files and temporary files stay behind, database connections are dropped without a clean shutdown, and child processes can be orphaned. Try TERM and wait a few seconds before escalating.

Confusing high load average with high CPU. Load includes processes in D state waiting on disk. A box with load 20 and idle CPUs is usually I/O bound, not CPU bound. Check iostat -xz 1 or vmstat 1.
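
A quick way to compare the two, sketched with standard tools (nproc from coreutils, ps from procps): read the load average, compare it with the core count, and count how many processes are sitting in D state right now.

```shell
# /proc/loadavg starts with the 1-, 5- and 15-minute load averages.
read -r load1 load5 load15 _ < /proc/loadavg
cores=$(nproc)

# Count processes currently in uninterruptible sleep (STAT starts with D).
# grep -c exits non-zero when the count is 0, hence the || true.
blocked=$(ps -eo stat= | grep -c '^D' || true)

echo "load (1m): $load1 on $cores cores, $blocked processes in D state"
```

This is a single snapshot, so run it a few times: a load well above the core count together with a persistently non-zero D-state count points toward I/O rather than CPU.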

Misreading ps memory columns. VSZ is virtual size (address space reserved), not actual usage. RSS is resident set size, the real RAM in use right now. Sum of RSS across processes overcounts shared libraries.
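
To see the gap between the two columns, compare them for your own shell. A minimal sketch; ps reports both values in KiB.

```shell
# VSZ and RSS for the current shell, in KiB, as ps reports them.
vsz=$(ps -o vsz= -p $$ | tr -d ' ')
rss=$(ps -o rss= -p $$ | tr -d ' ')
echo "shell $$: VSZ=${vsz} KiB, RSS=${rss} KiB"

# The same figures straight from the kernel.
grep -E '^Vm(Size|RSS):' /proc/$$/status
```

VSZ is typically several times larger than RSS, because most of the reserved address space is never touched.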

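Forgetting that child processes survive a closed SSH session only if they were detached. When the connection drops, the shell typically forwards SIGHUP to its jobs, which terminates them by default. Use nohup, disown, tmux, or a proper systemd unit for anything long-lived.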
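
Here is what nohup actually buys you, as a small sketch: it starts the command with SIGHUP ignored, so the hangup that follows a dropped session no longer kills it. The kill -HUP below simulates that hangup.

```shell
# Redirect output explicitly; when stdout is a terminal, nohup would
# otherwise append it to a file named nohup.out.
nohup sleep 300 > /dev/null 2>&1 &
job=$!
sleep 0.5            # give nohup time to exec sleep

kill -HUP "$job"     # simulate the hangup
sleep 0.2
if kill -0 "$job" 2>/dev/null; then
    survived=yes
else
    survived=no
fi
echo "survived SIGHUP: $survived"

kill -TERM "$job"    # clean up the demo process
```

nohup only covers SIGHUP. For services that must restart on failure or start at boot, a systemd unit is still the better fit.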

Killing zombies directly. You cannot kill a zombie; it is already dead. Fix the parent so it calls wait(), or restart the parent.

Practical Tips

Use ps -eo pid,etime,user,cmd --sort=start_time | head to list the oldest processes first; long-lived runaway scripts often hide there.

Combine pgrep with xargs for batch actions: pgrep -f stuck-worker | xargs -r renice +10.

pidstat 1 (from sysstat) shows per-process CPU usage once a second without the flicker of top; add -r for memory and -d for disk I/O, as in pidstat -u -r -d 1.

For a tree view without ps, run pstree -p for the whole tree, pstree -ap PID for the subtree below a process, or pstree -sp PID to include its ancestors and see how it was spawned.

When something is wedged in D state, check cat /proc/PID/wchan and, as root, cat /proc/PID/stack to see which kernel function it is stuck in.

Wrap-up

Linux process management boils down to three skills: listing what is running, understanding the parent-child tree, and sending the right signal at the right time. Start with ps, pgrep, and top, escalate to htop and pidstat, and reserve kill -9 for genuine emergencies. With those habits you can diagnose load spikes and stuck jobs without rebooting.