Go pprof Profiling Tutorial
Profile Go programs with pprof: enable the HTTP endpoint, capture CPU and heap profiles, read flame graphs, and find the hot spot that is actually costing you latency.
What you'll learn
- ✓ What profiles pprof captures and how it captures them
- ✓ Wiring net/http/pprof into a service
- ✓ Collecting CPU, heap, goroutine, and mutex profiles
- ✓ Reading flame graphs and top output
- ✓ Avoiding common misreads
Prerequisites
- A running Go service or program
- Comfort with the command line
When a Go service is slow, the temptation is to guess. pprof is the tool that turns guesses into evidence. It samples your program, attributes time and allocations to functions, and renders the result so the costly path is obvious. It is part of the standard library and takes about three lines to enable.
What pprof is and why
pprof is a sampling profiler with multiple modes. The CPU profile samples the call stacks of running code at a regular interval and tells you where the CPU spent its time. The heap profile samples memory allocations and reports both what is still live and what has been allocated in total. There are also goroutine, mutex, and block profiles.
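For a short-lived program with no HTTP server, the same CPU profile can be captured directly with the standard runtime/pprof package. The sketch below is a minimal illustration under that assumption; busyWork and captureCPUProfile are hypothetical names, and the workload is a placeholder.

```go
package main

import (
	"bytes"
	"fmt"
	"log"
	"runtime/pprof"
)

// busyWork is a hypothetical CPU-bound workload used only for illustration.
func busyWork(n int) int {
	sum := 0
	for i := 0; i < n; i++ {
		sum += i % 7
	}
	return sum
}

// captureCPUProfile runs work while the CPU profiler is active and returns
// the profile bytes (gzip-compressed protobuf, the format go tool pprof reads).
func captureCPUProfile(work func()) ([]byte, error) {
	var buf bytes.Buffer
	if err := pprof.StartCPUProfile(&buf); err != nil {
		return nil, err
	}
	work()
	pprof.StopCPUProfile() // flushes the remaining profile data into buf
	return buf.Bytes(), nil
}

func main() {
	data, err := captureCPUProfile(func() { busyWork(200_000_000) })
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("captured %d bytes of CPU profile\n", len(data))
}
```

In practice you would write the bytes to a file and open that file with go tool pprof, the same way as a profile downloaded over HTTP.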
The “why” is simple: intuition about where a program spends its time is often wrong. A profile tells you the actual hot function, which is often a json.Unmarshal or a regex compile in a loop rather than the algorithm you suspected.
Mental model
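Each profile is a collection of stack samples that pprof aggregates by call path. CPU profiles sample at 100 Hz by default; if your function appears in 30 percent of samples, it accounts for roughly 30 percent of CPU time. Heap profiles are sampled too, but by bytes allocated rather than by clock ticks: by default the runtime records about one allocation per 512 KiB allocated (runtime.MemProfileRate) and scales the results to estimate totals.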
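The sampling arithmetic can be sketched in a few lines. This is only an illustration of the estimate, not how pprof computes its report; the function names and the sample counts are made up for the example.

```go
package main

import (
	"fmt"
	"time"
)

// estimateCPUTime converts a sample count into approximate CPU time,
// assuming each sample stands for one tick of the sampling clock.
func estimateCPUTime(samples, hz int) time.Duration {
	return time.Duration(samples) * time.Second / time.Duration(hz)
}

// share returns the fraction of all samples in which a function appeared.
func share(fnSamples, totalSamples int) float64 {
	if totalSamples == 0 {
		return 0
	}
	return float64(fnSamples) / float64(totalSamples)
}

func main() {
	const hz = 100         // default CPU profiling rate
	total, fn := 1000, 300 // hypothetical sample counts
	fmt.Printf("profile covers about %v of CPU time\n", estimateCPUTime(total, hz))
	fmt.Printf("function share: %.0f%% (about %v)\n", 100*share(fn, total), estimateCPUTime(fn, hz))
}
```

Because the numbers are estimates from samples, small differences between functions with few samples are not meaningful; longer profiles under steady load give more stable shares.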
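The flame graph is the most useful visualization. Width equals share of samples. Vertical position equals stack depth, with callers above their callees in pprof's web UI. Start at the top and follow the widest frames down until you reach the widest function you own; that is usually your hot spot.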
Hands-on example
Enable the HTTP endpoint in your service.
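package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/ handlers on http.DefaultServeMux
)

func main() {
	// Serve pprof on a loopback-only port, separate from the real server.
	go func() { log.Println(http.ListenAndServe("localhost:6060", nil)) }()
	// your real server on a different port
}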
Now collect a 30-second CPU profile under load. Quote the URL so the shell does not try to interpret the ? character.
go tool pprof -http=:8081 "http://localhost:6060/debug/pprof/profile?seconds=30"
This opens a browser with flame graph, top, source, and graph views (the graph view needs Graphviz installed). For a quick textual summary, omit -http to get the interactive terminal prompt and run the top command.
(pprof) top
Showing nodes accounting for 8.2s, 82% of 10s total
flat cum function
3.40s 3.40s encoding/json.(*decodeState).object
1.90s 4.20s example.com/api.handleSearch
1.10s 1.30s runtime.mallocgc
For heap, swap the URL: /debug/pprof/heap. For goroutines: /debug/pprof/goroutine?debug=2 gives a readable text dump useful for diagnosing leaks.
Common pitfalls
Profiling under no load is meaningless. The CPU profile will be nearly empty or dominated by runtime scheduling and network polling. Generate realistic traffic before you sample, or you will conclude that your service spends all its time inside the runtime, in functions such as runtime.netpoll.
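Heap profiles show in-use allocations by default. If you want to see where allocation pressure comes from (and therefore GC cost), look at the alloc_space or alloc_objects view, for example with -sample_index=alloc_space, or fetch /debug/pprof/allocs, which defaults to that view. The ?gc=1 parameter does something different: it runs a garbage collection before the heap sample is taken, which makes the in-use numbers more current. People often profile heap, see almost nothing, and miss the churn entirely.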
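The gap between the in-use and allocated views can be seen without pprof at all. The sketch below uses runtime.MemStats rather than a heap profile, purely to illustrate the distinction; churn and sink are hypothetical names, and the buffer sizes are arbitrary.

```go
package main

import (
	"fmt"
	"runtime"
)

// sink keeps the most recent buffer reachable so allocations escape to the heap.
var sink []byte

// churn allocates n short-lived buffers of size bytes each and reports how
// much live heap and cumulative allocation grew, as seen by runtime.MemStats.
func churn(n, size int) (inUseDelta, allocatedDelta int64) {
	var before, after runtime.MemStats
	runtime.GC()
	runtime.ReadMemStats(&before)

	for i := 0; i < n; i++ {
		sink = make([]byte, size) // the previous buffer becomes garbage
	}

	runtime.GC() // collect the garbage so HeapAlloc reflects live data only
	runtime.ReadMemStats(&after)

	inUseDelta = int64(after.HeapAlloc) - int64(before.HeapAlloc)
	allocatedDelta = int64(after.TotalAlloc) - int64(before.TotalAlloc)
	return inUseDelta, allocatedDelta
}

func main() {
	inUse, allocated := churn(1000, 64<<10)
	fmt.Printf("live heap grew by %d bytes, but %d bytes were allocated in total\n", inUse, allocated)
}
```

A workload like this looks almost empty in the in-use view yet keeps the garbage collector busy, which is exactly the churn that the alloc_space view surfaces.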
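Inlined functions can be hard to spot in the flame graph, because their cost may be attributed to the function they were inlined into. You can compile with -gcflags="-l" during profiling to disable inlining temporarily, but that also changes how the program performs, so treat such a profile as a map of where the work lives rather than an exact measurement. Otherwise, read the flame graph with the understanding that the parent function carries the cost.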
Do not expose /debug/pprof on a public interface. Bind it to localhost or a Unix socket, or put it behind authentication. The profile endpoints reveal internals, and collecting a CPU profile can briefly affect performance.
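One way to keep the endpoints off the public router is to register the handlers on a dedicated mux. The sketch below assumes that approach; newDebugMux and probe are hypothetical helpers, and the address in the comment is only an example.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/http/pprof"
)

// newDebugMux returns a mux that serves only the pprof handlers, so they can
// be bound to a private address separately from the application's routes.
func newDebugMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/debug/pprof/", pprof.Index) // index plus named profiles (heap, goroutine, ...)
	mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
	mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
	mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
	mux.HandleFunc("/debug/pprof/trace", pprof.Trace)
	return mux
}

// probe sends an in-memory GET request to h and returns the status code.
func probe(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	mux := newDebugMux()
	fmt.Println("pprof index status:", probe(mux, "/debug/pprof/"))
	fmt.Println("other paths status:", probe(mux, "/"))
	// In a real service, serve it on loopback only, for example:
	//   go func() { log.Println(http.ListenAndServe("localhost:6060", mux)) }()
}
```

Importing net/http/pprof still registers its handlers on http.DefaultServeMux, so this separation only helps if the public server uses its own mux rather than the default one.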
Practical tips
For CPU work, focus on the widest box near the top of the flame graph that you can actually change. The runtime functions at the bottom are usually noise; the function in your code two levels up is the lever.
For allocations, running go test -bench=. -benchmem -memprofile=mem.out and then go tool pprof mem.out is the fastest loop. You can compare two profiles with go tool pprof -diff_base=before.pb.gz after.pb.gz to confirm an optimization moved the needle.
Goroutine profiles diagnose leaks and deadlocks. If goroutine count grows over time, take a profile, then take another five minutes later, and diff the two with go tool pprof -base first.pb.gz second.pb.gz. The growing stacks are your leak.
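The before-and-after comparison can be sketched in-process with runtime/pprof. The leak here is deliberate and the helper names are hypothetical; a real service would take the two snapshots minutes apart over HTTP instead.

```go
package main

import (
	"fmt"
	"os"
	"runtime/pprof"
)

// leak starts n goroutines that block forever on a channel nobody writes to,
// a deliberately broken pattern used here only to produce a visible leak.
func leak(n int) {
	never := make(chan struct{})
	for i := 0; i < n; i++ {
		go func() { <-never }()
	}
}

// goroutineCount reports how many goroutine stacks the goroutine profile holds.
func goroutineCount() int {
	return pprof.Lookup("goroutine").Count()
}

// growthAfterLeak returns how much the goroutine profile grew after leaking n goroutines.
func growthAfterLeak(n int) int {
	before := goroutineCount()
	leak(n)
	return goroutineCount() - before
}

func main() {
	fmt.Println("goroutine growth:", growthAfterLeak(5))
	// debug=1 groups identical stacks with a count, like /debug/pprof/goroutine?debug=1.
	if err := pprof.Lookup("goroutine").WriteTo(os.Stdout, 1); err != nil {
		fmt.Fprintln(os.Stderr, "writing goroutine profile:", err)
	}
}
```

In the text dump, the leaked goroutines appear as one stack entry with a count, which is the pattern to look for when the count keeps climbing between snapshots.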
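Mutex and block profiles are off by default. Enable them with runtime.SetMutexProfileFraction(5) and runtime.SetBlockProfileRate(1) when you suspect contention. A fraction of 5 reports roughly one in five contention events. A block rate of 1 records every blocking event, which can add noticeable overhead in a busy service, so use a larger rate or switch it off again once you have your answer. The insight can be enormous.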
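A small helper makes it easy to switch these profiles on for an investigation and off again afterwards. This is a sketch; enableContentionProfiles and currentMutexFraction are hypothetical names, and the rates are the ones mentioned above.

```go
package main

import (
	"fmt"
	"runtime"
)

// enableContentionProfiles turns on mutex and block profiling. The returned
// function restores the previous mutex fraction and turns block profiling off.
func enableContentionProfiles(mutexFraction, blockRateNanos int) (disable func()) {
	prevMutex := runtime.SetMutexProfileFraction(mutexFraction)
	runtime.SetBlockProfileRate(blockRateNanos)
	return func() {
		runtime.SetMutexProfileFraction(prevMutex)
		runtime.SetBlockProfileRate(0) // 0 turns block profiling off
	}
}

// currentMutexFraction reads the mutex profile fraction without changing it.
func currentMutexFraction() int {
	return runtime.SetMutexProfileFraction(-1)
}

func main() {
	disable := enableContentionProfiles(5, 1)
	fmt.Println("mutex profile fraction while enabled:", currentMutexFraction())
	disable()
	fmt.Println("mutex profile fraction after disable:", currentMutexFraction())
}
```

Once enabled, the data is available at /debug/pprof/mutex and /debug/pprof/block alongside the other profiles.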
Wrap-up
pprof turns performance work from guesswork into a tight loop. Wire in net/http/pprof, generate realistic load, collect the relevant profile, and read the flame graph from the widest box down. Once you can ask “where is the time actually going?” with confidence, optimization stops being a stab in the dark.