Python Multiprocessing vs Threading
When to use threads, when to use processes, and why the GIL shapes both choices. A practical comparison with code, benchmarks, and patterns for real workloads.
What you'll learn
- ✓ Why the GIL exists and what it actually blocks
- ✓ When threads beat processes despite the GIL
- ✓ How multiprocessing shares data and the cost involved
- ✓ Choosing between thread pools, process pools, and asyncio
- ✓ Patterns for mixing all three safely
Prerequisites
- Basic Python
What and why
Python has two ways to run code concurrently in the same program: threads and processes. They look similar from the API surface (ThreadPoolExecutor, ProcessPoolExecutor), but they behave very differently because of the Global Interpreter Lock (GIL).
The GIL is a mutex inside CPython that allows only one thread to execute Python bytecode at a time. C extensions can release it during I/O or heavy native work, which is why threading is still useful. For pure-Python CPU loops, threading gives you no speedup.
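The difference between a call that releases the GIL and one that holds it can be observed with a small timing harness. This is a minimal sketch, assuming a standard GIL-enabled CPython build; `timed_threads`, `blocking_wait`, and `busy_loop` are illustrative names, and `time.sleep` stands in for real blocking I/O.

```python
import threading
import time

def timed_threads(target, count=4):
    """Run `target` in `count` threads and return the elapsed wall time."""
    threads = [threading.Thread(target=target) for _ in range(count)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

def blocking_wait():
    # time.sleep releases the GIL while waiting, like a socket read would.
    time.sleep(0.2)

def busy_loop():
    # Pure-Python bytecode: the running thread needs the GIL for every step.
    total = 0
    for i in range(2_000_000):
        total += i
    return total

print("4 x blocking_wait:", round(timed_threads(blocking_wait), 2), "s")
print("4 x busy_loop:    ", round(timed_threads(busy_loop), 2), "s")
```

The four waits should overlap and finish in about the time of one, while the four busy loops should take about as long as running them back to back. Exact numbers depend on the machine.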
Mental model
Threads share memory and a single Python interpreter; processes each have their own interpreter and their own memory. The trade-off: threads are cheap and share data for free but compete for the GIL; processes scale across cores but pay a serialization cost to share anything.
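The "shared for free" versus "serialized copy" distinction can be seen without starting a process at all. The sketch below hands a dict to a thread by reference, then makes the kind of copy a process would receive by round-tripping it through pickle. The names `shared`, `bump`, and `copy` are illustrative only.

```python
import pickle
import threading

shared = {"count": 0}

def bump(d):
    d["count"] += 1

# A thread receives a reference to the very same dict object.
t = threading.Thread(target=bump, args=(shared,))
t.start()
t.join()

# A worker process would receive what pickle reconstructs: an independent copy.
copy = pickle.loads(pickle.dumps(shared))
bump(copy)

print(shared["count"])  # 1: only the thread's update reached the original
print(copy["count"])    # 2: the copy diverged from the original
```

Any change a worker process makes to its copy stays in that process unless it is explicitly sent back.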
Threading (one process, multiple threads)
+-----------------------------------------+
|  Python interpreter + GIL               |
|  T1   T2   T3   <- only one holds GIL   |
|  |    |    |       at a time            |
|  shared memory, shared modules          |
+-----------------------------------------+
Multiprocessing (N processes)
+---------------+   +---------------+   +---------------+
| Interp + GIL  |   | Interp + GIL  |   | Interp + GIL  |
| own memory    |   | own memory    |   | own memory    |
+---------------+   +---------------+   +---------------+
         \                  |                  /
          \                 v                 /
           +------- pickle over pipe --------+
                 (cost of sharing data)
Hands-on example
A CPU-bound function: count primes under N.
    import math
    import time
    from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

    def count_primes(n):
        # Trial division: x is prime if no d in [2, isqrt(x)] divides it.
        return sum(
            1 for x in range(2, n)
            if all(x % d for d in range(2, math.isqrt(x) + 1))
        )

    def bench(executor_cls, n=200_000, workers=4):
        start = time.perf_counter()
        with executor_cls(max_workers=workers) as ex:
            # One identical task per worker, so both pools do the same total work.
            list(ex.map(count_primes, [n] * workers))
        return time.perf_counter() - start

    # The guard is required wherever the spawn start method is used.
    if __name__ == "__main__":
        print("threads:  ", bench(ThreadPoolExecutor))
        print("processes:", bench(ProcessPoolExecutor))
On a four-core machine with a standard (GIL-enabled) CPython build, the thread pool takes roughly as long as running the four tasks one after another, because the GIL serializes the work. The process pool takes roughly a quarter of that time, plus worker startup overhead, because each interpreter runs on its own core.
For I/O-bound work, threads win. They are cheap and the GIL is released around socket reads.
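    import requests
    from concurrent.futures import ThreadPoolExecutor

    urls = ["https://example.com"] * 50

    def fetch(url):
        # A timeout keeps one stalled server from pinning a worker thread forever.
        return requests.get(url, timeout=10)

    with ThreadPoolExecutor(max_workers=20) as ex:
        results = list(ex.map(fetch, urls))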
Spinning up twenty processes for this would add process startup and pickling overhead with nothing gained, since the workers spend their time waiting on the network rather than on the CPU.
Sharing state between processes requires serialization. multiprocessing.Queue pickles items across a pipe. For large NumPy arrays, use multiprocessing.shared_memory so workers see the same buffer without copying.
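    from multiprocessing import shared_memory
    import numpy as np

    a = np.arange(10_000_000, dtype=np.int64)
    shm = shared_memory.SharedMemory(create=True, size=a.nbytes)
    # View the shared block as an ndarray and copy the data in once.
    buf = np.ndarray(a.shape, dtype=a.dtype, buffer=shm.buf)
    buf[:] = a[:]
    # Pass shm.name, a.shape, and a.dtype to the workers; once they finish:
    del buf       # drop the view first, or close() raises BufferError
    shm.close()
    shm.unlink()  # free the block (creator only)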
Children attach by name and operate on the same buffer.
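The attach-by-name step might look like the sketch below. To stay dependency-free it uses `struct` instead of NumPy. With NumPy, the worker would wrap `shm.buf` in `np.ndarray` exactly as the creating side does. The function names `create_block` and `sum_block` are hypothetical, and the worker is assumed to receive the block name and element count as ordinary (small, picklable) arguments.

```python
import struct
from multiprocessing import shared_memory

def create_block(values):
    """Parent side: allocate a named block and write int64 values into it."""
    shm = shared_memory.SharedMemory(create=True, size=8 * len(values))
    struct.pack_into(f"{len(values)}q", shm.buf, 0, *values)
    return shm

def sum_block(name, count):
    """Worker side: attach by name, read the values, then detach."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        return sum(struct.unpack_from(f"{count}q", shm.buf, 0))
    finally:
        shm.close()  # detach only; the creator is responsible for unlink()
```

In a real program the parent would submit `sum_block` to a process pool with `shm.name` and the count, so only a short string and an integer cross the pipe instead of the data itself.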
Common pitfalls
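Windows and macOS use the spawn start method by default, which starts a fresh interpreter and re-imports your main module in every child. Put process-creation code under if __name__ == "__main__": or each child will try to create processes of its own during that import. Recent CPython versions abort this with a RuntimeError rather than letting it spiral, but the program still fails.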
Pickling errors are the most common failure. Anything you send to a process pool must be picklable: no lambdas, no local functions, no open file handles. Use top-level functions or define classes at module scope.
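A quick way to catch these failures before they surface inside a pool is to try pickling the object yourself. The helper below is a sketch. `picklable` and `make_local` are illustrative names, and the exact exception type raised varies between Python versions, which is why several are caught.

```python
import math
import os
import pickle

def picklable(obj):
    """Return True if `obj` survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False

def make_local():
    def inner(x):  # defined inside another function, so not importable by name
        return x
    return inner

print(picklable(math.sqrt))      # True: importable by module and name
print(picklable(lambda x: x))    # False: a lambda has no importable name
print(picklable(make_local()))   # False: local function
with open(os.devnull) as fh:
    print(picklable(fh))         # False: open file handle
```

Functions are pickled by reference (module plus qualified name), which is why anything the worker cannot import by name fails.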
The GIL is not magic. C extensions that hold the GIL across long operations (some image processing libraries, some database drivers) starve other threads. If threading.active_count() is high but throughput is flat, suspect a C extension hogging the GIL.
Threads and signals do not mix well. Python-level signal handlers always run in the main thread, and only the main thread can install them. Use threading.Event to coordinate shutdown.
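One way to wire this up is to have the handler do nothing but set an event that the workers poll. This is a sketch under assumptions: `stop`, `worker`, `handle_sigterm`, and `install_handler` are illustrative names, and SIGTERM is just one choice of signal.

```python
import signal
import threading

stop = threading.Event()

def worker(results):
    # Event.wait doubles as an interruptible sleep: it returns True once
    # stop is set, which ends the loop.
    while not stop.wait(timeout=0.05):
        results.append("tick")

def handle_sigterm(signum, frame):
    # Runs in the main thread; it only flips the flag and returns quickly.
    stop.set()

def install_handler():
    # Must be called from the main thread.
    signal.signal(signal.SIGTERM, handle_sigterm)
```

The main thread would call `install_handler()`, start the workers, and then `join()` them. After the signal arrives, each worker exits at its next `wait` instead of being killed mid-task.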
Mixing asyncio with threads requires loop.run_in_executor or asyncio.to_thread. Calling sync blocking code directly from a coroutine stalls the loop.
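As an illustration, the sketch below offloads a blocking function with asyncio.to_thread (available since Python 3.9). `blocking_io` is a hypothetical stand-in for a synchronous driver or file call.

```python
import asyncio
import time

def blocking_io(delay):
    time.sleep(delay)  # stands in for a blocking driver or file call
    return delay

async def main():
    start = time.perf_counter()
    # Each call runs in the loop's default thread pool, so the event loop
    # stays free to schedule other coroutines in the meantime.
    results = await asyncio.gather(
        *(asyncio.to_thread(blocking_io, 0.2) for _ in range(3))
    )
    return results, time.perf_counter() - start

if __name__ == "__main__":
    print(asyncio.run(main()))
```

Calling `blocking_io(0.2)` directly inside `main` would instead freeze the loop for the full delay on every call.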
Production tips
Default to asyncio for I/O at scale. A thread per connection caps at thousands; asyncio handles tens of thousands on the same hardware.
Use processes for CPU work that does not fit a single core. Bound pool size to the number of physical cores; oversubscription costs more in context switches than it gives in parallelism.
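Set pool size from configuration rather than from os.cpu_count() when running in containers. Inside Kubernetes, os.cpu_count() typically reports the node's cores, not the pod's CPU allocation, so a pool sized from it oversubscribes: the pod is throttled at its limit or, with no limit set, starves its neighbors.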
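A configuration-driven pool size can be as simple as reading one environment variable. In this sketch, `POOL_WORKERS` is a hypothetical variable name and the default of 2 is an arbitrary placeholder. Pick values that match your deployment.

```python
import os

def pool_size(env=os.environ, default=2):
    """Read the worker count from POOL_WORKERS (a hypothetical variable name)."""
    raw = env.get("POOL_WORKERS")
    if raw is None:
        return default
    workers = int(raw)  # raises ValueError on non-numeric input
    if workers < 1:
        raise ValueError("POOL_WORKERS must be at least 1")
    return workers
```

The result would then be passed as `max_workers` when the pool is created, so the deployment manifest controls parallelism instead of whatever the host reports.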
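For long-running pools, set max_tasks_per_child so workers are recycled after a fixed number of tasks. This bounds memory leaks in third-party libraries you do not control. ProcessPoolExecutor accepts the parameter from Python 3.11 onward (multiprocessing.Pool spells it maxtasksperchild), and it is not compatible with the fork start method.
    ProcessPoolExecutor(max_workers=4, max_tasks_per_child=100)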
Profile before optimizing. cProfile plus py-spy will tell you whether you are I/O-bound or CPU-bound. The wrong concurrency model is worse than no concurrency model.
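Before reaching for a full profiler, a rough first check is to compare CPU time with wall-clock time for a representative call. This is a swapped-in shortcut, not a substitute for cProfile or py-spy. `cpu_fraction`, `wait_on_io`, and `crunch` are illustrative names.

```python
import time

def cpu_fraction(func, *args):
    """Return CPU time divided by wall time for one call of `func`."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func(*args)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return cpu / wall if wall > 0 else 0.0

def wait_on_io():
    time.sleep(0.2)  # stand-in for a network or disk wait

def crunch():
    return sum(i * i for i in range(1_000_000))

print("wait_on_io:", round(cpu_fraction(wait_on_io), 2))
print("crunch:    ", round(cpu_fraction(crunch), 2))
```

A fraction near zero suggests the call mostly waits (a candidate for threads or asyncio), while a fraction near one suggests it mostly computes (a candidate for processes).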
Wrap-up
Threads for I/O, processes for CPU, asyncio for high-fanout I/O. The GIL is why; honor it and your code scales, ignore it and you will benchmark identical numbers across configurations. Guard process startup with __main__, keep payloads picklable, and pin pool sizes deliberately. Once you internalize this, the choice for any given function takes about ten seconds.