Python Regex Basics with the re Module

A practical introduction to Python regex with the re module: match, search, findall, sub, groups, named groups, raw strings, and compiling patterns for speed.

6 min read · By Yash Kesharwani
Intermediate 10 min read

What you'll learn

  • The difference between re.match, re.search, and re.fullmatch
  • How to extract data with findall, finditer, and groups
  • How to clean and transform text with re.sub
  • How named groups make patterns readable and self-documenting
  • Why raw strings and re.compile matter in real code

Regular expressions have a reputation for being write-only code. They do not have to be. The Python re module is small, the patterns you use day to day are short, and once you know the five or six functions that matter, regex stops being scary. This article gives you the working set.

The re module at a glance

You will use a handful of functions: match, search, fullmatch, findall, finditer, and sub. Each takes a pattern and a string to search (sub also takes a replacement). The pattern is an ordinary Python string, but you should write it as a raw string with the r"..." prefix so Python does not process the backslashes before the regex engine sees them.

import re

text = "order 42 shipped on 2026-06-18"
re.search(r"\d+", text).group()  # '42'

If you skip the r, you have to double every backslash, and you will eventually forget once and spend an hour debugging.
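
Here is a small sketch of the kind of bug that causes. In a normal string literal, "\b" is the backspace character, so the regex engine never receives a word boundary. The sample text is made up for illustration.

```python
import re

text = "cat concat"

# Raw string: the regex engine receives backslash + b, a word boundary.
raw = re.findall(r"\bcat\b", text)

# Plain string: Python turns "\b" into a backspace character (\x08)
# before the regex engine sees it, so nothing in the text matches.
plain = re.findall("\bcat\b", text)

print(raw)                    # ['cat']
print(plain)                  # []
print(len(r"\b"), len("\b"))  # 2 1
```

The two patterns look identical on screen, which is why this mistake is hard to spot.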

match, search, fullmatch

These three all look for one match, but they differ in where they look.

  • re.match only checks at the start of the string.
  • re.search scans the whole string and returns the first match anywhere.
  • re.fullmatch requires the entire string to match.

re.match(r"\d+", "42 items")        # matches '42'
re.match(r"\d+", "items: 42")       # None
re.search(r"\d+", "items: 42")      # matches '42'
re.fullmatch(r"\d+", "42 items")    # None
re.fullmatch(r"\d+", "42")          # matches '42'

For validation (is this a valid email, is this a valid ID), fullmatch is usually what you want. For extraction, search. For starts-with checks, match.
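
To see why fullmatch suits validation, here is a minimal sketch. The ID format and the is_valid_id helper are hypothetical, invented only for this example.

```python
import re

# Hypothetical ID format for illustration: two uppercase letters, then four digits.
ID_PATTERN = r"[A-Z]{2}\d{4}"

def is_valid_id(value: str) -> bool:
    # fullmatch rejects anything before or after the ID.
    return re.fullmatch(ID_PATTERN, value) is not None

good = is_valid_id("AB1234")
bad = is_valid_id("AB1234; DROP")
# match only anchors the start, so trailing junk slips through.
loose = re.match(ID_PATTERN, "AB1234; DROP") is not None

print(good)   # True
print(bad)    # False
print(loose)  # True
```

Using match here would accept the second input, because it never looks at what follows the ID.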

Every one of these returns a Match object on success or None on failure. Check before you call methods on it.

m = re.search(r"\d+", "no numbers here")
if m:
    print(m.group())
else:
    print("no match")

Groups and named groups

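Parentheses in a pattern create groups. You read them out with group(n), where groups are numbered from 1 and group(0) (the same as group() with no argument) is the whole match, or with groups(), which returns a tuple of all groups.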

m = re.search(r"(\d{4})-(\d{2})-(\d{2})", "date: 2026-06-18")
m.group()    # '2026-06-18'
m.group(1)   # '2026'
m.groups()   # ('2026', '06', '18')

For anything beyond two groups, name them. Future-you will read this code at 2am and thank you.

m = re.search(
    r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})",
    "date: 2026-06-18",
)
m.group("year")    # '2026'
m.groupdict()      # {'year': '2026', 'month': '06', 'day': '18'}

Named groups also make sub replacements much more readable, as you will see in a moment.

findall and finditer

When you want every match, not just the first one:

re.findall(r"\d+", "a 1, b 22, c 333")
# ['1', '22', '333']

If the pattern has groups, findall returns the groups instead of the full match: a list of strings for one group, a list of tuples for two or more. That is occasionally surprising.

re.findall(r"(\w+)=(\d+)", "a=1 b=22 c=333")
# [('a', '1'), ('b', '22'), ('c', '333')]
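
A short sketch of the single-group case, plus one way around the surprise: a non-capturing group, written (?:...), groups without capturing, so findall goes back to returning the full match.

```python
import re

text = "a=1 b=22 c=333"

# One capturing group: a list of that group's text, not the full match.
values = re.findall(r"\w+=(\d+)", text)

# Non-capturing groups: findall returns the full match again.
pairs = re.findall(r"(?:\w+)=(?:\d+)", text)

print(values)  # ['1', '22', '333']
print(pairs)   # ['a=1', 'b=22', 'c=333']
```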

finditer returns an iterator of Match objects, which is what you want when you need positions or named groups for each match.

for m in re.finditer(r"(?P<k>\w+)=(?P<v>\d+)", "a=1 b=22"):
    print(m.start(), m.groupdict())
# 0 {'k': 'a', 'v': '1'}
# 4 {'k': 'b', 'v': '22'}

For more on iterators in general, see generators and iterators.

sub for search and replace

re.sub(pattern, replacement, text) replaces every match. You can reference groups in the replacement with \1, \2, or \g<name>.

re.sub(r"\s+", " ", "lots   of    space")
# 'lots of space'

re.sub(
    r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})",
    r"\g<day>/\g<month>/\g<year>",
    "today is 2026-06-18",
)
# 'today is 18/06/2026'

The replacement can also be a function, which is powerful for non-trivial transforms.

def shout(match):
    return match.group().upper()

re.sub(r"\b\w{5,}\b", shout, "hello there friends")
# 'HELLO there FRIENDS'

re.subn does the same job but also returns the number of replacements made, which is handy when you want to know if anything changed.
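
As a quick sketch, subn returns a (new_string, count) tuple, and a count of 0 tells you the text came back unchanged:

```python
import re

cleaned, count = re.subn(r"\s+", " ", "lots   of    space")
print(cleaned)  # lots of space
print(count)    # 2

# No whitespace to replace, so the count is 0 and the text is unchanged.
same, zero = re.subn(r"\s+", " ", "nospace")
print(same, zero)  # nospace 0
```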

Patterns worth memorizing

A small kit covers most everyday tasks:

  • \d digit, \D non-digit
  • \w word char (letters, digits, underscore), \W opposite
  • \s whitespace, \S non-whitespace
  • . any char except newline (or any with re.DOTALL)
  • ^ start, $ end (use re.MULTILINE to anchor each line)
  • \b word boundary
  • ? 0-1, * 0+, + 1+, {n,m} between n and m
  • [abc] character class, [^abc] negated, [a-z] range
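
The two flags mentioned in the list are easiest to understand side by side. This is a small sketch on a made-up log string:

```python
import re

log = "error: disk full\nwarn: low memory\nerror: timeout"

# Without MULTILINE, ^ and $ anchor the whole string, so nothing matches.
whole = re.findall(r"^error: (.+)$", log)

# With MULTILINE, ^ and $ anchor every line.
per_line = re.findall(r"^error: (.+)$", log, re.MULTILINE)

# . stops at a newline by default; DOTALL lets it cross lines.
default_dot = re.search(r"full.warn", log)
dotall_dot = re.search(r"full.warn", log, re.DOTALL)

print(whole)                               # []
print(per_line)                            # ['disk full', 'timeout']
print(default_dot)                         # None
print(dotall_dot.group() == "full\nwarn")  # True
```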

A few useful real patterns:

re.findall(r"\b\w+@\w+\.\w+\b", "send to a@b.com or c@d.io")
re.findall(r"https?://\S+", "see https://example.com and http://x.io")
re.sub(r"\s+", " ", "  collapse   whitespace  ").strip()

These are deliberately not bulletproof. Real email and URL parsing has edge cases that regex cannot cleanly cover. For anything user-facing or security-related, use a dedicated parser.

re.compile for hot paths

If you reuse a pattern, compile it once and call methods on the compiled object. This skips the pattern-cache lookup that the module-level functions perform on every call, and it reads better.

DATE = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")

def extract_dates(text: str) -> list[dict]:
    return [m.groupdict() for m in DATE.finditer(text)]

The module-level cache means a one-off re.search is fine. Compile when the pattern lives in a tight loop or is reused across many calls.

You can also pass flags into re.compile or any of the search functions: re.IGNORECASE, re.MULTILINE, re.DOTALL, re.VERBOSE. re.VERBOSE is great for documenting complex patterns:

PHONE = re.compile(r"""
    \(?(\d{3})\)?   # area code
    [\s.-]?         # optional separator
    (\d{3})         # exchange
    [\s.-]?
    (\d{4})         # subscriber
""", re.VERBOSE)
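
To check that the commented pattern behaves as intended, here is a short usage sketch. The phone numbers are invented sample values.

```python
import re

PHONE = re.compile(r"""
    \(?(\d{3})\)?   # area code
    [\s.-]?         # optional separator
    (\d{3})         # exchange
    [\s.-]?
    (\d{4})         # subscriber
""", re.VERBOSE)

samples = ["(555) 123-4567", "555.123.4567", "5551234567"]
parsed = [PHONE.search(s).groups() for s in samples]

for parts in parsed:
    print(parts)  # ('555', '123', '4567') for every sample
```

In verbose mode, whitespace and # comments outside character classes are ignored, which is why the layout above does not change what the pattern matches.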

Common pitfalls

  • Forgetting the r prefix. "\d" still works but triggers an invalid-escape warning on recent Python versions, while "\b" silently becomes a backspace character and the pattern never matches.
  • Greedy quantifiers eating too much. .* matches as much as possible. Use .*? for non-greedy when scanning between markers.
  • Using regex for problems that are not regex problems. Parsing HTML, JSON, or programming languages with regex will hurt you. Use a real parser.
  • Catastrophic backtracking on adversarial input. Patterns like (a+)+$ can take exponential time on a long run of a's followed by a character that prevents the match. Keep patterns simple and benchmark on realistic data.
  • Calling methods on the result without checking for None. A missed match returns None, and calling .group() on it raises AttributeError. Test the result with if m: first, or catch the exception if you must.
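
The greedy pitfall is easy to reproduce. This is a minimal sketch with made-up bracket markers:

```python
import re

text = "[one] and [two]"

# Greedy: .* runs to the last closing bracket in the string.
greedy = re.findall(r"\[(.*)\]", text)

# Non-greedy: .*? stops at the first closing bracket it can.
lazy = re.findall(r"\[(.*?)\]", text)

print(greedy)  # ['one] and [two']
print(lazy)    # ['one', 'two']
```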

When to reach for regex

Regex is great for: extracting structured fragments from messy text, validating simple formats, and doing search-and-replace on large bodies of text. It is the wrong tool for parsing nested structures, anything with quoting and escaping rules, or anything where a real grammar exists.

Wrap up

The Python re module gives you a tight set of functions: match, search, fullmatch, findall, finditer, and sub. Write patterns as raw strings, name your groups when there are more than two, and compile patterns you reuse. Lean on the small kit of character classes and quantifiers, and stay away from problems that deserve a real parser. With those habits, regex stops being a write-only weapon and becomes a sharp tool you actually enjoy reaching for.