Python Dataclasses: Less Boilerplate, More Clarity
A practical guide to Python dataclasses: the @dataclass decorator, field defaults, frozen instances, __post_init__, and comparisons with NamedTuple and Pydantic.
What you'll learn
- How @dataclass generates __init__, __repr__, and __eq__ for you
- How to use field() for defaults, default factories, and excluded fields
- How frozen=True gives you immutable, hashable instances
- How to use __post_init__ for validation and derived values
- When to pick a dataclass over NamedTuple or Pydantic
Prerequisites
- Comfortable with Python classes
Python classes used to require a lot of typing for the most boring possible reason: storing a few attributes and printing them nicely. @dataclass, added in Python 3.7, fixed that. If you have been writing __init__ methods that look like assignments and __repr__ methods that look like f-strings, this article is for you.
The simplest dataclass
@dataclass reads the class body, finds the type-annotated attributes, and generates __init__, __repr__, and __eq__ from them.
from dataclasses import dataclass

@dataclass
class Point:
    x: int
    y: int

p = Point(1, 2)
print(p)  # Point(x=1, y=2)
print(p == Point(1, 2))  # True
That is a dozen or so lines of boilerplate you did not have to write for a two-field class, and the savings grow with every field you add. The catch: only attributes with a type annotation become fields. Annotating is good practice anyway, so it is not really a trade-off.
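For a sense of what the decorator saves, here is a rough hand-written equivalent of Point. It is a sketch, not the exact code @dataclass generates, and the name ManualPoint is made up for the comparison.

```python
from dataclasses import dataclass


class ManualPoint:
    """Hand-written approximation of what @dataclass produces."""

    def __init__(self, x: int, y: int) -> None:
        self.x = x
        self.y = y

    def __repr__(self) -> str:
        return f"{type(self).__name__}(x={self.x!r}, y={self.y!r})"

    def __eq__(self, other: object) -> bool:
        # Only compare against instances of exactly the same class.
        if other.__class__ is not self.__class__:
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)


@dataclass
class Point:
    x: int
    y: int


manual = ManualPoint(1, 2)
generated = Point(1, 2)
print(manual)     # ManualPoint(x=1, y=2)
print(generated)  # Point(x=1, y=2)
```

Every extra field means touching all three methods in the manual version, while the dataclass only needs one more annotated line.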
Defaults and default_factory
Plain defaults work as expected:
@dataclass
class User:
    name: str
    active: bool = True
Mutable defaults are a trap. If tags: list[str] = [] were allowed, every instance would share the same list, so @dataclass rejects list, dict, and set defaults with a ValueError when the class is defined. Use field(default_factory=...) instead.
from dataclasses import dataclass, field

@dataclass
class User:
    name: str
    tags: list[str] = field(default_factory=list)
    metadata: dict[str, str] = field(default_factory=dict)
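A quick check of both claims, assuming Python 3.9+ for the list[str] annotation: each instance gets its own list from the factory, and a literal [] default is refused at class definition time. BrokenUser is a throwaway name used only to trigger the error.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    name: str
    tags: list[str] = field(default_factory=list)


ann = User("ann")
bob = User("bob")
ann.tags.append("admin")
print(ann.tags)  # ['admin']
print(bob.tags)  # []

# A literal [] default is rejected while the class is being created.
try:
    @dataclass
    class BrokenUser:
        name: str
        tags: list[str] = []
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```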
field also has flags you will actually use:
- repr=False to hide a field from the generated __repr__ (good for passwords and tokens).
- compare=False to exclude a field from __eq__ and ordering.
- init=False to leave a field out of __init__. Pair with a default or __post_init__.
@dataclass
class Account:
    username: str
    password: str = field(repr=False)
    created_at: float = field(init=False, default=0.0)
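To see the three flags side by side, here is a small sketch. The Session class and its field names are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    user: str
    token: str = field(repr=False)                        # hidden from repr
    last_seen: float = field(default=0.0, compare=False)  # ignored by ==
    request_count: int = field(init=False, default=0)     # not an __init__ parameter


first = Session("ann", "s3cret", last_seen=10.0)
second = Session("ann", "s3cret", last_seen=99.0)

print(first)            # the token value does not appear in the output
print(first == second)  # True: last_seen does not take part in comparison

# Because of init=False, request_count cannot be passed to the constructor.
try:
    Session("ann", "s3cret", request_count=5)
except TypeError:
    accepts_request_count = False
else:
    accepts_request_count = True
```

Note that repr=False only hides the value from the generated __repr__. The attribute is still there in plain form on the instance.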
__post_init__ for derived values and validation
The generated __init__ calls __post_init__ as its last step, after every field has been assigned. Use it for anything that depends on the initial values: computing derived fields, validating input, or normalizing data.
import time
from dataclasses import dataclass, field

@dataclass
class Account:
    username: str
    created_at: float = field(init=False)

    def __post_init__(self):
        if not self.username.isidentifier():
            raise ValueError("invalid username")
        self.created_at = time.time()
Pair this with the rules from error handling: validate early, raise specific exceptions, and let the caller decide what to do.
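The same pattern with a deterministic derived value is easier to try out. The Rectangle class below is a made-up example, not part of the Account code: it validates first and then fills in a field that is excluded from __init__.

```python
from dataclasses import dataclass, field


@dataclass
class Rectangle:
    width: float
    height: float
    area: float = field(init=False)  # derived, so not an __init__ parameter

    def __post_init__(self) -> None:
        # Validate first, then compute anything that depends on the inputs.
        if self.width <= 0 or self.height <= 0:
            raise ValueError("width and height must be positive")
        self.area = self.width * self.height


print(Rectangle(2, 3).area)  # 6

try:
    Rectangle(-1, 3)
except ValueError as exc:
    rejection = str(exc)
    print(rejection)  # width and height must be positive
```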
Frozen dataclasses
frozen=True makes instances immutable. Attempting to set an attribute raises dataclasses.FrozenInstanceError. With the default eq=True, frozen dataclasses also get a generated __hash__, so you can put them in sets and use them as dict keys, as long as every field value is itself hashable.
@dataclass(frozen=True)
class Coord:
    lat: float
    lng: float

c = Coord(40.7, -74.0)
# c.lat = 41.0  # raises FrozenInstanceError
seen = {c}
I reach for frozen=True whenever a class represents a value rather than an entity. Coordinates, money amounts, configuration snapshots, anything that should compare by value and never mutate.
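A short sketch of the value-type behavior described above: assignment fails, and two equal coordinates act as the same dict key. The visits dict is only there for illustration.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class Coord:
    lat: float
    lng: float


home = Coord(40.7, -74.0)

try:
    home.lat = 41.0
except FrozenInstanceError:
    mutated = False
else:
    mutated = True

# Equal field values give equal hashes, so both lookups hit the same key.
visits = {home: 1}
visits[Coord(40.7, -74.0)] += 1
print(visits[home])  # 2
```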
Other decorator flags worth knowing
- order=True generates __lt__, __le__, __gt__, __ge__ based on field order. Handy for sorting.
- slots=True (3.10+) generates __slots__, which saves memory and prevents typos like obj.usrname = "x" from silently creating new attributes.
- kw_only=True (3.10+) forces all fields to be keyword-only in __init__. Useful for classes with many fields.
@dataclass(slots=True, frozen=True, order=True)
class Version:
    major: int
    minor: int
    patch: int
Inheritance and field order
Dataclasses can inherit from other dataclasses. The catch: in the generated __init__, fields without defaults must come before fields with defaults across the whole MRO. If a parent has a defaulted field and the child adds a required one, the class definition fails with a TypeError. The clean fix is kw_only=True (3.10+), which removes the ordering constraint.
@dataclass(kw_only=True)
class Base:
    id: int
    name: str = "anonymous"

@dataclass(kw_only=True)
class Admin(Base):
    permissions: list[str]
asdict, astuple, and replace
The dataclasses module ships a few utilities you will use often:
from dataclasses import asdict, astuple, dataclass, replace

@dataclass
class Point:
    x: int
    y: int

p = Point(1, 2)
asdict(p)         # {'x': 1, 'y': 2}
astuple(p)        # (1, 2)
replace(p, x=10)  # Point(x=10, y=2), original unchanged
replace is especially nice with frozen dataclasses. It is the idiomatic way to produce a modified copy.
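As a sketch of that idiom with nested frozen dataclasses (Customer and Address are invented names), replace builds the modified copy while asdict and astuple recurse into the nested instance:

```python
from dataclasses import asdict, astuple, dataclass, replace


@dataclass(frozen=True)
class Address:
    city: str
    country: str


@dataclass(frozen=True)
class Customer:
    name: str
    address: Address


original = Customer("Ann", Address("Lisbon", "PT"))
# Build a modified copy from the inside out; nothing is mutated in place.
moved = replace(original, address=replace(original.address, city="Porto"))

print(original.address.city)  # Lisbon
print(moved.address.city)     # Porto
print(asdict(moved))   # nested dataclasses become nested dicts
print(astuple(moved))  # nested dataclasses become nested tuples
```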
Dataclass vs NamedTuple
typing.NamedTuple and collections.namedtuple predate dataclasses and are still useful.
- NamedTuple is a tuple, so it is iterable, indexable, and immutable. Good for lightweight records that flow through code as positional data.
- Dataclass is a regular class. You can add methods, mutate it, subclass it, give it __post_init__ logic, and use slots.
Rule of thumb: if you would happily store the thing in a CSV row and never grow it, NamedTuple is fine. If it might gain a method, validation, or extra behavior, use a dataclass.
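The practical difference shows up in a few lines. PointNT and PointDC are illustrative names for the same record written both ways.

```python
from dataclasses import dataclass
from typing import NamedTuple


class PointNT(NamedTuple):
    x: int
    y: int


@dataclass
class PointDC:
    x: int
    y: int


nt = PointNT(1, 2)
dc = PointDC(1, 2)

x, y = nt              # a NamedTuple unpacks like any tuple
print(nt[0])           # 1
print(nt == (1, 2))    # True: it really is a tuple
print(dc == (1, 2))    # False: a dataclass only equals its own class

try:
    x2, y2 = dc        # a plain dataclass is not iterable
except TypeError:
    dataclass_unpacks = False
else:
    dataclass_unpacks = True

dc.x = 10              # allowed: dataclasses are mutable unless frozen
```

The tuple equality cuts both ways: it is convenient for positional data, but it also means a PointNT compares equal to any unrelated tuple with the same values.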
Dataclass vs Pydantic
Pydantic models look similar but do more work. They parse, coerce, and validate input data at construction time. Dataclasses trust you: annotations are not enforced at runtime, so types are only checked if you check them yourself.
- For internal data structures in your own code: dataclasses are perfect. The decorator does its work once, when the class is defined, so instances cost no more than those of a regular class.
- For boundaries with the outside world (API request bodies, config files, env vars): use Pydantic or dataclasses plus a validation library. The cost of bad input data is high there.
You can also use pydantic.dataclasses.dataclass if you want Pydantic validation with a dataclass-style API. That is a reasonable middle ground.
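To make the "dataclasses trust you" point concrete, the sketch below uses only the standard library. CheckedPoint is a hypothetical hand-rolled check that works only for simple class annotations such as int or str. It is not how Pydantic validates and it does not handle generics like list[str].

```python
from dataclasses import dataclass, fields


@dataclass
class Point:
    x: int
    y: int


# Annotations are not enforced: this is accepted without complaint.
loose = Point("1", "2")
print(loose)  # Point(x='1', y='2')


@dataclass
class CheckedPoint:
    x: int
    y: int

    def __post_init__(self) -> None:
        # Naive runtime check; assumes every annotation is a plain class.
        for f in fields(self):
            value = getattr(self, f.name)
            if not isinstance(value, f.type):
                raise TypeError(
                    f"{f.name} must be {f.type.__name__}, got {type(value).__name__}"
                )


try:
    CheckedPoint("1", 2)
except TypeError as exc:
    type_error = str(exc)
    print(type_error)  # x must be int, got str
```

Once checks like this start to grow (coercion, nested models, error aggregation), that is usually the signal to move to a validation library at the boundary.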
Common pitfalls
- Mutable defaults written as = [] or = {}. Always use field(default_factory=...).
- Forgetting type annotations. A bare name = "foo" in the class body is treated as a class variable and ignored by @dataclass. Use name: str = "foo".
- Comparing dataclasses with == and being surprised. If you do not want equality based on every field, mark some with compare=False.
- Mutating frozen dataclasses via container fields. frozen=True stops attribute assignment, but if a field is a list, you can still call append on it. Use tuples for true immutability.
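Two of these pitfalls are easy to reproduce. The Config, Basket, and SafeBasket classes below are made-up examples: the first shows an unannotated attribute being skipped, the others show a frozen instance whose list field still changes.

```python
from dataclasses import dataclass, fields


@dataclass
class Config:
    host: str = "localhost"
    port = 8080  # no annotation: a plain class attribute, not a field


field_names = [f.name for f in fields(Config)]
print(field_names)  # ['host']

try:
    Config(port=9000)  # port is not an __init__ parameter
except TypeError:
    port_is_field = False
else:
    port_is_field = True


@dataclass(frozen=True)
class Basket:
    items: list


basket = Basket(["apple"])
basket.items.append("pear")  # frozen does not protect the list itself
print(basket.items)  # ['apple', 'pear']


@dataclass(frozen=True)
class SafeBasket:
    items: tuple


safe = SafeBasket(("apple",))
print(hash(safe) == hash(SafeBasket(("apple",))))  # True
```

A list field also makes the frozen instance unhashable in practice, since the generated __hash__ has to hash the list. The tuple version avoids both problems.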
When to reach for a dataclass
Any time you write a class whose main job is to hold a few related fields. Configuration objects, value types, in-memory records, message payloads, and intermediate results all benefit. For more on related patterns, see decorators, since @dataclass is itself a class decorator and seeing how one works pulls back the curtain on the rest.
Wrap up
@dataclass turns a class body of annotations into a working class with __init__, __repr__, and __eq__ for free. Use field for mutable defaults and metadata, __post_init__ for derived values and validation, and frozen=True when you want a value type. Prefer NamedTuple for tuple-shaped records and Pydantic when you need parsing and validation at the edges of your system. For everything else, a dataclass is usually the right tool, and your classes will get a lot shorter the moment you start using one.