Reading and Writing Files in Python

Intermediate 10 min read

What you'll learn

✓ How open and the with statement work together
✓ The full set of file modes and when to use each
✓ Idiomatic ways to read text line by line and all at once
✓ How to write and append safely
✓ Working with paths using pathlib
✓ Reading and writing binary files and JSON

Prerequisites

• Comfortable with for loops (see For Loops)
• Comfortable with error handling (see Error Handling)

Almost every useful program reads or writes something — configuration, logs, data files, CSVs, JSON, or images. Python’s file handling is short on ceremony and long on convention. Learn the half-dozen patterns in this post and you will handle the vast majority of file work cleanly.

`open` and the `with` statement

open(path, mode) returns a file object. The simplest usage:

f = open("notes.txt")
contents = f.read()
f.close()

This works but is fragile: if read() raises, close() never runs. The idiomatic form is the with statement, which closes the file automatically when the block ends — even on exceptions:

with open("notes.txt") as f:
    contents = f.read()

Use with for file I/O by default. The rare exception is a file that must stay open beyond a single block, and even then you remain responsible for closing it.
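To see the automatic close in action, here is a small sketch that raises inside the block and then checks the file object afterwards. The temporary directory and the simulated ValueError are illustrative assumptions, not part of the original example.

```python
import tempfile
from pathlib import Path

# Create a throwaway file so the example does not depend on notes.txt existing.
path = Path(tempfile.mkdtemp()) / "notes.txt"
path.write_text("hello\n", encoding="utf-8")

try:
    with open(path, encoding="utf-8") as f:
        first = f.read()
        raise ValueError("simulated failure inside the block")
except ValueError:
    pass

# The with statement closed the file even though the block raised.
print(f.closed)  # True
```

The name f stays bound after the block, which is why f.closed can still be inspected; only the underlying file has been closed.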

You can also open multiple files in one with:

with open("input.txt") as src, open("output.txt", "w") as dst:
    dst.write(src.read())

File modes

The mode string controls how the file is opened. The main characters:

Mode	Meaning
`r`	Read (default). File must exist.
`w`	Write. Creates or truncates the file.
`a`	Append. Creates if missing.
`x`	Exclusive create. Fails if file exists.
`b`	Binary mode (combine with another).
`t`	Text mode (default, combine with another).
`+`	Update: open for both reading and writing (combine with another, such as r+ or w+).

The most common combinations are "r", "w", "a", "rb", and "wb".
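The differences between "x", "a", and "w" are easiest to see side by side. The sketch below runs all three against one temporary file; the file name demo.txt and the variable names are made up for the illustration.

```python
import tempfile
from pathlib import Path

path = Path(tempfile.mkdtemp()) / "demo.txt"

with open(path, "x", encoding="utf-8") as f:   # exclusive create: file must not exist yet
    f.write("one\n")

with open(path, "a", encoding="utf-8") as f:   # append: existing content is kept
    f.write("two\n")

after_append = path.read_text(encoding="utf-8")

try:
    open(path, "x", encoding="utf-8")          # a second exclusive create fails
except FileExistsError:
    exclusive_failed = True
else:
    exclusive_failed = False

with open(path, "w", encoding="utf-8") as f:   # write: truncates before writing
    f.write("three\n")

after_write = path.read_text(encoding="utf-8")

print(repr(after_append))   # 'one\ntwo\n'
print(exclusive_failed)     # True
print(repr(after_write))    # 'three\n'
```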

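Always specify the encoding when working with text. When you omit it, Python typically falls back to the platform's locale encoding, which is UTF-8 on most Linux and macOS systems but often a legacy code page such as cp1252 on Windows: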

with open("notes.txt", "r", encoding="utf-8") as f:
    ...

Specifying encoding="utf-8" explicitly makes your code portable.
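A quick way to see why the encoding matters is to write a file as UTF-8 and read it back with a different codec. In this sketch, latin-1 stands in for a mismatched platform default; that choice and the file name are assumptions for the demonstration.

```python
import tempfile
from pathlib import Path

path = Path(tempfile.mkdtemp()) / "cafe.txt"
word = "caf\u00e9"   # "café", written with an escape to keep the source ASCII-only

with open(path, "w", encoding="utf-8") as f:
    f.write(word)

with open(path, encoding="utf-8") as f:
    right = f.read()

with open(path, encoding="latin-1") as f:   # wrong codec: no error, just wrong text
    wrong = f.read()

print(right == word)           # True
print(wrong == word)           # False
print(len(right), len(wrong))  # 4 5
```

The mismatched read does not raise. It silently turns the two UTF-8 bytes of the accented letter into two separate characters, which is the kind of bug an explicit encoding argument prevents.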

Reading text

A text file object is iterable. The cleanest way to read line by line is to iterate directly:

with open("notes.txt", encoding="utf-8") as f:
    for line in f:
        print(line.rstrip())

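Each line includes its trailing newline. Calling rstrip() with no arguments removes it along with any other trailing whitespace; use rstrip("\n") to remove only the newline. Iterating reads one line at a time, so this works for huge files.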

To read the whole file at once:

with open("notes.txt", encoding="utf-8") as f:
    text = f.read()

Use read() only when you actually need the whole content in memory. For multi-gigabyte files, iterate line by line instead.

To read all lines into a list:

with open("notes.txt", encoding="utf-8") as f:
    lines = f.readlines()

This is equivalent to list(f). Lines still include their trailing newlines.
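The following sketch compares what you get back from readlines() with two ways of stripping each line. The sample content, including the trailing spaces after "alpha", is invented to make the difference visible.

```python
import tempfile
from pathlib import Path

path = Path(tempfile.mkdtemp()) / "notes.txt"
path.write_text("alpha  \nbeta\n", encoding="utf-8")

with open(path, encoding="utf-8") as f:
    raw = f.readlines()                               # newlines kept

with open(path, encoding="utf-8") as f:
    newline_removed = [line.rstrip("\n") for line in f]  # only the newline goes

with open(path, encoding="utf-8") as f:
    whitespace_removed = [line.rstrip() for line in f]   # all trailing whitespace goes

print(raw)                 # ['alpha  \n', 'beta\n']
print(newline_removed)     # ['alpha  ', 'beta']
print(whitespace_removed)  # ['alpha', 'beta']
```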

Writing text

"w" mode truncates the file (or creates it). "a" appends. In both cases, write does not add a newline — you do:

with open("output.txt", "w", encoding="utf-8") as f:
    f.write("first line\n")
    f.write("second line\n")

For many lines, writelines takes any iterable of strings (and still does not add newlines):

lines = ["first\n", "second\n", "third\n"]
with open("output.txt", "w", encoding="utf-8") as f:
    f.writelines(lines)

Or use print with a file= argument, which does add the newline for you:

with open("output.txt", "w", encoding="utf-8") as f:
    for line in ["first", "second", "third"]:
        print(line, file=f)

I find print(..., file=f) the most readable for one-line-per-record output.
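When the records do not already end in newlines, one option is to pass writelines a generator that adds them. The sketch below does that and then appends one more record with print; the records list and the temporary path are placeholders.

```python
import tempfile
from pathlib import Path

path = Path(tempfile.mkdtemp()) / "output.txt"
records = ["first", "second", "third"]

with open(path, "w", encoding="utf-8") as f:
    # writelines adds nothing, so the generator appends "\n" to each record.
    f.writelines(f"{record}\n" for record in records)

with open(path, "a", encoding="utf-8") as f:
    print("fourth", file=f)   # print supplies the newline itself

result = path.read_text(encoding="utf-8").splitlines()
print(result)  # ['first', 'second', 'third', 'fourth']
```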

Try it yourself. Create a file numbers.txt containing the integers 1 through 20, one per line. Then read it back and print only the even numbers. Use with for both the write and the read.

Paths with `pathlib`

The pathlib module is the modern, object-oriented way to work with paths. It is part of the standard library and replaces most uses of os.path:

from pathlib import Path

p = Path("data") / "users" / "alice.json"
print(p)              # data/users/alice.json (backslashes on Windows)
print(p.parent)       # data/users
print(p.name)         # alice.json
print(p.stem)         # alice
print(p.suffix)       # .json

Path objects support direct I/O:

from pathlib import Path

text = Path("notes.txt").read_text(encoding="utf-8")
Path("output.txt").write_text("Hello!\n", encoding="utf-8")

read_text and write_text are convenient for small files. For anything streaming or line-by-line, fall back to Path.open(...), which is the same as open(path, ...).
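As a minimal sketch of the streaming case, this counts non-empty lines through Path.open without loading the whole file. The sample content is invented for the example.

```python
import tempfile
from pathlib import Path

path = Path(tempfile.mkdtemp()) / "notes.txt"
path.write_text("one\n\ntwo\nthree\n", encoding="utf-8")

non_empty = 0
with path.open(encoding="utf-8") as f:   # same file object that open(path) would return
    for line in f:                        # one line in memory at a time
        if line.strip():
            non_empty += 1

print(non_empty)  # 3
```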

A few more handy methods:

p = Path("notes.txt")
p.exists()                      # True or False
p.is_file()                     # True if regular file
p.is_dir()                      # True if directory
p.stat().st_size                # size in bytes
p.unlink(missing_ok=True)       # delete if it exists

Path("logs").mkdir(parents=True, exist_ok=True)
for entry in Path("logs").iterdir():
    print(entry)

for path in Path("src").rglob("*.py"):
    print(path)

pathlib consistently returns Path objects, not strings. Mixing the two works almost everywhere, but stay consistent within a function.
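Here is a short sketch of how Path and str interoperate: the built-in open accepts a Path directly, and os.fspath converts explicitly when an API insists on a string. The file name is a placeholder.

```python
import os
import tempfile
from pathlib import Path

p = Path(tempfile.mkdtemp()) / "notes.txt"

with open(p, "w", encoding="utf-8") as f:   # built-in open accepts a Path
    f.write("hi\n")

as_string = os.fspath(p)       # explicit conversion to str
round_trip = Path(as_string)   # and back to a Path

print(isinstance(as_string, str))  # True
print(round_trip == p)             # True
print(p.exists())                  # True
```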

Handling missing files and other errors

open raises FileNotFoundError if the path does not exist in read mode, and PermissionError if the operating system refuses access. Wrap I/O in try/except when the program can do something useful on failure:

import json
from pathlib import Path

def load_optional_config(path: str) -> dict:
    # Assumes the config file is JSON; swap in the parser your format needs.
    try:
        return json.loads(Path(path).read_text(encoding="utf-8"))
    except FileNotFoundError:
        return {}

If you cannot recover, do nothing — let the exception propagate. A traceback with a clear FileNotFoundError is far more useful than a silent empty result. See Error Handling for the full philosophy.
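The sketch below illustrates the split between recoverable and unrecoverable failures. The helper read_or_default is hypothetical: it catches only FileNotFoundError, so any other OSError still propagates. Pointing it at a directory is just a portable way to trigger such an error.

```python
import tempfile
from pathlib import Path

def read_or_default(path: Path, default: str = "") -> str:
    """Return the file's text, or `default` if the file does not exist."""
    try:
        return path.read_text(encoding="utf-8")
    except FileNotFoundError:
        return default

folder = Path(tempfile.mkdtemp())

# Recoverable case: a missing file yields the default.
missing_result = read_or_default(folder / "missing.txt")
print(repr(missing_result))  # ''

# Unrecoverable case: the path is a directory, so a different OSError
# (IsADirectoryError on POSIX, PermissionError on Windows) propagates.
try:
    read_or_default(folder)
except OSError as exc:
    propagated = type(exc).__name__

print(propagated)
```

Catching the narrowest exception that matches the recoverable case keeps every other failure loud.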

Binary files

For non-text content — images, executables, compressed data — open in binary mode. read and write then deal in bytes, not str:

with open("image.png", "rb") as src, open("copy.png", "wb") as dst:
    dst.write(src.read())

For very large binary files, read in chunks:

CHUNK = 64 * 1024
with open("input.bin", "rb") as src, open("output.bin", "wb") as dst:
    while True:
        chunk = src.read(CHUNK)
        if not chunk:
            break
        dst.write(chunk)

Do not mix text and binary modes — the type errors are confusing. Decide upfront which kind of data you are handling.
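To make the type error concrete, this sketch tries to write a str to a file opened in binary mode, then writes the encoded bytes instead. The file name and payload are arbitrary.

```python
import tempfile
from pathlib import Path

path = Path(tempfile.mkdtemp()) / "data.bin"

with open(path, "wb") as f:
    try:
        f.write("hello")              # str into a binary file
    except TypeError:
        wrote_str = False
    else:
        wrote_str = True
    f.write("hello".encode("utf-8"))  # bytes are accepted

with open(path, "rb") as f:
    raw = f.read()

print(wrote_str)            # False
print(raw)                  # b'hello'
print(raw.decode("utf-8"))  # hello
```

encode and decode are the bridge between the two worlds: convert at the boundary, and keep each file handle in a single mode.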

Working with JSON

JSON is one of the most common structured-data formats. The json module integrates with file I/O directly:

import json
from pathlib import Path

data = {"name": "Alice", "roles": ["admin", "editor"]}

# Write
Path("user.json").write_text(json.dumps(data, indent=2), encoding="utf-8")

# Read
loaded = json.loads(Path("user.json").read_text(encoding="utf-8"))
print(loaded["roles"])    # ['admin', 'editor']

json.dump(obj, file) and json.load(file) take file objects directly, so you can skip the intermediate string:

import json
with open("user.json", "w", encoding="utf-8") as f:
    json.dump(data, f, indent=2)

with open("user.json", "r", encoding="utf-8") as f:
    loaded = json.load(f)

For data that fits in memory, the read_text/json.loads pair is hard to beat for clarity.
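Files on disk are not always valid JSON. If a malformed file is something your program can recover from, one possible approach is to catch json.JSONDecodeError, as in this sketch. The load_json helper and both file names are hypothetical.

```python
import json
import tempfile
from pathlib import Path

folder = Path(tempfile.mkdtemp())
good = folder / "good.json"
bad = folder / "bad.json"
good.write_text(json.dumps({"name": "Alice"}), encoding="utf-8")
bad.write_text("{name: Alice}", encoding="utf-8")   # keys must be quoted: invalid JSON

def load_json(path: Path):
    """Return the parsed JSON, or None if the file is not valid JSON."""
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        # The exception records where parsing failed.
        print(f"{path.name}: {exc.msg} (line {exc.lineno}, column {exc.colno})")
        return None

good_result = load_json(good)
bad_result = load_json(bad)
print(good_result)  # {'name': 'Alice'}
print(bad_result)   # None
```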

Try it yourself. Write a function count_words_in_file(path) that returns a dictionary mapping each word to its count, using the counting pattern from Python Dictionaries. Then write the result to counts.json, formatted with indent=2.

A worked example: a tiny log analyser

A small program that reads a log file, counts log levels, and writes a summary as JSON:

import json
from collections import Counter
from pathlib import Path

LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"}

def analyze(log_path: Path, summary_path: Path) -> None:
    counter: Counter[str] = Counter()
    line_count = 0

    with log_path.open(encoding="utf-8") as f:
        for line in f:
            line_count += 1
            parts = line.split(None, 2)
            if len(parts) >= 2 and parts[1] in LEVELS:
                counter[parts[1]] += 1

    summary = {
        "source": str(log_path),
        "lines": line_count,
        "by_level": dict(counter),
    }
    summary_path.write_text(json.dumps(summary, indent=2), encoding="utf-8")

# Example usage (assuming app.log exists):
# analyze(Path("app.log"), Path("summary.json"))

This combines streaming line-by-line reading, pathlib, a Counter, and JSON output — a complete, realistic shape for a small data-processing script.
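To see the shape of the output, here is a condensed, self-contained run of the same counting logic against a made-up four-line log. The log format (timestamp, level, message) and the sample lines are assumptions for illustration only.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path

LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"}

log_path = Path(tempfile.mkdtemp()) / "app.log"
# Hypothetical log format: "<timestamp> <LEVEL> <message>".
log_path.write_text(
    "2024-01-01T10:00:00 INFO service started\n"
    "2024-01-01T10:00:05 ERROR database unreachable\n"
    "2024-01-01T10:00:09 INFO retry succeeded\n"
    "not a log line\n",
    encoding="utf-8",
)

counter = Counter()
line_count = 0
with log_path.open(encoding="utf-8") as f:
    for line in f:
        line_count += 1
        parts = line.split(None, 2)          # timestamp, level, rest of line
        if len(parts) >= 2 and parts[1] in LEVELS:
            counter[parts[1]] += 1

summary = {"lines": line_count, "by_level": dict(counter)}
print(json.dumps(summary, indent=2, sort_keys=True))
```

The last line has no recognised level in its second field, so it counts toward lines but not toward by_level.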

Recap

You now know:

✓ Always use with open(...) as f: so the file is closed automatically
✓ Specify encoding="utf-8" for text files to keep your code portable
✓ Iterate the file object for line-by-line reading; read() for whole-file
✓ "w" truncates, "a" appends, "x" fails if the file exists
✓ pathlib is the modern path API: Path.read_text, Path.write_text, Path.iterdir
✓ Binary mode ("rb", "wb") deals in bytes, not str
✓ JSON read/write fits naturally with json.dumps/json.loads and Path

Next steps

You now have a complete toolkit for reading and writing data. The final post in this intermediate series shows how to break larger code into modules and import them — the last building block before you start writing genuinely substantial Python.

→ Next: Modules and Imports in Python

Questions or feedback? Email codeloomdevv@gmail.com.