
Git Large File Storage (LFS) Tutorial

Set up Git LFS to version large binaries like images, models, and datasets without bloating your repository, including tracking, migration, and CI tips.

4 min read · By Codeloom
Intermediate 8 min read

What you'll learn

  • Why Git struggles with large binary files
  • How LFS pointers and remote storage work
  • Tracking patterns and migrating existing files
  • Configuring CI clones and shallow fetches
  • Common LFS pitfalls and quotas

Prerequisites

  • Familiarity with the shell
  • Basic Git remote and clone workflow

What and Why

Git compresses text well but treats binary blobs as opaque. A 200 MB PSD checked in once balloons the pack file forever; clones become slow and remote storage costs explode. Git LFS (Large File Storage) solves this by replacing the blob in your repo with a tiny text pointer and pushing the actual bytes to a separate object store.

Typical candidates: design assets, ML model weights, video and audio samples, fixture datasets, and generated PDFs. If a file changes often and is larger than a few megabytes, it likely belongs in LFS.

Mental Model

Without LFS, every revision of a binary lives forever in the Git object database. With LFS, the repo stores a pointer like:

version https://git-lfs.github.com/spec/v1
oid sha256:7d865e959b2466918c9863afca942d0fb89d7c9ac0c99bafc3749504ded97730
size 12345678

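The actual bytes live on the LFS server (GitHub, GitLab, self-hosted). On checkout, the LFS smudge filter replaces each pointer with the real file, downloading it if it is not already in the local cache. In the other direction, a clean filter runs on git add, swaps the file for a pointer, and stores the bytes in the local LFS cache; a pre-push hook then uploads them to the server on git push.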
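
To make the pointer format concrete, here is a minimal sketch that prints the pointer text for a file using only standard shell tools. The function name lfs_pointer is hypothetical and is not part of the git-lfs CLI; the sketch assumes either sha256sum or shasum is available.

```shell
# Sketch: print the pointer text Git LFS would store for a file.
# lfs_pointer is a hypothetical helper, not part of the git-lfs CLI.
lfs_pointer() {
    file=$1
    # sha256sum ships with GNU coreutils; macOS provides shasum instead.
    if command -v sha256sum >/dev/null 2>&1; then
        oid=$(sha256sum "$file" | cut -d' ' -f1)
    else
        oid=$(shasum -a 256 "$file" | cut -d' ' -f1)
    fi
    # wc -c counts bytes; tr strips the padding some platforms add.
    size=$(wc -c < "$file" | tr -d ' ')
    printf 'version https://git-lfs.github.com/spec/v1\n'
    printf 'oid sha256:%s\n' "$oid"
    printf 'size %s\n' "$size"
}
```

Calling lfs_pointer models/v1.bin prints three lines in the same shape as the pointer above. With the client installed, git lfs pointer --file=models/v1.bin should produce the equivalent output and is the better choice in practice.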

Hands-on Example

Install the client, initialize it, and track a few patterns. The example versions PSD and PNG design assets and a model checkpoint.

# One-time install (macOS shown)
brew install git-lfs
git lfs install   # registers smudge/clean filters globally

# Inside an existing repo
cd my-project
git lfs track "*.psd"
git lfs track "assets/**/*.png"
git lfs track "models/*.bin"

# Tracking writes patterns to .gitattributes
cat .gitattributes
# *.psd              filter=lfs diff=lfs merge=lfs -text
# assets/**/*.png    filter=lfs diff=lfs merge=lfs -text
# models/*.bin       filter=lfs diff=lfs merge=lfs -text

git add .gitattributes assets/logo.png models/v1.bin
git commit -m "Track design assets and model in LFS"
git push origin main

To convert files already committed as regular blobs, rewrite history with git lfs migrate:

# Move every .bin in the history of main into LFS
git lfs migrate import --include="*.bin" --include-ref=refs/heads/main

# Verify pointers
git lfs ls-files
# 7d865e959b * models/v1.bin
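
Before rewriting history, it helps to know which blobs are actually large. The following is a rough sketch built from plain Git plumbing, so it works even before LFS is installed. The function name largest_blobs and its default of 10 results are assumptions made for illustration.

```shell
# Sketch: list the N largest blobs reachable from any ref, biggest first.
# largest_blobs is a hypothetical helper; output is "<bytes> <path>".
largest_blobs() {
    count=${1:-10}
    # rev-list prints "<sha> <path>"; cat-file echoes the path via %(rest).
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
        awk '$1 == "blob" { sub(/^blob /, ""); print }' |
        sort -k1,1nr |
        head -n "$count"
}
```

Run largest_blobs 20 at the repository root and use the extensions that dominate the list as --include patterns for the migration. The git lfs migrate info subcommand offers a similar per-extension summary once the client is installed.
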

Without LFS                With LFS
-----------                ----------------------------
repo.git/                  repo.git/
  objects/                   objects/
    [200MB blob]               [tiny pointer file]
                           lfs cache (local)
                             [200MB blob]
                           remote LFS store
                             [200MB blob]

clone size: huge           clone size: small
                           checkout pulls needed LFS objects only

Figure: LFS replaces binary blobs with pointers; actual bytes live on a separate object store.

CI systems often default to shallow clones that still need LFS objects. For GitHub Actions:

- uses: actions/checkout@v4
  with:
    lfs: true

To skip the LFS download in environments that don’t need binaries (linting only), set GIT_LFS_SKIP_SMUDGE=1 for the clone: GIT_LFS_SKIP_SMUDGE=1 git clone ...

Common Pitfalls

  • Forgetting git lfs install on a new machine. Without it, checkouts produce pointer files instead of real binaries. Symptom: your image viewer says “not a valid PNG.”
  • Tracking after committing. Adding a pattern to .gitattributes only affects future commits. Old revisions remain as regular blobs; use git lfs migrate import to fix history.
  • Hitting quotas. GitHub LFS has bandwidth and storage limits that apply per account. A noisy CI pulling LFS on every build can blow through quotas in days.
  • Force pushes after migration. git lfs migrate import rewrites history. Coordinate with collaborators or rebase chaos follows.
  • Submodule + LFS. Submodules don’t inherit LFS configuration from the parent. Each submodule is its own repository, so it needs its own tracking patterns in .gitattributes and its own git lfs install invocation to set up hooks.
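
The first pitfall is easy to detect mechanically. As a hedged sketch, the check below treats a file as an un-smudged pointer when it is under 1024 bytes (the size cap the pointer spec sets) and its first line is the LFS version line. The name is_lfs_pointer is hypothetical, and the assets/ path in the usage comment is only an example.

```shell
# Sketch: succeed if a working-tree file is still a raw LFS pointer.
# is_lfs_pointer is a hypothetical helper, not a git-lfs command.
is_lfs_pointer() {
    [ -f "$1" ] || return 1
    # Pointer files are tiny text files; skip anything 1 KiB or larger.
    size=$(wc -c < "$1" | tr -d ' ')
    [ "$size" -lt 1024 ] || return 1
    # The version line comes first in a pointer file.
    first=$(head -n 1 "$1")
    [ "$first" = "version https://git-lfs.github.com/spec/v1" ]
}

# Usage: flag every PNG under assets/ that was not smudged.
# find assets -name '*.png' | while IFS= read -r f; do
#     is_lfs_pointer "$f" && echo "still a pointer: $f"
# done
```

If the check flags files, running git lfs install followed by git lfs pull in that clone should replace the pointers with the real binaries.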

Practical Tips

  • Keep .gitattributes patterns as narrow as possible. A blanket * tracking rule will move source code into LFS too.
  • Audit tracked files with git lfs ls-files --size, and run git lfs prune to clean the local cache once objects are safely on the remote.
  • For CI, cache the .git/lfs directory between runs to avoid re-downloading unchanged binaries.
  • Use LFS file locking (git lfs lock and git lfs unlock) for files that cannot be merged (PSDs, binaries). It coordinates exclusive editing through the server.
  • Document the LFS setup in CONTRIBUTING.md. The first failed clone is otherwise a confusing rite of passage.
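
The advice about narrow patterns can be enforced with a small lint step. This is only a sketch: broad_lfs_rules is a hypothetical helper, and the set of patterns it treats as too broad (*, **, *.*, **/*) is an assumption to adjust per project.

```shell
# Sketch: print .gitattributes rules that route a catch-all pattern to LFS.
# broad_lfs_rules is a hypothetical helper; the pattern list is an assumption.
broad_lfs_rules() {
    attrs=${1:-.gitattributes}
    awk '
        /^[[:space:]]*#/ { next }   # skip comment lines
        /filter=lfs/ && ($1 == "*" || $1 == "**" || $1 == "*.*" || $1 == "**/*") { print }
    ' "$attrs"
}
```

An empty result means no blanket rule was found. To confirm that a specific source file is not routed through LFS, git check-attr filter -- path/to/file reports the filter Git would apply to it.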

Wrap-up

Git LFS keeps your repo lean by storing large binaries out of band and replacing them with pointers. Set up tracking before adding binaries, migrate any historical blobs, and configure CI to fetch what it needs. With those basics in place, you get versioned binaries without paying the clone-time tax that pure Git imposes.