Git Large File Storage (LFS) Tutorial
Set up Git LFS to version large binaries like images, models, and datasets without bloating your repository, including tracking, migration, and CI tips.
What you'll learn
- Why Git struggles with large binary files
- How LFS pointers and remote storage work
- Tracking patterns and migrating existing files
- Configuring CI clones and shallow fetches
- Common LFS pitfalls and quotas
Prerequisites
- Familiarity with the shell
- Basic Git remote and clone workflow
What and Why
Git compresses text well but treats binary blobs as opaque. A 200 MB PSD checked in once balloons the pack file forever; clones become slow and remote storage costs explode. Git LFS (Large File Storage) solves this by replacing the blob in your repo with a tiny text pointer and pushing the actual bytes to a separate object store.
Typical candidates: design assets, ML model weights, video and audio samples, fixture datasets, and generated PDFs. If a file changes often and is larger than a few megabytes, it likely belongs in LFS.
Mental Model
Without LFS, every revision of a binary lives forever in the Git object database. With LFS, the repo stores a pointer like:
version https://git-lfs.github.com/spec/v1
oid sha256:7d865e959b2466918c9863afca942d0fb89d7c9ac0c99bafc3749504ded97730
size 12345678
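The actual bytes live on the LFS server (GitHub, GitLab, self-hosted) and are fetched on checkout by the LFS smudge filter. The clean filter runs on git add: it swaps the file's contents for a pointer and stores the bytes in the local LFS cache. A pre-push hook then uploads those bytes to the LFS server on git push.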
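To make the pointer format concrete, the following sketch derives the two variable fields by hand for a small sample file. It does not call git-lfs at all; the file name sample.bin and the helper sha256_of are hypothetical, and it assumes either sha256sum or shasum is available.

```shell
# Hypothetical helper: SHA-256 of a file using whichever tool is installed.
sha256_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | cut -d' ' -f1
  else
    shasum -a 256 "$1" | cut -d' ' -f1
  fi
}

workdir=$(mktemp -d)
printf 'hello lfs\n' > "$workdir/sample.bin"

oid=$(sha256_of "$workdir/sample.bin")                 # hash of the full contents
size=$(wc -c < "$workdir/sample.bin" | tr -d ' ')      # size in bytes

# Assemble the three pointer lines in the same order as the example above.
pointer=$(printf 'version https://git-lfs.github.com/spec/v1\noid sha256:%s\nsize %s' "$oid" "$size")
printf '%s\n' "$pointer"
```

Because the oid is a hash of the file's contents, any change to the binary produces a different pointer, which is what lets Git version the small text file while the bytes live elsewhere.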
Hands-on Example
Install the client, initialize, and track a pattern. The example versions PNG assets and a model checkpoint.
# One-time install (macOS shown)
brew install git-lfs
git lfs install # registers smudge/clean filters globally
# Inside an existing repo
cd my-project
git lfs track "*.psd"
git lfs track "assets/**/*.png"
git lfs track "models/*.bin"
# Tracking writes patterns to .gitattributes
cat .gitattributes
# *.psd filter=lfs diff=lfs merge=lfs -text
# assets/**/*.png filter=lfs diff=lfs merge=lfs -text
# models/*.bin filter=lfs diff=lfs merge=lfs -text
git add .gitattributes assets/logo.png models/v1.bin
git commit -m "Track design assets and model in LFS"
git push origin main
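One way to confirm that a path will actually go through LFS is to ask Git which filter applies to it. The sketch below uses plain git check-attr in a throwaway repository and writes the .gitattributes lines by hand (the same lines shown above), so it should run even where git-lfs is not installed. The paths assets/icons/logo.png and src/main.c are hypothetical, the latter included as a negative case.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q

# Same rules that `git lfs track` wrote in the example above.
cat > .gitattributes <<'EOF'
*.psd filter=lfs diff=lfs merge=lfs -text
assets/**/*.png filter=lfs diff=lfs merge=lfs -text
models/*.bin filter=lfs diff=lfs merge=lfs -text
EOF

# check-attr only consults the attribute rules, so the files need not exist.
png_filter=$(git check-attr filter -- assets/icons/logo.png)
src_filter=$(git check-attr filter -- src/main.c)
printf '%s\n%s\n' "$png_filter" "$src_filter"
# assets/icons/logo.png: filter: lfs
# src/main.c: filter: unspecified
```

Running the same check against a few representative paths before the first commit is a cheap way to catch a pattern that is too narrow or too broad.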
To convert files already committed as regular blobs, rewrite history with git lfs migrate:
# Move every .bin in the history of main into LFS (rewrites commits)
git lfs migrate import --include="*.bin" --include-ref=refs/heads/main
# Verify pointers
git lfs ls-files
# 7d865e959b * models/v1.bin
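Before migrating, it can help to see which blobs in history are actually large. The following is a sketch using only plain Git plumbing; list_large_blobs is a hypothetical helper, the 1 MB threshold is an arbitrary assumption, and the demo runs in a throwaway repository with made-up file names.

```shell
# Print "size path" for every blob in reachable history of at least MIN_BYTES.
list_large_blobs() {
  min_bytes=$1
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
    awk -v min="$min_bytes" '$1 == "blob" && $2 >= min { sub(/^blob /, ""); print }' |
    sort -rn
}

# Demo in a throwaway repository (all names here are hypothetical).
demo=$(mktemp -d)
cd "$demo"
git init -q
head -c 2000000 /dev/zero > big.bin     # 2,000,000 bytes of zeros
echo 'small' > notes.txt
git add big.bin notes.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -m "Add files"

large=$(list_large_blobs 1000000)
printf '%s\n' "$large"
# 2000000 big.bin
```

The extensions that show up in this list are reasonable candidates for the --include patterns passed to git lfs migrate import.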
Without LFS                With LFS
-----------                ----------------------------
repo.git/                  repo.git/
  objects/                   objects/
    [200MB blob]               [tiny pointer file]
                           lfs cache (local)
                             [200MB blob]
                           remote LFS store
                             [200MB blob]
clone size: huge           clone size: small
                           checkout pulls needed LFS objects only
CI systems often default to shallow clones that still need LFS objects. For GitHub Actions:
- uses: actions/checkout@v4
  with:
    lfs: true
To skip LFS download in environments that don’t need binaries (linting only): GIT_LFS_SKIP_SMUDGE=1 git clone ....
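For a job that needs only some binaries, one possible pattern is to clone with smudging disabled and fetch a subset later. This is a sketch under stated assumptions: clone_pointers_only is a hypothetical helper, the models/* filter is an assumed layout, and the demo clones a throwaway local repository rather than a real remote.

```shell
# Hypothetical helper: shallow clone that leaves any LFS files as pointers.
clone_pointers_only() {
  GIT_LFS_SKIP_SMUDGE=1 git clone --quiet --depth 1 "$1" "$2"
}

# Throwaway source repository standing in for the real remote.
src=$(mktemp -d)
git -C "$src" init -q
echo 'lint me' > "$src/README.md"
git -C "$src" add README.md
git -C "$src" -c user.name=demo -c user.email=demo@example.com commit -q -m "Initial commit"

dest=$(mktemp -d)/checkout
clone_pointers_only "file://$src" "$dest"

# A later job that needs the binaries could hydrate a single directory
# (requires git-lfs and an LFS-enabled remote, so it is left commented out):
#   git -C "$dest" lfs pull --include="models/*"
```

Splitting the clone from the download this way keeps lint-only jobs off the LFS bandwidth meter while still letting build jobs pull what they need.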
Common Pitfalls
- Forgetting git lfs install on a new machine. Without it, checkouts produce pointer files instead of real binaries. Symptom: your image viewer says “not a valid PNG.”
- Tracking after committing. Adding a pattern to .gitattributes only affects future commits. Old revisions remain as regular blobs; use git lfs migrate import to fix history.
- Hitting quotas. GitHub LFS has bandwidth and storage limits that apply per account. A noisy CI pulling LFS on every build can blow through quotas in days.
- Force pushes after migration. git lfs migrate import rewrites history. Coordinate with collaborators or rebase chaos follows.
- Submodule + LFS. Submodules don’t inherit LFS configuration from the parent. Each submodule needs its own git lfs install invocation.
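The first pitfall can be caught early by checking whether a checked-out file is still a pointer. A minimal sketch, assuming the pointer layout described in the Mental Model section; is_lfs_pointer is a hypothetical helper rather than a git-lfs command, and the two sample files are fabricated for the demo.

```shell
lfs_version_line='version https://git-lfs.github.com/spec/v1'

# Succeeds if the file's first line is exactly the LFS spec version line.
is_lfs_pointer() {
  head -n 1 "$1" 2>/dev/null | grep -qxF "$lfs_version_line"
}

tmp=$(mktemp -d)
# A pointer left behind by a checkout that ran without the smudge filter.
printf '%s\noid sha256:7d865e959b2466918c9863afca942d0fb89d7c9ac0c99bafc3749504ded97730\nsize 12345678\n' \
  "$lfs_version_line" > "$tmp/logo.png"
# A stand-in for a real, hydrated file.
printf 'not a pointer\n' > "$tmp/real.bin"

for f in "$tmp/logo.png" "$tmp/real.bin"; do
  if is_lfs_pointer "$f"; then
    echo "$(basename "$f"): still a pointer; run git lfs install and git lfs pull"
  else
    echo "$(basename "$f"): real content"
  fi
done
```

A check like this could run at the start of a build script so that a missing LFS setup fails with a clear message instead of a corrupt-image error further down the line.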
Practical Tips
- Keep .gitattributes patterns as narrow as possible. A blanket * tracking rule will move source code into LFS too.
- Audit storage with git lfs ls-files --size, and run git lfs prune to clean the local cache once objects are safely on the remote.
- For CI, cache the .git/lfs directory between runs to avoid re-downloading unchanged binaries.
- Use git lfs lock for files that cannot be merged (PSDs, binaries). It coordinates exclusive editing through the server.
- Document the LFS setup in CONTRIBUTING.md. The first failed clone is otherwise a confusing rite of passage.
Wrap-up
Git LFS keeps your repo lean by storing large binaries out of band and replacing them with pointers. Set up tracking before adding binaries, migrate any historical blobs, and configure CI to fetch what it needs. With those basics in place, you get versioned binaries without paying the clone-time tax that pure Git imposes.