What is Git? A Complete Introduction for Beginners

Beginner 9 min read

What you'll learn

✓ What Git is and the problem it was built to solve
✓ How Git differs from GitHub, GitLab, and Bitbucket
✓ The core Git vocabulary: repository, commit, branch, remote
✓ Why Git is described as a distributed version control system
✓ A mental model of the three areas: working tree, staging area, repository

Prerequisites

• A computer running Windows, macOS, or Linux
• Comfort opening a terminal; no prior version control experience is needed

Git is the version control system used by the vast majority of professional software projects today. The Linux kernel, Chromium, VS Code, React, Django, and a large share of the open-source packages you install are all tracked with Git. Learning Git is not optional for a modern developer; it is part of the baseline literacy of the craft.

This first post does not ask you to type a single Git command. The goal here is the mental model. Once that is solid, the commands in the next four posts will fall into place rather than feel arbitrary.

The problem Git solves

Imagine you are writing a long document — an essay, a contract, a piece of software. You make a change, and you want to keep the old version “just in case.” The naive solution is to copy the file:

report.txt
report-v2.txt
report-final.txt
report-final-actually.txt
report-final-actually-FIXED.txt

Every developer has done a version of this. It works for a single file edited by a single person. It collapses immediately when:

The project contains hundreds of files.
Two people edit the same file at the same time.
You need to know who changed what and why, six months later.
You want to try a risky change without disturbing the working version.

A version control system (VCS) solves all of these. It records every change to every file, attaches an author and a message to each change, and lets you switch between versions, compare them, and merge work from many people without losing history.

Git is the most widely used VCS by a very wide margin. Linus Torvalds created it in 2005 to manage the Linux kernel, a project with thousands of contributors scattered across the world, after the kernel lost free access to BitKeeper, the proprietary tool it had been using. The constraints of that project shaped Git’s design: it had to be fast, fully distributed, and robust against corruption.

Distributed version control

Older systems like CVS and Subversion are centralized: there is one server that holds the project’s history, and every developer connects to it to commit, view history, or fetch changes. Lose the server, lose the history.

Git is distributed. Every copy of a Git repository contains the entire history of the project. When you clone a repository, you do not just download the latest files — you download every commit, every branch, every tag, all the way back to the first commit. You can work offline, view history, create branches, and commit, all without a network connection. When you reconnect, you synchronise with others.

This property is the single most important thing to internalise about Git. The history lives on your machine, not on a server somewhere. GitHub is convenient, but it is not Git — Git would work perfectly well if GitHub disappeared tomorrow.

Git vs. GitHub

This distinction trips up almost every beginner, so it is worth stating plainly:

Git is a program you install on your computer. It tracks changes to files. It is free and open source. It runs entirely on your machine.
GitHub is a website (owned by Microsoft) that hosts Git repositories online so people can collaborate on them. GitLab and Bitbucket are competing services with the same idea.

You can use Git without ever creating a GitHub account; many private projects do exactly that. The reverse does not hold: GitHub is built on Git, and every project it hosts is a Git repository.

A useful analogy: Git is to GitHub what email is to Gmail. Email is an open standard anyone can use; Gmail is one company’s service built on top of it.

The core vocabulary

Five words show up everywhere in Git. Get comfortable with them now, even before running any commands.

Repository (or “repo”). A folder whose contents Git is tracking. Technically, it is the folder plus a hidden .git subfolder where Git stores all the history and metadata. You can have as many repositories on your machine as you like — each project gets its own.

Commit. A saved snapshot of the project at a single point in time. Each commit has a unique identifier (a 40-character hexadecimal SHA-1 hash like a1b2c3d...), an author, a date, and a short message describing what changed. The history of a Git project is a chain of commits, each pointing to its parent: the very first commit has no parent, and a merge commit has two.
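If it helps to see the chain of commits in code, here is a toy model in Python. It is a sketch under simplifying assumptions: the `Commit` class, its fields, and the `history` helper are hypothetical, and the text being hashed is not Git's real commit format. What it shares with Git is the shape: each commit's ID is a SHA-1 hash of its contents, including its parent's ID, so following parent pointers from the newest commit walks the whole history backwards.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Commit:
    """A toy commit: NOT Git's on-disk format, just the key ideas."""
    message: str
    author: str
    snapshot: str            # stand-in for the project state at this commit
    parent: Optional[str]    # ID of the previous commit (None for the first)

    @property
    def id(self) -> str:
        # Git also names commits by a SHA-1 hash of their contents;
        # this toy version hashes a simplified text encoding instead.
        text = f"{self.parent}\n{self.author}\n{self.snapshot}\n{self.message}"
        return hashlib.sha1(text.encode("utf-8")).hexdigest()


def history(commits: dict[str, Commit], head: str) -> list[str]:
    """Follow parent pointers from `head` back to the first commit."""
    messages = []
    current: Optional[str] = head
    while current is not None:
        commit = commits[current]
        messages.append(commit.message)
        current = commit.parent
    return messages


# Build a three-commit chain, each commit pointing at the one before it.
store: dict[str, Commit] = {}
first = Commit("Initial commit", "Ada", "v1", parent=None)
store[first.id] = first
second = Commit("Fix typo in README", "Ada", "v2", parent=first.id)
store[second.id] = second
third = Commit("Add login page", "Grace", "v3", parent=second.id)
store[third.id] = third

print(len(third.id))             # 40 hexadecimal characters
print(history(store, third.id))  # newest first, back to "Initial commit"
```

Because each ID covers the parent's ID, editing an old commit would change its ID and therefore the ID of every commit after it. Real Git has the same property, which is part of what makes its history hard to corrupt silently.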

Branch. A movable pointer to a particular commit. Branches let you work on multiple things in parallel (a new feature, a bug fix, an experiment) without those changes interfering with each other. Every repository starts with one branch, traditionally named master; GitHub and many teams now use main instead.
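The phrase “movable pointer” can be sketched the same way. In this hypothetical Python model, a branch is nothing more than a name in a dictionary that maps to a commit ID. The `make_commit` and `commit_on` helpers are illustrative, not Git APIs, and the IDs are toy hashes.

```python
import hashlib
from typing import Optional

# Hypothetical toy store: commit ID -> parent commit ID (None for the first).
parents: dict[str, Optional[str]] = {}
# A branch is just a name mapped to the ID of the commit it points at.
branches: dict[str, str] = {}


def make_commit(message: str, parent: Optional[str]) -> str:
    """Create a toy commit and return its ID (illustrative, not a Git API)."""
    commit_id = hashlib.sha1(f"{parent}\n{message}".encode("utf-8")).hexdigest()
    parents[commit_id] = parent
    return commit_id


def commit_on(branch: str, message: str) -> str:
    """Commit on top of `branch`, then move only that branch forward."""
    new_id = make_commit(message, parent=branches[branch])
    branches[branch] = new_id
    return new_id


base = make_commit("Initial commit", parent=None)
branches["main"] = base

# Creating a branch copies the pointer; no files or commits are duplicated.
branches["feature"] = branches["main"]

new = commit_on("feature", "Try a new layout")
print(branches["main"] == base)  # True: main still points at the first commit
print(parents[new] == base)      # True: the new commit's parent is that commit
```

Creating the feature branch cost one dictionary entry. That mirrors why branches are cheap in Git: a branch is essentially a small reference that stores one commit ID rather than a copy of the project.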

Remote. A copy of your repository hosted somewhere else, usually on GitHub or a similar service. When you clone a repository, Git names the remote you cloned from origin by default, but you can have multiple remotes and name them anything.

Working tree. The actual files in your folder — the ones you see in your editor and modify as you work. Git compares the working tree to its internal history to figure out what has changed.

You will see all five of these in the next post when we run our first commands. For now, just file them away.

The three areas

Git’s defining mental model is the three areas a file can live in. Alongside distributed history, this is the idea most worth getting right.

The working tree — the files on disk you edit.
The staging area (also called the “index”) — a holding pen for changes you have decided to include in the next commit.
The repository — the permanent history of every commit ever made.

When you edit a file, the change exists only in the working tree. To save it into history you do two things:

git add copies the current version of a changed file from the working tree into the staging area.
git commit takes everything currently staged and records it in the repository as a new commit.

working tree   --(git add)-->   staging area   --(git commit)-->   repository

Why two steps? Because real work is messy. You might fix three unrelated things in a single editing session — a typo in the README, a bug in a function, and a refactor of an import. The staging area lets you commit these separately, with three clear messages, instead of bundling them into a single confusing commit. This deliberate, well-described history is one of the main reasons professionals choose Git.
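The three areas and the two-step save can be modelled with three Python dictionaries. This is a hypothetical sketch, not how Git stores data: `ToyRepo` and its `add`, `commit`, and `unstaged` methods are made up for illustration, and file contents are plain strings. It replays the messy editing session described above and records it as three focused commits.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRepo:
    """Hypothetical model of Git's three areas (not Git's real storage)."""
    working_tree: dict[str, str] = field(default_factory=dict)  # files on disk
    staging: dict[str, str] = field(default_factory=dict)       # the index
    commits: list[tuple[str, dict[str, str]]] = field(default_factory=list)

    def add(self, path: str) -> None:
        # Like `git add`: copy the file's current content into staging.
        self.staging[path] = self.working_tree[path]

    def commit(self, message: str) -> None:
        # Like `git commit`: record a copy of everything staged as a snapshot.
        self.commits.append((message, dict(self.staging)))

    def unstaged(self) -> list[str]:
        # Files whose working-tree content differs from what is staged.
        return sorted(path for path, text in self.working_tree.items()
                      if self.staging.get(path) != text)


repo = ToyRepo(working_tree={
    "README.md": "Helo, world",
    "app.py": "total = price - tax",
    "utils.py": "import os, sys",
})
for path in repo.working_tree:
    repo.add(path)
repo.commit("Initial commit")

# One messy editing session touches three unrelated files...
repo.working_tree["README.md"] = "Hello, world"
repo.working_tree["app.py"] = "total = price + tax"
repo.working_tree["utils.py"] = "import os"
print(repo.unstaged())  # ['README.md', 'app.py', 'utils.py']

# ...but staging lets us record them as three focused commits.
repo.add("README.md")
repo.commit("Fix typo in README")
repo.add("app.py")
repo.commit("Fix total calculation")
repo.add("utils.py")
repo.commit("Tidy up imports")

print([message for message, _ in repo.commits])
```

The “Fix typo in README” snapshot still contains the old app.py, because only what was staged at that moment went into the commit. The staging area is also not emptied by a commit, which matches Git: after a commit, the index holds the contents of that commit plus anything you stage next.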

Try it yourself — no terminal yet. Open a folder you have been working in. Mentally classify the state of each file:

Which files have you edited today but not “saved” into version history?
Which files are finished and ready to record as a unit?
Which files have been the same for weeks?

In Git terms these are: working tree changes, things you would add and commit, and the existing repository. The model is more familiar than it sounds.

Snapshots, not differences

One implementation detail is worth knowing, because it shapes how Git behaves: Git stores snapshots, not differences.

Older systems like Subversion stored each commit as the delta from the previous one: “lines 4 through 7 changed; here is the new text.” Git instead records the entire state of every tracked file at every commit. If a file did not change between two commits, the new snapshot simply points to the copy Git already has instead of storing it again, so this is not as wasteful as it sounds. (Git does compress similar files against each other when it packs objects for storage, but that is an optimisation; conceptually, every commit is a full snapshot.)
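To make “points to the copy Git already has” concrete, here is a small Python sketch of content-addressed storage. The `blob_id` function follows the formula Git uses to name file contents in a default SHA-1 repository (the SHA-1 of a `blob <size>` header, a null byte, and the raw bytes), so it should agree with `git hash-object` for the same raw bytes. The `objects` dictionary and the `snapshot` function are simplified, hypothetical stand-ins: real Git records each snapshot as a tree object on disk.

```python
import hashlib


def blob_id(data: bytes) -> str:
    # Git names a file's content (a "blob") by the SHA-1 of a short header
    # plus the bytes themselves, so identical content always gets the same ID.
    header = f"blob {len(data)}\0".encode()
    return hashlib.sha1(header + data).hexdigest()


objects: dict[str, bytes] = {}  # toy object store: ID -> content


def snapshot(files: dict[str, bytes]) -> dict[str, str]:
    """Record every tracked file; content already stored is not copied again."""
    tree = {}
    for path, data in files.items():
        oid = blob_id(data)
        objects.setdefault(oid, data)  # no new copy if this content exists
        tree[path] = oid
    return tree


v1 = snapshot({"README.md": b"Hello\n", "app.py": b"print('v1')\n"})
v2 = snapshot({"README.md": b"Hello\n", "app.py": b"print('v2')\n"})

print(v1["README.md"] == v2["README.md"])  # True: both snapshots share one blob
print(len(objects))                        # 3 objects for 4 file entries
```

Two full snapshots with four file entries between them produce only three stored objects, because the unchanged README.md is hashed once and then shared.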

The practical consequence: operations like switching branches, viewing old versions, and comparing commits are extremely fast in Git. They are mostly local lookups against snapshots Git already has, not network calls or expensive delta calculations.

Why Git is worth learning properly

Once you understand the three-area model, the distributed-history idea, and the vocabulary, Git is one of the highest-leverage skills you can pick up.

Nearly every job posting for a software role assumes it.
It removes the fear of “breaking” your code: there is always a recent commit to fall back to.
It is the standard way teams collaborate on code.
It opens the door to open-source contribution, one of the fastest ways to grow as a developer.

Most beginners learn just enough Git to push code to GitHub and stop there. The cost is years of mild confusion and the occasional catastrophic mistake. A few hours spent now on the fundamentals — exactly what this five-post series covers — pays back every week for the rest of your career.

Try it yourself — sketch the model. On a piece of paper, draw three boxes labelled “working tree”, “staging area”, and “repository”. Draw arrows between them labelled git add and git commit. To the right of “repository”, draw a fourth box labelled “remote (GitHub)” and an arrow labelled git push. You have just drawn 90% of what Git does day to day.

Recap

You now know:

Git is a distributed version control system — every clone contains the entire history.
GitHub is a hosting service for Git repositories — useful, but not Git itself.
The core vocabulary is repository, commit, branch, remote, working tree.
Git’s mental model is three areas — working tree, staging area, repository — connected by git add and git commit.
Git stores snapshots, not differences, which is why most operations are fast and local.

Next steps

The next post is the hands-on counterpart to this one: installing Git on your operating system, configuring your name and email, and walking through the full life cycle of your very first commit.

→ Next: Install Git and Make Your First Commit

Questions or feedback? Email codeloomdevv@gmail.com.