Python Sets and Set Operations

Learn how Python sets store unique values, the operations they support — union, intersection, difference — and when to choose a set over a list or dictionary.

Beginner · 7 min read · By Yash Kesharwani

What you'll learn

  • How to create sets and what makes them different from lists
  • Adding, removing, and testing membership
  • The four core set operations: union, intersection, difference, symmetric difference
  • When to reach for a set instead of a list or dictionary
  • The frozenset — an immutable set you can use as a dictionary key

Prerequisites

A set is an unordered collection of unique values. Two properties define it: duplicates are not stored, and membership tests take roughly constant time on average, however large the set grows. Those two properties make sets the right tool for a surprising number of everyday problems.

Creating a set

The two ways to create a non-empty set:

# Literal — comma-separated values in braces
tags = {"python", "tutorial", "beginner"}

# Constructor — from any iterable
tags = set(["python", "tutorial", "beginner", "python"])
print(tags)    # {'python', 'tutorial', 'beginner'} (order may vary)

Notice that the duplicate "python" is silently dropped. This is the defining feature of a set.

Empty sets need set(), not {}. Empty braces create an empty dictionary, because the {} literal is reserved for dict.

empty_set = set()       # correct
empty_dict = {}         # NOT an empty set
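
A quick check with type() confirms the difference. This is only a minimal sketch that reuses the two variable names from above:

```python
empty_set = set()
empty_dict = {}

print(type(empty_set).__name__)    # set
print(type(empty_dict).__name__)   # dict
print(len(empty_set))              # 0
```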

Unordered

Sets do not preserve insertion order, and the printed order may surprise you:

print({"c", "a", "b"})   # {'b', 'a', 'c'} or similar

If order matters, use a list. If uniqueness matters and order doesn’t, use a set.

Adding and removing

tags = {"python", "tutorial"}

tags.add("beginner")              # add one element
print(tags)                       # {'python', 'tutorial', 'beginner'} (order may vary)

tags.update(["intermediate", "advanced"])   # add multiple
print(tags)

tags.remove("tutorial")           # remove — raises KeyError if missing
tags.discard("javascript")        # remove — silently ignores if missing

popped = tags.pop()               # remove & return an arbitrary element
print(popped)

tags.clear()                      # remove everything
print(tags)                       # set()

The distinction between remove and discard is the only one worth memorising — use discard when “not there” is acceptable.
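
To see the difference in action, the sketch below tries both methods on an element that is not in the set. The outcome variable is a hypothetical name used only to record what happened:

```python
tags = {"python", "tutorial"}

# discard: no error when the element is absent
tags.discard("javascript")

# remove: raises KeyError when the element is absent
try:
    tags.remove("javascript")
    outcome = "removed"
except KeyError:
    outcome = "KeyError"

print(outcome)        # KeyError
print(len(tags))      # 2
```

Neither call changes the set here; the only difference is whether the missing element is treated as an error.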

Membership testing

Like a lookup on dictionary keys, the in operator on a set is fast: constant time on average, regardless of the set's size. This is why sets are the right answer when you need to repeatedly ask “does this collection contain X?”:

allowed = {"admin", "editor", "viewer"}
role = "editor"
if role in allowed:
    print("Access granted")

The same check on a long list would scan from the start every time. For thousands of items the difference is dramatic.
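
One way to observe the gap yourself is a rough timing with the standard library's timeit module. The collection size and repeat count below are arbitrary assumptions, and the exact numbers will vary by machine, so no expected output is shown:

```python
import timeit

n = 10_000                      # collection size, chosen arbitrarily
as_list = list(range(n))
as_set = set(as_list)
target = n - 1                  # last element: worst case for the list scan

# Run each membership test 200 times and report the total elapsed time
list_time = timeit.timeit(lambda: target in as_list, number=200)
set_time = timeit.timeit(lambda: target in as_set, number=200)

print(f"list: {list_time:.6f}s  set: {set_time:.6f}s")
```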

Iteration and size

tags = {"python", "tutorial", "beginner"}
print(len(tags))           # 3

for tag in tags:
    print(tag)

Iteration order is not guaranteed — never rely on it for logic.
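
If you need a predictable order for display or testing, one common approach is to pass the set to sorted(), which returns a new list. A minimal sketch:

```python
tags = {"python", "tutorial", "beginner"}

# sorted() accepts any iterable and returns a new, ordered list
ordered = sorted(tags)

for tag in ordered:
    print(tag)
# beginner
# python
# tutorial
```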

The four set operations

The strength of sets is the algebra of operations between them. Each operation comes in two forms — an operator and a method. The operator form requires both sides to be sets; the method form accepts any iterable on the right.

Union — everything from both

a = {1, 2, 3}
b = {3, 4, 5}

print(a | b)               # {1, 2, 3, 4, 5}
print(a.union(b))          # same

Intersection — only what is in both

print(a & b)               # {3}
print(a.intersection(b))   # same

Difference — in the first but not the second

print(a - b)               # {1, 2}
print(a.difference(b))     # same

Symmetric difference — in either but not both

print(a ^ b)                       # {1, 2, 4, 5}
print(a.symmetric_difference(b))   # same
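
The operator and method forms differ in what they accept on the right-hand side. The sketch below illustrates this with a list; operator_ok is a hypothetical flag used only to record the result, and sorted() is used so the printed output is deterministic:

```python
a = {1, 2, 3}

# Method form: the argument can be any iterable, here a list
print(sorted(a.union([3, 4, 5])))      # [1, 2, 3, 4, 5]

# Operator form: a list on the right raises TypeError
try:
    a | [3, 4, 5]
    operator_ok = True
except TypeError:
    operator_ok = False

print(operator_ok)                     # False

# Converting the list first makes the operator form work
print(sorted(a | set([3, 4, 5])))      # [1, 2, 3, 4, 5]
```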

In-place variants

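Append = to the operator (or call the matching method: update, intersection_update, difference_update, or symmetric_difference_update) to modify the left set in place: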

a |= b      # union update
a &= b      # intersection update
a -= b      # difference update
a ^= b      # symmetric difference update
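
"In place" means the existing set object is changed rather than replaced. The sketch below makes that visible with a second name, alias (a hypothetical variable), bound to the same set:

```python
a = {1, 2, 3}
b = {3, 4, 5}
alias = a               # second name for the same set object

a |= b                  # same effect as a.update(b)
print(alias is a)       # True: the set was modified, not replaced
print(sorted(alias))    # [1, 2, 3, 4, 5]

a = a | b               # not in place: builds a new set and rebinds a
print(alias is a)       # False
```

This matters when other parts of a program hold a reference to the same set: the in-place form changes what they see, while the plain operator does not.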

Try it yourself. Given:

python_devs = {"Alice", "Bob", "Carol", "Dave"}
js_devs = {"Bob", "Eve", "Frank", "Carol"}

Find:

  1. Developers who know both languages
  2. Developers who know Python but not JavaScript
  3. Developers who know exactly one of the two

Express each using a set operator.

Subset and superset tests

Sets compare against each other with familiar comparison operators:

small = {1, 2}
big = {1, 2, 3, 4}

print(small <= big)        # True — small is a subset
print(big >= small)        # True — big is a superset
print(small < big)         # True — proper subset (not equal)

a.isdisjoint(b) returns True if the two sets share no elements.
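
A short illustration of isdisjoint, using example sets that are not part of the original text:

```python
evens = {2, 4, 6}
odds = {1, 3, 5}
small_primes = {2, 3, 5}

print(evens.isdisjoint(odds))          # True: no shared elements
print(evens.isdisjoint(small_primes))  # False: both contain 2
print(evens & small_primes)            # {2}
```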

Set comprehensions

Sets support comprehensions, just like lists and dictionaries:

words = ["apple", "banana", "apricot", "blueberry"]
initials = {w[0] for w in words}
print(initials)            # {'a', 'b'}

Set comprehensions are ideal when you want a list-comprehension-style transformation with deduplication built in.

Common practical uses

A few patterns where sets are clearly the right tool.

1. Deduplicate a list

items = ["apple", "banana", "apple", "cherry", "banana"]
unique = list(set(items))
print(unique)              # ['apple', 'banana', 'cherry']  — order may differ

If order matters, use dict.fromkeys() instead, which preserves insertion order:

unique_ordered = list(dict.fromkeys(items))
print(unique_ordered)      # ['apple', 'banana', 'cherry']

2. Find common or unique elements

this_month = {"alice", "bob", "carol"}
last_month = {"alice", "carol", "dave"}

returning = this_month & last_month       # active both months
new = this_month - last_month             # active this month but not last
churned = last_month - this_month         # didn't return

3. Fast filtering

allowed_ids = {1, 5, 9, 13, 17}
events = [
    {"id": 5, "action": "login"},
    {"id": 9, "action": "purchase"},
    {"id": 99, "action": "spam"},
]
filtered = [e for e in events if e["id"] in allowed_ids]

If allowed_ids were a list instead of a set, every in check would scan the list from the start, and the filter would slow down noticeably as allowed_ids grows.

Try it yourself. Take the string "abracadabra", convert it to a set, and confirm the result is {'a', 'b', 'r', 'c', 'd'}. Then use len() to find the number of distinct characters in one line.

frozenset — an immutable set

A set is mutable and therefore unhashable, which means it cannot be used as a dictionary key or as a member of another set. When you need an immutable version, use frozenset:

fs = frozenset(["a", "b", "c"])
groups = {
    frozenset(["alice", "bob"]): "team-a",
    frozenset(["carol", "dave"]): "team-b",
}

You rarely need this, but when you do need a set as a dictionary key or as a member of another set, frozenset is the type built for the job.
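
The sketch below repeats the groups dictionary and shows two consequences: lookups do not depend on element order, and a plain set is rejected as a key. The set_key_ok flag is a hypothetical name used only to record the result:

```python
groups = {
    frozenset(["alice", "bob"]): "team-a",
    frozenset(["carol", "dave"]): "team-b",
}

# Element order does not matter: equal frozensets hash to the same value
print(groups[frozenset(["bob", "alice"])])   # team-a

# A plain set is unhashable, so using one as a key raises TypeError
try:
    {{"alice", "bob"}: "team-a"}
    set_key_ok = True
except TypeError:
    set_key_ok = False

print(set_key_ok)                            # False
```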

When to use a set

Reach for a set when:

  • You need to deduplicate a collection
  • You need fast membership testing on a large group
  • You are computing unions, intersections, or differences between collections
  • The order of items genuinely does not matter

Stick with a list when order matters. Use a dictionary when you need to associate each item with a value.

Recap

You now know:

  • Sets store unique, unordered values, created with a {...} literal or set(); an empty set requires set()
  • add, update, remove, discard, pop, clear cover mutation
  • in is fast — sets are ideal for “does X belong?” tests
  • The four operations: union |, intersection &, difference -, symmetric difference ^
  • set() and set comprehensions deduplicate without preserving order; dict.fromkeys deduplicates while keeping insertion order
  • frozenset is the immutable variant, suitable for use as a dictionary key

Next steps

The next post is the natural closer for this section: type conversion — how to safely move values between types when reading user input, parsing data, and combining numbers with strings.

→ Next: Type Conversion (Casting) in Python

Questions or feedback? Email codeloomdevv@gmail.com.