Contains Duplicate — The Hash Set Solution
A clear walkthrough of the Contains Duplicate problem, covering brute force, sorting, and the optimal hash set approach with complexity analysis.
What you'll learn
- ✓Why hashing beats sorting here
- ✓How to design a single-pass duplicate check
- ✓When to bail out early
- ✓Trade-offs vs sorting in place
- ✓How to explain the solution in an interview
Prerequisites
- •Comfortable with arrays
- •Basic understanding of hash sets
The Contains Duplicate problem is deceptively simple. Most engineers can solve it in twenty seconds, but interviewers use it as a warm-up to gauge how you reason about time-space trade-offs and how cleanly you write code under no pressure.
The Problem
Given an integer array nums, return true if any value appears at least twice, and false if every element is distinct. The input length can be up to 10^5, so anything quadratic will be too slow for the largest cases. The values themselves can be negative or positive 32-bit integers.
Intuition
The naive approach compares every pair of indices and returns true the moment it finds a match. It is correct, but with two nested loops you do O(n^2) work. For n = 10^5 that is 10^10 comparisons.
A slightly better idea is to sort first. After sorting, duplicates become adjacent, so one linear pass is enough. The catch: sorting is O(n log n) time and either O(log n) or O(n) extra space depending on the algorithm, and it destroys the original order.
The cleanest answer uses a hash set. Walk through nums once. For each value, check membership in O(1) average time. If it is already in the set, return true. Otherwise add it. If you reach the end, every element was unique and you return false. The early-exit behavior often makes this much faster than the asymptotic bound suggests.
Explanation with Example
Take nums = [1, 2, 3, 1]. The set starts empty. We add 1, then 2, then 3. When we reach index 3 we see 1 again. It is already in the set, so we return true.
index: 0 1 2 3
value: 1 2 3 1
set: {1} {1,2} {1,2,3} -> 1 in set -> return True For an all-unique input like [1, 2, 3, 4], every check misses and we add four elements to the set before returning false.
Code
def containsDuplicate(nums):
seen = set()
for x in nums:
if x in seen:
return True
seen.add(x)
return Falseclass Solution {
public boolean containsDuplicate(int[] nums) {
Set<Integer> seen = new HashSet<>();
for (int x : nums) {
if (!seen.add(x)) return true;
}
return false;
}
}class Solution {
public:
bool containsDuplicate(vector<int>& nums) {
unordered_set<int> seen;
for (int x : nums) {
if (!seen.insert(x).second) return true;
}
return false;
}
};Edge Cases
- Empty array: trivially has no duplicates, so the function should return
false. - Single-element array: also returns
false. - Very large arrays: the set could grow to the full input size before you confirm uniqueness.
- Negative numbers: no problem; hash works on any integer.
- If the problem allowed floats, you would need to be careful about NaN, which never compares equal to itself.
Complexity Analysis
Time complexity is O(n) on average. Each lookup and insertion into a hash set is amortized constant time. In the absolute worst case of pathological hash collisions you could get O(n) per operation and O(n^2) overall, but on real input this is essentially impossible.
Space complexity is O(n) because the set could hold up to n distinct elements before we find a duplicate. If the array happens to repeat near the start, we use far less memory in practice.
How to Explain It in an Interview
Lead with the trade-off: brute force is correct but O(n^2). Sorting is O(n log n) and uses no extra hash structure, which is nice if memory is tight. The hash-set solution is O(n) time and O(n) space, and almost always the fastest in wall-clock terms. Mention that you choose hashing because n is up to 10^5 and you want a single pass with early exit. Mentioning short-circuiting unprompted signals that you think about real-world performance, not just asymptotics.
Related Problems
- Contains Duplicate II (with a distance constraint, sliding window of seen values).
- Contains Duplicate III (with both value and index constraints, harder, bucket sort).
- Single Number (XOR trick — covered in another article on this site).
- Two Sum (the same hash-table pattern, but storing index instead of just membership).
Related articles
- Leetcode Majority Element — Boyer-Moore Voting
Solve Majority Element in O(n) time and O(1) space using Boyer-Moore majority vote. Includes intuition, walkthrough, and edge cases.
- Leetcode Missing Number — Sum and XOR Tricks
Solve the Missing Number problem with three approaches: hashing, Gauss sum formula, and XOR. Includes complexity analysis and interview tips.
- Leetcode Move Zeroes — Two-Pointer In-Place Solution
Solve Move Zeroes in O(n) time and O(1) space using a write pointer. Includes walkthrough, edge cases, and interview tips.
- Leetcode Pascal's Triangle — Row-by-Row Construction
Build Pascal's Triangle in O(n^2) time using simple row-by-row addition. Includes a clear diagram, edge cases, and complexity analysis.