Skip to content
C Codeloom
Leetcode

Contains Duplicate — The Hash Set Solution

A clear walkthrough of the Contains Duplicate problem, covering brute force, sorting, and the optimal hash set approach with complexity analysis.

·4 min read · By Codeloom
Beginner 7 min read

What you'll learn

  • Why hashing beats sorting here
  • How to design a single-pass duplicate check
  • When to bail out early
  • Trade-offs vs sorting in place
  • How to explain the solution in an interview

Prerequisites

  • Comfortable with arrays
  • Basic understanding of hash sets

The Contains Duplicate problem is deceptively simple. Most engineers can solve it in twenty seconds, but interviewers use it as a warm-up to gauge how you reason about time-space trade-offs and how cleanly you write code under no pressure.

The Problem

Given an integer array nums, return true if any value appears at least twice, and false if every element is distinct. The input length can be up to 10^5, so anything quadratic will be too slow for the largest cases. The values themselves can be negative or positive 32-bit integers.

Intuition

The naive approach compares every pair of indices and returns true the moment it finds a match. It is correct, but with two nested loops you do O(n^2) work. For n = 10^5 that is 10^10 comparisons.

A slightly better idea is to sort first. After sorting, duplicates become adjacent, so one linear pass is enough. The catch: sorting is O(n log n) time and either O(log n) or O(n) extra space depending on the algorithm, and it destroys the original order.

The cleanest answer uses a hash set. Walk through nums once. For each value, check membership in O(1) average time. If it is already in the set, return true. Otherwise add it. If you reach the end, every element was unique and you return false. The early-exit behavior often makes this much faster than the asymptotic bound suggests.

Explanation with Example

Take nums = [1, 2, 3, 1]. The set starts empty. We add 1, then 2, then 3. When we reach index 3 we see 1 again. It is already in the set, so we return true.

index:  0    1    2    3
value:  1    2    3    1
set:   {1} {1,2} {1,2,3}  -> 1 in set -> return True
Hash set fills until a repeat is found

For an all-unique input like [1, 2, 3, 4], every check misses and we add four elements to the set before returning false.

Code

def containsDuplicate(nums):
    seen = set()
    for x in nums:
        if x in seen:
            return True
        seen.add(x)
    return False
class Solution {
    public boolean containsDuplicate(int[] nums) {
        Set<Integer> seen = new HashSet<>();
        for (int x : nums) {
            if (!seen.add(x)) return true;
        }
        return false;
    }
}
class Solution {
public:
    bool containsDuplicate(vector<int>& nums) {
        unordered_set<int> seen;
        for (int x : nums) {
            if (!seen.insert(x).second) return true;
        }
        return false;
    }
};

Edge Cases

  • Empty array: trivially has no duplicates, so the function should return false.
  • Single-element array: also returns false.
  • Very large arrays: the set could grow to the full input size before you confirm uniqueness.
  • Negative numbers: no problem; hash works on any integer.
  • If the problem allowed floats, you would need to be careful about NaN, which never compares equal to itself.

Complexity Analysis

Time complexity is O(n) on average. Each lookup and insertion into a hash set is amortized constant time. In the absolute worst case of pathological hash collisions you could get O(n) per operation and O(n^2) overall, but on real input this is essentially impossible.

Space complexity is O(n) because the set could hold up to n distinct elements before we find a duplicate. If the array happens to repeat near the start, we use far less memory in practice.

How to Explain It in an Interview

Lead with the trade-off: brute force is correct but O(n^2). Sorting is O(n log n) and uses no extra hash structure, which is nice if memory is tight. The hash-set solution is O(n) time and O(n) space, and almost always the fastest in wall-clock terms. Mention that you choose hashing because n is up to 10^5 and you want a single pass with early exit. Mentioning short-circuiting unprompted signals that you think about real-world performance, not just asymptotics.

  • Contains Duplicate II (with a distance constraint, sliding window of seen values).
  • Contains Duplicate III (with both value and index constraints, harder, bucket sort).
  • Single Number (XOR trick — covered in another article on this site).
  • Two Sum (the same hash-table pattern, but storing index instead of just membership).