LeetCode Group Anagrams: Hash Map of Sorted Keys

Intermediate 10 min read

What you'll learn

✓Why the brute-force pairwise comparison fails on large inputs
✓How to choose a canonical key so anagrams hash to the same bucket
✓The tradeoff between sorted-string keys and count-tuple keys
✓Edge cases like empty strings and very long words
✓A confident interview talk-track for the problem

Prerequisites

•Comfort with hash maps and dictionaries
•Basic string operations in Python

LeetCode 49, Group Anagrams, is a Medium problem that comes up constantly in onsite loops. It is a great test of whether you can spot the canonicalization pattern: turn each string into a key that all of its anagrams share, then group them with a hash map. We will walk through the brute-force baseline, two optimal key choices, the complexity of each, and how to discuss the tradeoffs in an interview.

The Problem

Given an array of strings strs, group the anagrams together. You can return the answer in any order.

Example:

Input: strs = ["eat","tea","tan","ate","nat","bat"]
Output: [["bat"],["nat","tan"],["ate","eat","tea"]]

Constraints typically say up to 10^4 strings, each up to 100 lowercase English letters.

Brute Force

The naive approach checks each pair of strings to see if they are anagrams. For each new string, scan the existing groups and join one if a representative is an anagram.

from collections import Counter

def groupAnagrams(strs):
    groups = []
    for s in strs:
        placed = False
        for g in groups:
            if Counter(s) == Counter(g[0]):
                g.append(s)
                placed = True
                break
        if not placed:
            groups.append([s])
    return groups

This is O(n^2 * k) in the worst case, where n is the number of strings and k is the maximum length. With 10^4 inputs and 100 characters each, that is 10^10 work. It will time out.

Optimal Solution

The trick is to compute a canonical key for each string. Two strings are anagrams if and only if their canonical keys match. We can use a hash map from key to list, dropping each string into the right bucket in one pass.

Sorted string key

from collections import defaultdict

def groupAnagrams(strs):
    groups = defaultdict(list)
    for s in strs:
        key = ''.join(sorted(s))
        groups[key].append(s)
    return list(groups.values())

Sorting each string costs O(k log k). Total time is O(n * k log k).

Character count key

If k is large or the alphabet is small, a count tuple is faster.

from collections import defaultdict

def groupAnagrams(strs):
    groups = defaultdict(list)
    for s in strs:
        counts = [0] * 26
        for ch in s:
            counts[ord(ch) - ord('a')] += 1
        groups[tuple(counts)].append(s)
    return list(groups.values())

Building the count is O(k). Total time is O(n * k). The tradeoff is constant overhead from creating a 26-length tuple for each string, which is sometimes slower in practice for short inputs but always better asymptotically.

Walk Through an Example

Take strs = ["eat","tea","tan","ate","nat","bat"] with the sorted-key version. We compute keys:

eat becomes aet
tea becomes aet
tan becomes ant
ate becomes aet
nat becomes ant
bat becomes abt

Three keys appear: aet, ant, and abt. The map ends up as:

aet: [eat, tea, ate]
ant: [tan, nat]
abt: [bat]

We return the values as a list of lists. The order inside each group reflects the input order, which is acceptable because the problem allows any ordering.

Edge Cases

Empty input: return an empty list. The loop runs zero times.
Empty strings: every empty string is an anagram of every other empty string. Both keys are the empty key, so they end up in one group.
Very long strings: count keys are faster than sorted keys when k grows, because O(k) beats O(k log k).
Unicode follow-up: replace the fixed 26 array with a frozen Counter. For example, tuple(sorted(Counter(s).items())).
Case sensitivity: by default A and a are different. Normalize if the problem says otherwise.

Complexity Analysis

Let n be the number of strings and k be the maximum length.

Brute force: O(n^2 * k) time, O(n * k) space.
Sorted-key hash map: O(n * k log k) time, O(n * k) space for the map and output.
Count-key hash map: O(n * k) time, O(n * k) space.

The space is dominated by the output itself. A quick recap of these notations is in Big O notation explained.

How to Explain It in an Interview

Start by recognizing that you need to bucket anagrams, which means equivalent strings need the same key.
Mention the pairwise comparison baseline and dismiss it as O(n^2 * k).
Propose the sorted-string key first because it is the shortest code.
Offer the count-tuple key as an O(n * k) improvement and explain why it matters for long strings.
Mention the Unicode follow-up so the interviewer sees you have thought beyond the stated constraints.

The progression mirrors the Valid Anagram talk-track: sort first, count second. That consistency is something interviewers notice.

LeetCode 242, Valid Anagram, which is the single-pair version of this question.
LeetCode 438, Find All Anagrams in a String, which uses the same count idea inside a sliding window.
LeetCode 567, Permutation in String, which is another count-matching variant.

For a refresher on the data structure powering the grouping, see hashing and hash maps. For deeper string pattern context, our note on pattern matching basics is a good follow-up.

Wrap up

Group Anagrams is a textbook example of the canonical-key pattern. Pick a representation that collapses equivalent inputs to the same value, then let a hash map do the grouping work in linear time. The sorted-string key is the fastest to write, the count tuple is asymptotically better, and being able to switch between them on demand makes you look fluent rather than memorized.