LeetCode Group Anagrams: Hash Map of Sorted Keys
Solve LeetCode Group Anagrams cleanly with a hash map keyed by sorted strings or character counts. Includes complexity analysis and interview talking points.
What you'll learn
- ✓Why the brute-force pairwise comparison fails on large inputs
- ✓How to choose a canonical key so anagrams hash to the same bucket
- ✓The tradeoff between sorted-string keys and count-tuple keys
- ✓Edge cases like empty strings and very long words
- ✓A confident interview talk-track for the problem
Prerequisites
- •Comfort with hash maps and dictionaries
- •Basic string operations in Python
LeetCode 49, Group Anagrams, is a Medium problem that comes up constantly in onsite loops. It is a great test of whether you can spot the canonicalization pattern: turn each string into a key that all of its anagrams share, then group them with a hash map. We will walk through the brute-force baseline, two optimal key choices, the complexity of each, and how to discuss the tradeoffs in an interview.
The Problem
Given an array of strings strs, group the anagrams together. You can return the answer in any order.
Example:
- Input:
strs = ["eat","tea","tan","ate","nat","bat"] - Output:
[["bat"],["nat","tan"],["ate","eat","tea"]]
Constraints typically say up to 10^4 strings, each up to 100 lowercase English letters.
Brute Force
The naive approach checks each pair of strings to see if they are anagrams. For each new string, scan the existing groups and join one if a representative is an anagram.
from collections import Counter
def groupAnagrams(strs):
groups = []
for s in strs:
placed = False
for g in groups:
if Counter(s) == Counter(g[0]):
g.append(s)
placed = True
break
if not placed:
groups.append([s])
return groups
This is O(n^2 * k) in the worst case, where n is the number of strings and k is the maximum length. With 10^4 inputs and 100 characters each, that is 10^10 work. It will time out.
Optimal Solution
The trick is to compute a canonical key for each string. Two strings are anagrams if and only if their canonical keys match. We can use a hash map from key to list, dropping each string into the right bucket in one pass.
Sorted string key
from collections import defaultdict
def groupAnagrams(strs):
groups = defaultdict(list)
for s in strs:
key = ''.join(sorted(s))
groups[key].append(s)
return list(groups.values())
Sorting each string costs O(k log k). Total time is O(n * k log k).
Character count key
If k is large or the alphabet is small, a count tuple is faster.
from collections import defaultdict
def groupAnagrams(strs):
groups = defaultdict(list)
for s in strs:
counts = [0] * 26
for ch in s:
counts[ord(ch) - ord('a')] += 1
groups[tuple(counts)].append(s)
return list(groups.values())
Building the count is O(k). Total time is O(n * k). The tradeoff is constant overhead from creating a 26-length tuple for each string, which is sometimes slower in practice for short inputs but always better asymptotically.
Walk Through an Example
Take strs = ["eat","tea","tan","ate","nat","bat"] with the sorted-key version. We compute keys:
eatbecomesaetteabecomesaettanbecomesantatebecomesaetnatbecomesantbatbecomesabt
Three keys appear: aet, ant, and abt. The map ends up as:
aet:[eat, tea, ate]ant:[tan, nat]abt:[bat]
We return the values as a list of lists. The order inside each group reflects the input order, which is acceptable because the problem allows any ordering.
Edge Cases
- Empty input: return an empty list. The loop runs zero times.
- Empty strings: every empty string is an anagram of every other empty string. Both keys are the empty key, so they end up in one group.
- Very long strings: count keys are faster than sorted keys when
kgrows, becauseO(k)beatsO(k log k). - Unicode follow-up: replace the fixed 26 array with a frozen
Counter. For example,tuple(sorted(Counter(s).items())). - Case sensitivity: by default
Aandaare different. Normalize if the problem says otherwise.
Complexity Analysis
Let n be the number of strings and k be the maximum length.
- Brute force:
O(n^2 * k)time,O(n * k)space. - Sorted-key hash map:
O(n * k log k)time,O(n * k)space for the map and output. - Count-key hash map:
O(n * k)time,O(n * k)space.
The space is dominated by the output itself. A quick recap of these notations is in Big O notation explained.
How to Explain It in an Interview
- Start by recognizing that you need to bucket anagrams, which means equivalent strings need the same key.
- Mention the pairwise comparison baseline and dismiss it as
O(n^2 * k). - Propose the sorted-string key first because it is the shortest code.
- Offer the count-tuple key as an
O(n * k)improvement and explain why it matters for long strings. - Mention the Unicode follow-up so the interviewer sees you have thought beyond the stated constraints.
The progression mirrors the Valid Anagram talk-track: sort first, count second. That consistency is something interviewers notice.
Related Problems
- LeetCode 242, Valid Anagram, which is the single-pair version of this question.
- LeetCode 438, Find All Anagrams in a String, which uses the same count idea inside a sliding window.
- LeetCode 567, Permutation in String, which is another count-matching variant.
For a refresher on the data structure powering the grouping, see hashing and hash maps. For deeper string pattern context, our note on pattern matching basics is a good follow-up.
Wrap up
Group Anagrams is a textbook example of the canonical-key pattern. Pick a representation that collapses equivalent inputs to the same value, then let a hash map do the grouping work in linear time. The sorted-string key is the fastest to write, the count tuple is asymptotically better, and being able to switch between them on demand makes you look fluent rather than memorized.