Java Stream Collectors Deep Dive

Master java.util.stream.Collectors with practical examples covering grouping, partitioning, downstream collectors, and building your own custom collector.

By Codeloom · Intermediate

What you'll learn

  • How a Collector is structured
  • Common collectors: toList, toMap, joining
  • Grouping and partitioning with downstream collectors
  • Reducing with custom accumulators
  • Writing a Collector from scratch

Prerequisites

  • Comfort with Java lambdas and streams

What and Why

Stream.collect is the bridge between a lazy pipeline and a concrete result. The Collectors utility class packages dozens of ready-made reductions: collecting to lists and maps, grouping, partitioning, joining strings, summing fields, and more.

Understanding collectors well lets you replace pages of imperative loops with a few declarative lines. The same collector can usually run on a parallel stream without changes, because its combiner already knows how to merge partial results.

Mental Model

A Collector<T, A, R> has four moving parts: a supplier that creates a mutable accumulator A, an accumulator that folds elements of type T into it, a combiner that merges two accumulators for parallel runs, and a finisher that turns A into the final result R.

supplier()  --> A (empty container)
accumulator(A, T) --> mutates A with each element
combiner(A, A) --> A (only used in parallel)
finisher(A) --> R (final shape)
Collector lifecycle

You rarely implement all four. Most of the time you compose pre-built collectors.
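To make the four parts concrete, here is a minimal sketch that spells each one out with Collector.of. The class name FourPartsSketch and the method upperCsv are made up for illustration; in real code, map(String::toUpperCase) followed by Collectors.joining(",") would do the same job.

```java
import java.util.List;
import java.util.StringJoiner;
import java.util.stream.Collector;

class FourPartsSketch {
    // Hypothetical collector: upper-cases each string and joins them with commas.
    static Collector<String, StringJoiner, String> upperCsv() {
        return Collector.of(
            () -> new StringJoiner(","),          // supplier: creates the empty container A
            (sj, s) -> sj.add(s.toUpperCase()),   // accumulator: folds one element T into A
            StringJoiner::merge,                  // combiner: merges two partial containers
            StringJoiner::toString                // finisher: turns A into the result R
        );
    }

    public static void main(String[] args) {
        String csv = List.of("ada", "bob", "cleo").stream().collect(upperCsv());
        System.out.println(csv); // prints ADA,BOB,CLEO
    }
}
```

On a sequential stream the combiner is never called; on a parallel stream each chunk gets its own StringJoiner and the combiner stitches them together in encounter order.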

Hands-on Example

Imagine processing a list of orders.

import java.util.*;
import java.util.stream.*;
import static java.util.stream.Collectors.*;

record Order(String customer, String category, double amount) {}

public class CollectorsDemo {
    public static void main(String[] args) {
        List<Order> orders = List.of(
            new Order("ada", "books", 12.0),
            new Order("ada", "books", 8.5),
            new Order("bob", "toys", 30.0),
            new Order("bob", "books", 15.0),
            new Order("cleo", "toys", 50.0)
        );

        // Group totals per customer
        Map<String, Double> totalByCustomer = orders.stream()
            .collect(groupingBy(Order::customer, summingDouble(Order::amount)));

        // Count orders per category
        Map<String, Long> countByCategory = orders.stream()
            .collect(groupingBy(Order::category, counting()));

        // Partition by big spenders
        Map<Boolean, List<Order>> big = orders.stream()
            .collect(partitioningBy(o -> o.amount() > 20));

        // Nested grouping
        Map<String, Map<String, Double>> nested = orders.stream()
            .collect(groupingBy(Order::customer,
                     groupingBy(Order::category,
                     summingDouble(Order::amount))));

        // Join a column
        String customers = orders.stream()
            .map(Order::customer).distinct()
            .collect(joining(", ", "[", "]"));

        System.out.println(totalByCustomer);
        System.out.println(countByCategory);
        System.out.println(big);
        System.out.println(nested);
        System.out.println(customers);
    }
}

The most powerful pattern is the downstream collector passed to groupingBy. Anywhere you would otherwise call groupingBy(...) and then post-process the values, you can fuse the work into a single pass.
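As a rough comparison, the sketch below computes the same per-customer totals twice: once by grouping into lists and post-processing them, and once with a downstream collector. The class and method names (DownstreamSketch, twoStep, fused) are illustrative only, and the Order record is repeated so the snippet stands alone.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class DownstreamSketch {
    record Order(String customer, String category, double amount) {}

    // Two steps: build intermediate lists, then walk the map again to sum each list.
    static Map<String, Double> twoStep(List<Order> orders) {
        Map<String, List<Order>> grouped = orders.stream()
            .collect(Collectors.groupingBy(Order::customer));
        return grouped.entrySet().stream()
            .collect(Collectors.toMap(
                Map.Entry::getKey,
                e -> e.getValue().stream().mapToDouble(Order::amount).sum()));
    }

    // One pass: the downstream collector sums amounts while the grouping happens.
    static Map<String, Double> fused(List<Order> orders) {
        return orders.stream()
            .collect(Collectors.groupingBy(Order::customer,
                     Collectors.summingDouble(Order::amount)));
    }

    public static void main(String[] args) {
        List<Order> orders = List.of(
            new Order("ada", "books", 12.0),
            new Order("ada", "books", 8.5),
            new Order("bob", "toys", 30.0));
        System.out.println(twoStep(orders).equals(fused(orders))); // prints true
    }
}
```

The fused version avoids materializing the intermediate lists, which is where most of the saving comes from.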

Common Pitfalls

  • toMap and duplicate keys: the two-argument form throws IllegalStateException on collision. Use the three-arg form with a merge function: toMap(k, v, (a, b) -> a).
  • Null values in toMap: toMap throws NullPointerException when the value mapper returns null, even if the target map type would accept null values. Wrap in Optional or filter nulls first.
  • Mutability assumptions: Collectors.toList() historically returned an ArrayList, but the contract only says “some List”. Use toCollection(ArrayList::new) if you really need a specific type, or toUnmodifiableList() if you want immutability.
  • Parallel without an associative reduction: collectors run safely in parallel only when the combine step really merges two partial accumulators. Custom collectors must respect this.
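The first two pitfalls can be reproduced in a few lines. This is a sketch with made-up names (ToMapPitfalls and its three helper methods), not a recommended utility class.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class ToMapPitfalls {
    // Two-argument toMap: a repeated key raises IllegalStateException.
    static boolean twoArgThrowsOnDuplicate(List<String> names) {
        try {
            names.stream().collect(Collectors.toMap(n -> n.substring(0, 1), n -> n));
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // Three-argument toMap: the merge function decides which value wins (here, the first).
    static Map<String, String> firstByInitial(List<String> names) {
        return names.stream()
            .collect(Collectors.toMap(n -> n.substring(0, 1), n -> n, (a, b) -> a));
    }

    // Null values: drop them before they reach toMap.
    static Map<String, String> skipNullValues(Map<String, String> source) {
        return source.entrySet().stream()
            .filter(e -> e.getValue() != null)
            .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }

    public static void main(String[] args) {
        List<String> names = List.of("ada", "alan", "bob");
        System.out.println(twoArgThrowsOnDuplicate(names)); // prints true
        System.out.println(firstByInitial(names));

        Map<String, String> source = new HashMap<>();
        source.put("x", null);
        source.put("y", "1");
        System.out.println(skipNullValues(source)); // prints {y=1}
    }
}
```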

Practical Tips

Use mapping to flatten transformations into downstream collectors, for example groupingBy(Order::customer, mapping(Order::category, toSet())) to get each customer’s unique categories.
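Spelled out as a runnable sketch (MappingSketch and categoriesByCustomer are illustrative names, and the Order record is repeated so the snippet stands alone):

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

class MappingSketch {
    record Order(String customer, String category, double amount) {}

    // mapping() transforms each Order to its category before toSet() collects it.
    static Map<String, Set<String>> categoriesByCustomer(List<Order> orders) {
        return orders.stream()
            .collect(Collectors.groupingBy(Order::customer,
                     Collectors.mapping(Order::category, Collectors.toSet())));
    }

    public static void main(String[] args) {
        List<Order> orders = List.of(
            new Order("ada", "books", 12.0),
            new Order("ada", "books", 8.5),
            new Order("bob", "toys", 30.0),
            new Order("bob", "books", 15.0));
        // ada maps to a one-element set, bob to a set of two categories.
        System.out.println(categoriesByCustomer(orders));
    }
}
```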

Reach for teeing (Java 12+) when you need to compute two things in one pass, like average and max together, and combine them at the end.
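A possible shape for the average-and-max case is sketched below. The Stats record and the summarize method are assumptions made for this example; the two downstream collectors are held in typed local variables only to keep type inference easy to follow.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class TeeingSketch {
    record Order(String customer, String category, double amount) {}
    record Stats(double average, double max) {}   // hypothetical result holder

    static Stats summarize(List<Order> orders) {
        Collector<Order, ?, Double> average =
            Collectors.averagingDouble(Order::amount);
        Collector<Order, ?, Optional<Order>> largest =
            Collectors.maxBy(Comparator.comparingDouble(Order::amount));

        // teeing feeds every element to both collectors, then merges the two results.
        return orders.stream().collect(Collectors.teeing(
            average,
            largest,
            (avg, top) -> new Stats(avg, top.map(Order::amount).orElse(Double.NaN))));
    }

    public static void main(String[] args) {
        List<Order> orders = List.of(
            new Order("ada", "books", 12.0),
            new Order("bob", "toys", 30.0),
            new Order("cleo", "toys", 50.0));
        System.out.println(summarize(orders));
    }
}
```

For an empty list this sketch reports an average of 0.0 (the behavior of averagingDouble) and a max of NaN; a real API might prefer an Optional result instead.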

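When the built-ins don’t fit, build one with Collector.of(supplier, accumulator, combiner, finisher); the finisher can be omitted when the accumulator is already the result type, as it is here. Below is a tiny custom collector that returns the first and last elements seen as [first, last]. A single-element stream yields that element twice, and an empty stream yields an empty list.

Collector<Order, ?, List<Order>> firstAndLast = Collector.of(
    () -> new ArrayList<Order>(2),
    (acc, o) -> {
        if (acc.isEmpty()) { acc.add(o); acc.add(o); } // first element is both first and last
        else acc.set(1, o);                            // later elements overwrite "last"
    },
    (a, b) -> {
        if (a.isEmpty()) return b;             // left half saw no elements
        if (!b.isEmpty()) a.set(1, b.get(1));  // keep left's first, take right's last
        return a;
    }
);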

Prefer toUnmodifiableList and toUnmodifiableMap when handing results across module boundaries. They guard against accidental writes.
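A short sketch of what that guard looks like in practice, assuming Java 10 or later for toUnmodifiableList; the class and method names are illustrative.

```java
import java.util.List;
import java.util.stream.Collectors;

class UnmodifiableSketch {
    // Returns the distinct customers as a list that rejects modification.
    static List<String> customers(List<String> raw) {
        return raw.stream().distinct().collect(Collectors.toUnmodifiableList());
    }

    public static void main(String[] args) {
        List<String> result = customers(List.of("ada", "bob", "ada"));
        try {
            result.add("mallory");               // any write attempt fails fast
        } catch (UnsupportedOperationException e) {
            System.out.println("read-only");     // prints read-only
        }
    }
}
```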

Wrap-up

Collectors are tiny composable strategies. Once you internalize the supplier/accumulator/combiner/finisher shape, you can read the Collectors Javadoc as a recipe book and confidently compose new behaviors. The result is concise pipelines that scale from single-threaded experiments to parallel batch jobs.