Pandas Pivot Tables Tutorial: Reshape Data Like a Spreadsheet Pro
Learn how to use pandas pivot_table to summarize, reshape, and aggregate data with multiple indexes, columns, and custom aggregation functions in real workflows.
What you'll learn
- ✓How pivot_table differs from pivot and groupby
- ✓Using multiple index and column levels
- ✓Applying custom aggregation functions
- ✓Handling missing values and margins
- ✓Reshaping pivot output back to long form
Prerequisites
- •Familiar with Python
What and Why
Pivot tables are one of the most powerful tools in pandas for turning long, transactional data into compact, readable summaries. If you have ever used pivot tables in Excel, the pandas equivalent does the same job but at scale and with code you can version-control.
A pivot table answers questions like “what is the average sales per region per quarter” or “how many orders did each customer place per product category.” Instead of writing nested loops, you describe the shape you want and pandas handles the aggregation.
Mental Model
Think of pivot_table as a three-axis machine: rows (index), columns (columns), and cells (values aggregated by aggfunc). Every unique combination of index and column values produces one cell.
Long format Pivot table
+--------+-------+-----+ +---------+-----+-----+
| region | qtr | amt | | region | Q1 | Q2 |
+--------+-------+-----+ -> +---------+-----+-----+
| West | Q1 | 100 | | West | 250 | 180 |
| West | Q1 | 150 | | East | 400 | 320 |
| West | Q2 | 180 | +---------+-----+-----+
| East | Q1 | 400 | index=region, columns=qtr
| East | Q2 | 320 | values=amt, aggfunc=sum
+--------+-------+-----+ The difference between pivot and pivot_table is critical: pivot only reshapes and fails on duplicates, while pivot_table aggregates duplicates using aggfunc (default mean).
Hands-on Example
Let us build a sales dataset and pivot it several ways.
import pandas as pd
import numpy as np
df = pd.DataFrame({
'region': ['West','West','East','East','West','East','South','South'],
'quarter': ['Q1','Q2','Q1','Q2','Q1','Q1','Q2','Q1'],
'product': ['A','A','B','A','B','A','B','A'],
'amount': [100, 180, 400, 320, 150, 220, 90, 60],
'units': [10, 18, 40, 32, 15, 22, 9, 6],
})
table = pd.pivot_table(
df,
index='region',
columns='quarter',
values='amount',
aggfunc='sum',
fill_value=0,
)
print(table)
This produces a region-by-quarter grid of summed amounts. To pivot on multiple values and functions at once:
multi = pd.pivot_table(
df,
index=['region','product'],
columns='quarter',
values=['amount','units'],
aggfunc={'amount':'sum','units':'mean'},
fill_value=0,
margins=True,
margins_name='Total',
)
print(multi)
The margins=True flag adds row and column totals, useful for executive summaries. The result is a MultiIndex on both rows and columns, which you can flatten:
multi.columns = ['_'.join(map(str, c)).strip('_') for c in multi.columns]
multi = multi.reset_index()
To go back to long form, use melt:
long_again = table.reset_index().melt(
id_vars='region',
var_name='quarter',
value_name='amount',
)
This round-trip is the essence of tidy data work.
Common Pitfalls
The first pitfall is forgetting that aggfunc defaults to mean. If you expect counts and see decimals, switch to 'sum' or 'count'. The second is mixing string and numeric types in values; pandas will silently coerce or drop, depending on the function.
A third trap is NaN everywhere. When a region has no data for a quarter, the cell becomes NaN. Use fill_value=0 only when zero is semantically correct; for averages or medians, leaving NaN is often safer.
Finally, beware of pivot versus pivot_table. If you pass duplicates to pivot, you get ValueError: Index contains duplicate entries. Switch to pivot_table whenever the same index/column pair can appear twice.
Practical Tips
Pass a list to aggfunc to compute several statistics at once, for example aggfunc=['sum','mean','count']. This is much faster than running three separate pivots.
For large frames, downcast types before pivoting. Converting float64 to float32 and using categorical dtypes for index and columns reduces memory and speeds aggregation noticeably.
When you only need one row-and-column statistic, prefer groupby(...).agg(...).unstack(). It runs slightly faster and gives you more control over column ordering than pivot_table.
If you plan to feed the pivoted result into a chart, set observed=True when the index is categorical. Otherwise pandas emits every category combination, including ones with no data.
Wrap-up
pivot_table is the Swiss Army knife for reshaping pandas data. Master the three axes, choose your aggfunc deliberately, and remember that pivoting is reversible with melt. Once these patterns become automatic, you will reach for them whenever raw transactional data needs to become an answer.
Try it on a CSV you already have: pick two categorical columns, one numeric, and pivot. The summary you get back in three lines of code would take many clicks in a spreadsheet, and it scales to millions of rows.
Related articles
- Pandas Pandas MultiIndex Tutorial
A practical guide to Pandas MultiIndex: when to use it, how it really works, and the slicing, stacking, and groupby patterns that make hierarchical data manageable.
- Pandas Pandas Performance Optimization Tips: Make DataFrames 10x Faster
Practical pandas speedups: vectorization, dtype tuning, categorical columns, eval/query, and chunked I/O patterns that turn slow scripts into responsive pipelines.
- Pandas Pandas groupby, merge, and concat
A practical guide to combining and summarising DataFrames — groupby with named aggregations, multi-column aggregates, the four merge styles, and stacking with concat.
- Pandas Pandas DataFrames: Reading, Selecting, and Filtering
A practical guide to the daily DataFrame moves — read_csv and read_json, head and info, column selection, loc vs iloc, boolean filtering, sorting, and value_counts.