Pandas Time Series Analysis Tutorial
Work with datetime indexes, resampling, rolling windows, lag features, and timezone gotchas to analyze time series cleanly in pandas.
What you'll learn
- ✓ Why a DatetimeIndex unlocks the best pandas API surface
- ✓ Resampling between frequencies with aggregations
- ✓ Rolling and expanding windows for smoothing and stats
- ✓ Building lag and lead features for forecasting
- ✓ Timezone and DST pitfalls that bite in production
Prerequisites
- Familiar with how APIs work
What and Why
Time series data is everywhere: web traffic, sales, sensor readings, server metrics. Pandas has rich support for it, but only if your dataframe has a proper DatetimeIndex. Once that is in place, resampling, rolling windows, and date-based slicing become one-liners.
Get the index right and pandas does almost all the work. Get it wrong and you fight off-by-one and timezone bugs forever.
Mental Model
Three operations cover most time-series work:
- Resampling: change the frequency (daily to weekly, ticks to minutes).
- Rolling windows: compute a function over the last N points at each step.
- Shifting: align past or future values into the current row to build features.
raw points (irregular timestamps)
  |
  v  resample('D').sum()        -> regular daily series
  |
  v  rolling(window=7).mean()   -> smoothed 7-day average
  |
  v  shift(1)                   -> previous day's value as a feature
These three, composed together, form the backbone of dashboards, monitoring, and most forecasting feature pipelines.
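As a rough sketch of how the three steps chain together, the snippet below runs them on a tiny made-up series. The values, the timestamps, and the 2-day window (shortened from 7 so the sample stays readable) are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical irregular readings: several events on some days, none on 2026-01-03.
raw = pd.Series(
    [5.0, 3.0, 4.0, 6.0, 2.0],
    index=pd.to_datetime([
        "2026-01-01 09:15",
        "2026-01-01 17:40",
        "2026-01-02 11:05",
        "2026-01-04 08:30",
        "2026-01-04 20:10",
    ]),
    name="amount",
)

# Step 1: regular daily series. The empty day becomes 0 because sum() of nothing is 0.
daily = raw.resample("D").sum()

# Step 2: rolling mean over the last 2 daily points (window shortened for the tiny sample).
smoothed = daily.rolling(window=2, min_periods=1).mean()

# Step 3: previous day's total aligned onto the current row.
prev_day = daily.shift(1)

features = pd.DataFrame({"daily": daily, "smoothed": smoothed, "prev_day": prev_day})
print(features)
```

The first row of prev_day is NaN because there is no earlier day to pull from, which is the usual cost of a lag feature.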
Hands-on Example
Start by parsing dates as the index.
import pandas as pd
df = pd.read_csv("sales.csv", parse_dates=["ts"]).set_index("ts").sort_index()
print(df.head())
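To try the load step without a file on disk, here is a minimal sketch that feeds inline text to read_csv. The CSV content is made up and only assumes the same ts and amount columns as above.

```python
import io

import pandas as pd

# Hypothetical stand-in for sales.csv; the rows are deliberately out of order.
csv_text = """ts,amount
2026-01-02 10:00:00,20.0
2026-01-01 09:00:00,10.0
2026-01-01 15:30:00,5.5
"""

df = (
    pd.read_csv(io.StringIO(csv_text), parse_dates=["ts"])
    .set_index("ts")
    .sort_index()
)

print(type(df.index).__name__)           # DatetimeIndex
print(df.index.is_monotonic_increasing)  # True after sort_index()
```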
Time-based slicing is now natural:
df.loc["2026-01":"2026-03"] # all rows in Q1
df.loc["2026-06-15"] # single day (partial-string indexing)
df.between_time("09:00", "17:00") # business hours across all days
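One detail worth checking on your own data is that partial-string slices include both endpoints. The sketch below uses a made-up hourly frame (an assumption, not the tutorial's sales data) so the row counts are easy to verify by hand.

```python
import pandas as pd

# Hypothetical hourly data covering 2026-01-01 00:00 through 2026-06-30 23:00.
idx = pd.date_range("2026-01-01", "2026-06-30 23:00", freq=pd.Timedelta(hours=1))
df = pd.DataFrame({"amount": 1.0}, index=idx)

q1 = df.loc["2026-01":"2026-03"]              # Jan 1 00:00 through Mar 31 23:00, inclusive
one_day = df.loc["2026-06-15"]                # every row on that calendar day
business = df.between_time("09:00", "17:00")  # both endpoints included by default

print(len(q1), len(one_day), len(business))
```

With 90 days in Q1 2026, the first slice holds 90 * 24 rows, and the business-hours filter keeps 9 rows (09:00 through 17:00) for each of the 181 days.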
Resample to a different frequency:
daily = df["amount"].resample("D").sum()
weekly = df["amount"].resample("W-MON").sum() # weeks anchored Monday
monthly = df["amount"].resample("M").agg(["sum", "mean", "count"])  # month-end buckets; pandas 2.2+ spells this alias "ME"
resample is groupby for time. You pick the new frequency and an aggregation.
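The groupby analogy has one useful limit: resample also emits buckets that contain no rows, while a plain groupby on truncated timestamps skips them. A small sketch with made-up values shows the difference.

```python
import pandas as pd

# Hypothetical amounts with no activity on 2026-01-02.
amount = pd.Series(
    [10.0, 5.0, 7.0],
    index=pd.to_datetime(["2026-01-01 08:00", "2026-01-01 19:00", "2026-01-03 12:00"]),
)

# resample produces one row per day in the covered range, including the empty day.
via_resample = amount.resample("D").sum()

# Grouping on the day each timestamp falls in only yields days that have data.
via_groupby = amount.groupby(amount.index.floor("D")).sum()

print(len(via_resample), len(via_groupby))
```

That regular output is what makes resample a good first step before rolling windows or lags.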
Rolling windows for smoothing and feature engineering:
daily_smoothed = daily.rolling(window=7, min_periods=3).mean()
volatility = daily.rolling(window=30).std()
expanding_total = daily.expanding().sum()
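min_periods controls how many non-NaN points a window needs before it produces a value. For an integer window it defaults to the window size, so rolling(window=7) leaves the first six outputs as NaN, which can confuse downstream consumers. Setting min_periods=3 starts emitting values from the third point, at the cost of noisier early estimates.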
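A quick way to see the effect of min_periods is to count the leading NaN values with and without it. The ten-day series here is made up for illustration.

```python
import pandas as pd

# Hypothetical daily series with values 1.0 through 10.0.
daily = pd.Series(
    range(1, 11),
    index=pd.date_range("2026-01-01", periods=10, freq="D"),
    dtype="float64",
)

strict = daily.rolling(window=7).mean()                  # needs a full 7 points
lenient = daily.rolling(window=7, min_periods=3).mean()  # needs only 3 points

print(strict.isna().sum())   # 6
print(lenient.isna().sum())  # 2
```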
Lag features for forecasting:
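df["amount_lag_1"] = df["amount"].shift(1)    # value one row back
df["amount_lag_7"] = df["amount"].shift(7)    # value seven rows back
df["amount_yoy"] = df["amount"].shift(365)    # value 365 rows back
Each shifted column lines up a past value with the current row, so a row contains “what happened today” along with “what happened a week ago.” Note that shift(n) moves by rows, not by calendar time. The lags above mean one day, one week, and roughly one year only when the series has exactly one row per day with no gaps, and shift(365) still drifts by a day across a leap year. On raw, irregular data, resample to daily first.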
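Because shift counts rows rather than days, a gap in the data silently changes what a lag means. The sketch below, on a made-up series with one missing day, contrasts a row-based lag with a lag taken after regularizing the series with asfreq.

```python
import pandas as pd

# Hypothetical daily series with 2026-01-03 missing.
s = pd.Series(
    [10.0, 20.0, 40.0],
    index=pd.to_datetime(["2026-01-01", "2026-01-02", "2026-01-04"]),
    name="amount",
)

# Row-based lag: the "previous" value for Jan 4 is really from Jan 2.
row_lag = s.shift(1)

# Calendar-based lag: make the series regular first, then shift.
day_lag = s.asfreq("D").shift(1)

jan4 = pd.Timestamp("2026-01-04")
print(row_lag.loc[jan4])  # 20.0, taken from two days earlier
print(day_lag.loc[jan4])  # nan, because Jan 3 has no observation
```

Whether NaN or a filled value is the right answer for the missing day depends on the metric, but at least the gap is visible.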
Trade-offs
A few decisions shape what you can do.
- Naive datetimes vs timezone-aware. Naive (tz=None) is simple but ambiguous around DST. Aware (tz="UTC" or another zone) is unambiguous but adds bookkeeping.
- Resampling labels. closed="left" vs closed="right" and label="left" vs label="right" change which edge a bucket includes and which timestamp represents the bucket. Pick one convention and stick to it.
- Forward fill vs interpolate. ffill carries the last known value; interpolate fills smoothly between known points. The right choice depends on whether the metric is a level (price) or a flow (count).
- Window functions are not vectorized in all cases. A rolling(...).apply(custom_func) calls the Python function once per window. Use built-in rolling aggregations when possible.
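To make the closed and label trade-off concrete, the sketch below resamples the same made-up half-hourly readings into hourly buckets under both conventions. The values and timestamps are assumptions chosen so that points fall exactly on bucket edges.

```python
import pandas as pd

# Hypothetical readings at 00:00, 00:30, 01:00, and 01:30.
s = pd.Series(
    [1, 2, 3, 4],
    index=pd.date_range("2026-01-01 00:00", periods=4, freq="30min"),
)

# Buckets [00:00, 01:00) and [01:00, 02:00), each named by its start.
left = s.resample("60min", closed="left", label="left").sum()

# Buckets (23:00, 00:00], (00:00, 01:00], (01:00, 02:00], each named by its end.
right = s.resample("60min", closed="right", label="right").sum()

print(left.tolist())   # [3, 7]
print(right.tolist())  # [1, 5, 4]
```

The same four points yield different totals and even a different number of buckets, which is why mixing conventions across a pipeline is risky.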
Timezone bugs are the silent killer. A pipeline that mixes UTC and local timestamps will be off by the UTC offset, and because that offset changes twice a year, the error is easy to miss.
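A small sketch of the fall-back hazard: two events one hour apart that share the same local wall-clock time. The timestamps are made up and assume the US Eastern transition on 2025-11-02; the zone name is just an example.

```python
import pandas as pd

# Two hypothetical events at 01:30 local time, before and after clocks fall back.
stamps = ["2025-11-02 01:30:00-04:00", "2025-11-02 01:30:00-05:00"]

# Parsing with utc=True normalizes mixed offsets into one unambiguous timeline.
utc = pd.to_datetime(stamps, utc=True)

# Converting to local time and dropping the zone mimics a naive pipeline.
wall_clock = utc.tz_convert("America/New_York").tz_localize(None)

print(utc[1] - utc[0])       # one hour apart in UTC
print(wall_clock.is_unique)  # False: both read 01:30 on the wall clock
```

Stored as naive local time, the two events collapse onto one timestamp, so any resample or dedup step downstream would treat them as simultaneous.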
Practical Tips
- Always set a DatetimeIndex first. The API is built around it. Working with a datetime column is much clunkier.
- Store timestamps in UTC. Convert to local time only for display. This keeps DST gaps and ambiguous local times out of stored data.
- Use pd.to_datetime(..., utc=True) when parsing. It returns a UTC-aware series from mixed inputs.
- Sort the index before resampling or rolling. Operations assume monotonic time.
- Use asfreq to make a series regular before computing rolling features. An integer window counts rows, so on an irregular series it spans uneven amounts of time and produces misleading results.
- Pick offset aliases carefully. M is month-end (spelled ME from pandas 2.2 on), MS is month-start, W defaults to Sunday-anchored weeks. Mismatches cause silent shifts.
- Watch for partial buckets at the edges. The last incomplete week or month often produces an outlier. Trim or flag it.
- Test DST-sensitive logic explicitly. Generate a few timestamps around a known transition and verify the resampling behavior.
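The asfreq tip pairs naturally with the forward fill vs interpolate choice, since regularizing a series exposes its gaps as NaN. Here is a minimal sketch on made-up data showing both fill strategies side by side.

```python
import pandas as pd

# Hypothetical observations on Jan 1 and Jan 4 only.
s = pd.Series(
    [10.0, 40.0],
    index=pd.to_datetime(["2026-01-01", "2026-01-04"]),
)

regular = s.asfreq("D")         # Jan 2 and Jan 3 appear as NaN
filled = regular.ffill()        # carry the last known value forward
smooth = regular.interpolate()  # linear fill between known points

print(filled.tolist())  # [10.0, 10.0, 10.0, 40.0]
print(smooth.tolist())  # [10.0, 20.0, 30.0, 40.0]
```

Once the series is regular, an integer rolling window covers the same span of calendar time at every step.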
For very large time series, df.set_index("ts").sort_index() once at load time pays back many times in resample and slice speed.
Wrap-up
Time series work in pandas is genuinely pleasant when the index is correct, the timezone is explicit, and you stick to the three core verbs: resample, rolling, shift. Combine those with date-based slicing and you can build dashboards, monitoring rollups, and forecasting features without leaving pandas. Take the time to get the index and timezone right at ingest, and every downstream analysis becomes easier.