
Pandas Time Series Analysis Tutorial

Work with datetime indexes, resampling, rolling windows, lag features, and timezone gotchas to analyze time series cleanly in pandas.

By Codeloom · Intermediate

What you'll learn

  • Why a DatetimeIndex unlocks the best pandas API surface
  • Resampling between frequencies with aggregations
  • Rolling and expanding windows for smoothing and stats
  • Building lag and lead features for forecasting
  • Timezone and DST pitfalls that bite in production

Prerequisites

  • Familiarity with how APIs work

What and Why

Time series data is everywhere: web traffic, sales, sensor readings, server metrics. Pandas has rich support for it, but most of that support assumes your DataFrame has a proper DatetimeIndex. Once that is in place, resampling, time-based rolling windows, and date-based slicing become one-liners.

Get the index right and pandas does almost all the work. Get it wrong and you fight off-by-one and timezone bugs forever.

Mental Model

Three operations cover most time-series work:

  • Resampling: change the frequency (daily to weekly, ticks to minutes).
  • Rolling windows: compute a function over the last N points at each step.
  • Shifting: align past or future values into the current row to build features.
raw points (irregular timestamps)
 |
 v resample('D').sum()      -> regular daily series
 |
 v rolling(window=7).mean() -> smoothed 7-day average
 |
 v shift(1)                 -> previous day's value as a feature
Three core time-series transforms

These three composed together form the backbone of dashboards, monitoring, and most forecasting feature pipelines.
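
As a minimal sketch of how the three steps compose, the snippet below runs them on a small made-up series. The timestamps, values, and variable names are illustrative assumptions, not data from this tutorial.

```python
import pandas as pd

# Hypothetical irregular readings: several events on some days, none on 3 January.
raw = pd.Series(
    [5.0, 3.0, 4.0, 6.0, 2.0],
    index=pd.to_datetime([
        "2026-01-01 09:15",
        "2026-01-01 17:40",
        "2026-01-02 11:05",
        "2026-01-04 08:30",
        "2026-01-04 21:10",
    ]),
)

daily = raw.resample("D").sum()                           # regular daily totals
smoothed = daily.rolling(window=2, min_periods=1).mean()  # short 2-day average
previous = daily.shift(1)                                 # yesterday's total

print(pd.DataFrame({"daily": daily, "smoothed": smoothed, "previous": previous}))
```

Note that the empty day shows up as a total of 0 after resampling, which is what makes the later row-based steps line up with calendar days.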

Hands-on Example

Start by parsing dates as the index.

import pandas as pd

df = pd.read_csv("sales.csv", parse_dates=["ts"]).set_index("ts").sort_index()
print(df.head())

Time-based slicing is now natural:

df.loc["2026-01":"2026-03"]        # all rows in Q1
df.loc["2026-06-15"]               # single day (partial-string indexing)
df.between_time("09:00", "17:00")  # business hours across all days
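
To see what those selectors return, here is a small sketch on made-up data (the frame below is a stand-in for illustration, not sales.csv). It assumes readings every 12 hours across a month boundary.

```python
import pandas as pd

# Hypothetical readings every 12 hours, spanning the end of January.
idx = pd.date_range("2026-01-30", periods=6, freq=pd.Timedelta(hours=12))
df = pd.DataFrame({"amount": [1, 2, 3, 4, 5, 6]}, index=idx)

january = df.loc["2026-01"]                  # every row in January
one_day = df.loc["2026-01-31"]               # both readings on the 31st
span = df.loc["2026-01-31":"2026-02-01"]     # the end label is inclusive
midday = df.between_time("09:00", "17:00")   # only the 12:00 readings

print(len(january), len(one_day), len(span), len(midday))
```

Unlike positional slices, a date-string slice includes its end label, so the span above covers all of 1 February.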

Resample to a different frequency:

daily = df["amount"].resample("D").sum()
weekly = df["amount"].resample("W-MON").sum()  # weekly bins ending on Monday
monthly = df["amount"].resample("ME").agg(["sum", "mean", "count"])  # month-end; spelled "M" before pandas 2.2

resample is groupby for time. You pick the new frequency and an aggregation.
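
The groupby analogy can be checked directly. The sketch below, on made-up values, compares resample with a groupby over pd.Grouper at the same frequency; treat it as an illustration of the analogy rather than a statement about pandas internals.

```python
import pandas as pd

# Hypothetical amounts every 12 hours over three days.
idx = pd.date_range("2026-01-01", periods=6, freq=pd.Timedelta(hours=12))
amount = pd.Series([10, 20, 30, 40, 50, 60], index=idx)

via_resample = amount.resample("D").sum()
via_groupby = amount.groupby(pd.Grouper(freq="D")).sum()

# Several aggregations at once, one column per aggregation.
summary = amount.resample("D").agg(["sum", "mean", "count"])

print(via_resample)
print(via_groupby)
print(summary)
```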

Rolling windows for smoothing and feature engineering:

daily_smoothed = daily.rolling(window=7, min_periods=3).mean()
volatility = daily.rolling(window=30).std()
expanding_total = daily.expanding().sum()

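min_periods controls how many non-null points a window needs before it produces a value. For an integer window it defaults to the window size, so the first window - 1 outputs are NaN (six of them for window=7), which can confuse downstream consumers. Lowering it, as in min_periods=3 above, yields values from partial windows sooner.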
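
A small sketch of the difference, using a made-up five-day series and a 3-point window (both are assumptions for illustration):

```python
import pandas as pd

daily = pd.Series(
    [1.0, 2.0, 3.0, 4.0, 5.0],
    index=pd.date_range("2026-01-01", periods=5, freq="D"),
)

strict = daily.rolling(window=3).mean()                  # needs a full window
lenient = daily.rolling(window=3, min_periods=1).mean()  # accepts partial windows

print(pd.DataFrame({"strict": strict, "lenient": lenient}))
```

The lenient version has no gaps, but its first values average fewer points, so they are noisier than the rest of the column.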

Lag features for forecasting:

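features = daily.to_frame("amount")                      # one row per day
features["amount_lag_1"] = features["amount"].shift(1)    # previous day
features["amount_lag_7"] = features["amount"].shift(7)    # same weekday last week
features["amount_yoy"] = features["amount"].shift(365)    # 365 days earlier; drifts by a day across a leap day

Each shifted column lines up a past value with the current row, so a row contains “what happened today” along with “what happened a week ago.” Note that shift(n) moves values by n rows, not n days, which is why the lags are built on the regular daily series rather than on the raw rows.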
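
One thing worth checking is that shift works on rows, not on calendar time. The sketch below uses a made-up series with one missing day to show how a row-based lag can silently reach further back than intended, and how regularizing with asfreq first makes the gap visible.

```python
import pandas as pd

# Hypothetical daily values with 3 January missing.
sparse = pd.Series(
    [10.0, 20.0, 40.0],
    index=pd.to_datetime(["2026-01-01", "2026-01-02", "2026-01-04"]),
)

row_lag = sparse.shift(1)      # previous row, whatever its date is

regular = sparse.asfreq("D")   # inserts NaN for the missing day
day_lag = regular.shift(1)     # previous calendar day

jan_4 = pd.Timestamp("2026-01-04")
print(row_lag[jan_4])  # value from 2 January, two days back
print(day_lag[jan_4])  # NaN, because 3 January is unknown
```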

Trade-offs

A few decisions shape what you can do.

  • Naive datetimes vs timezone-aware. Naive (tz=None) is simple but ambiguous around DST. Aware (tz="UTC" or another zone) is unambiguous but adds bookkeeping.
  • Resampling labels. closed="left" vs closed="right" decides which edge of each bucket is included, and label="left" vs label="right" decides which edge's timestamp names the bucket. Pick one convention and stick to it.
  • Forward fill vs interpolate. ffill carries the last known value; interpolate fills smoothly between known points. Both suit a level (price): ffill when the value holds until the next reading, interpolate when it drifts between readings. For a flow (count), a missing bucket usually means zero, so fillna(0) is often the better fit.
  • Window functions are not vectorized in all cases. By default, rolling(...).apply(custom_func) calls your Python function once per window. Use built-in rolling aggregations when possible.
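
Two of these trade-offs are easy to see on toy data. The sketch below uses made-up values; the first part shows how closed and label move points between daily buckets, the second compares ffill and interpolate on a gap.

```python
import pandas as pd

# Points that sit exactly on and between daily bucket edges.
s = pd.Series(
    [1, 2, 4, 8],
    index=pd.to_datetime([
        "2026-01-01 00:00",
        "2026-01-01 12:00",
        "2026-01-02 00:00",
        "2026-01-02 12:00",
    ]),
)
left = s.resample("D", closed="left", label="left").sum()
right = s.resample("D", closed="right", label="right").sum()
print(left)
print(right)

# A level-type metric with a two-day gap.
level = pd.Series(
    [100.0, None, None, 106.0],
    index=pd.date_range("2026-01-01", periods=4, freq="D"),
)
held = level.ffill()         # repeat the last known value
eased = level.interpolate()  # straight line between known points
print(pd.DataFrame({"held": held, "eased": eased}))
```

With the right-closed convention, a midnight reading belongs to the bucket that ends at that midnight, which is why the same four points produce three buckets instead of two.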

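Timezone bugs are the silent killer. A pipeline that mixes UTC and local timestamps is off by the local UTC offset, and because that offset changes by an hour at each DST transition, twice a year, the size of the error shifts in ways that are easy to miss.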
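
A rough illustration of that failure mode: the sketch below takes the same wall-clock time in winter and summer, and measures how far off it lands when a local reading is wrongly treated as UTC. The zone America/New_York and the dates are assumptions picked for the example.

```python
import pandas as pd

zone = "America/New_York"  # example zone, chosen for illustration

errors = []
for wall in ["2026-01-15 12:00", "2026-07-15 12:00"]:
    correct = pd.Timestamp(wall).tz_localize(zone).tz_convert("UTC")
    mistaken = pd.Timestamp(wall, tz="UTC")  # local reading wrongly assumed to be UTC
    errors.append(correct - mistaken)

print(errors)  # the size of the error differs between winter and summer
```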

Practical Tips

  • Always set a DatetimeIndex first. The API is built around it. Working with a datetime column is much clunkier.
  • Store timestamps in UTC. Convert to local time only for display. UTC has no DST, so stored values never fall into a skipped or repeated hour.
  • Use pd.to_datetime(..., utc=True) when parsing. It returns a UTC-aware series from mixed inputs; naive inputs are treated as already being in UTC.
  • Sort the index before slicing, resampling, or rolling. Integer windows and shift follow row order, and time-based windows such as rolling("7D") require a monotonic index.
  • Use asfreq to make a series regular before computing rolling features. Irregular series produce misleading windows.
  • Pick offset aliases carefully. M is month-end (spelled ME from pandas 2.2), MS is month-start, and W defaults to weeks ending on Sunday. Mismatches cause silent shifts.
  • Watch for partial buckets at the edges. The last incomplete week or month often produces an outlier. Trim or flag it.
  • Test DST-sensitive logic explicitly. Generate a few timestamps around a known transition and verify the resampling behavior.
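
Two of the tips above can be exercised in a few lines. This is a sketch under assumptions: the input strings are made up, and the zone America/New_York with its 8 March 2026 spring-forward transition is just one convenient test case.

```python
import pandas as pd

# Mixed-offset strings (made up) that describe the same instant.
parsed = pd.to_datetime(
    ["2026-03-08 06:30:00+00:00", "2026-03-08 01:30:00-05:00"], utc=True
)
print(parsed)

# One point per hour in UTC, covering local 7 to 9 March 2026 in New York.
utc_hours = pd.date_range(
    "2026-03-07 05:00", "2026-03-10 03:00", freq=pd.Timedelta(hours=1), tz="UTC"
)
hourly = pd.Series(1, index=utc_hours)

# Convert for analysis in local time, then count hours per local calendar day.
hours_per_local_day = hourly.tz_convert("America/New_York").resample("D").sum()
print(hours_per_local_day)
```

The transition day should come out one hour short of its neighbors. A check like this is cheap to keep in a test suite next to any daily rollup that runs in local time.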

For very large time series, df.set_index("ts").sort_index() once at load time pays back many times in resample and slice speed.

Wrap-up

Time series work in pandas is genuinely pleasant when the index is correct, the timezone is explicit, and you stick to the three core verbs: resample, rolling, shift. Combine those with date-based slicing and you can build dashboards, monitoring rollups, and forecasting features without leaving pandas. Take the time to get the index and timezone right at ingest, and every downstream analysis becomes easier.