Kubernetes Jobs vs CronJobs: Run Batch Work the Right Way

Beginner 9 min read

What you'll learn

✓ What a Job guarantees and how completions work
✓ How CronJob scheduling and history limits behave
✓ How to handle retries and idempotency
✓ Pitfalls around timezones and concurrent runs

Prerequisites

• Basic kubectl and YAML familiarity

What and Why

Not every workload is a long-running web server. You also have database migrations, nightly report generation, queue draining, and one-off data backfills. Kubernetes models these with two controllers. A Job runs one or more pods to completion. A CronJob is a Job factory that creates a Job on a schedule.

The reason to use them instead of a Deployment, whose pods must use restartPolicy: Always, is finality. A Job knows when its work is done and stops creating pods. A Deployment will happily restart your migration container forever.

Mental Model

A Job has a target number of completions and a parallelism, both of which default to 1. The Job controller creates pods, at most parallelism at a time, until that many have succeeded. If a pod fails, the controller retries up to backoffLimit times (default 6) and then marks the Job as failed. A CronJob just creates Job objects according to a cron expression, with knobs for what to do when runs overlap and how many old Jobs to keep around for debugging.
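To make completions and parallelism concrete, here is a minimal sketch of a Job that must finish six pods and runs two at a time. The name report-shards, the image, and the numbers are hypothetical and only illustrate the fields described above.

```yaml
# Sketch only: the name, image, and numbers are hypothetical.
apiVersion: batch/v1
kind: Job
metadata:
  name: report-shards
spec:
  completions: 6      # the Job is complete once 6 pods have exited successfully
  parallelism: 2      # run at most 2 pods at the same time
  backoffLimit: 4     # mark the Job failed after 4 retries of failed pods
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: report
          image: example/report:1.0
```

With these values the controller keeps two pods running until six have succeeded, so the work finishes in roughly three waves if nothing fails.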

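The history-limit knobs matter more than people expect. A CronJob keeps the last 3 successful and the last 1 failed Job by default (successfulJobsHistoryLimit and failedJobsHistoryLimit), and the controller deletes anything older. A standalone Job has no such limit: unless you set ttlSecondsAfterFinished, it and its pods stay in the cluster until someone deletes them.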

Hands-on Example

A one-off database migration as a Job:

apiVersion: batch/v1
kind: Job
metadata:
  name: db-migrate-2026-06-28
spec:
  backoffLimit: 3
  ttlSecondsAfterFinished: 3600
  template:
    spec:
      restartPolicy: OnFailure
      containers:
        - name: migrate
          image: example/app:1.4.2
          command: ["./bin/migrate", "up"]
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef: { name: db, key: url }

A nightly cleanup as a CronJob:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: orders-cleanup
spec:
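  schedule: "0 3 * * *"              # every day at 03:00
  timeZone: "America/New_York"       # interpret the schedule in this IANA zone
  concurrencyPolicy: Forbid          # skip a run if the previous Job is still active
  successfulJobsHistoryLimit: 3      # keep the 3 most recent successful Jobs
  failedJobsHistoryLimit: 5          # keep the 5 most recent failed Jobs
  startingDeadlineSeconds: 600       # skip a run that cannot start within 10 minutes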
  jobTemplate:
    spec:
      backoffLimit: 2
      ttlSecondsAfterFinished: 86400
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: cleanup
              image: example/cleanup:2.1
              args: ["--older-than=30d"]

schedule: "0 3 * * *"  (timezone America/New_York)

03:00 -> [Job orders-cleanup-28392] -> Pod (success)
04:00 -> previous Job kept in history (limit=3)
03:00 next day -> [Job orders-cleanup-28401] -> Pod
                oldest Job > limit -> deleted by controller

CronJob spawning Jobs over time

Common Pitfalls

Timezones bite first. Without spec.timeZone, a CronJob’s schedule is interpreted in the kube-controller-manager’s local timezone, usually UTC. The field has been stable since Kubernetes 1.27. Set it explicitly so the batch window follows local wall-clock time and a daylight-saving change does not silently shift it.

restartPolicy: Always is rejected on Jobs. Use OnFailure if you want kubelet to restart the container in place, or Never if you want a fresh pod with a new name on every failure. Never makes debugging easier because each attempt leaves its own pod logs.
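As a sketch of the Never variant, this fragment reworks the migration Job from the hands-on example so that each failed attempt leaves its own pod behind. The values are carried over from that example and are illustrative, not a recommendation.

```yaml
# Fragment of a Job spec; a hypothetical variant of the db-migrate Job.
spec:
  backoffLimit: 3            # up to 3 retries, each in a brand-new pod
  template:
    spec:
      restartPolicy: Never   # failed containers are not restarted in place
      containers:
        - name: migrate
          image: example/app:1.4.2
          command: ["./bin/migrate", "up"]
```

The Job controller labels the pods it creates with job-name, so you can list every attempt for one Job by selecting on that label and read each pod's logs separately.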

concurrencyPolicy: Allow is the default. If your job takes longer than the interval (think hourly cron, 90-minute job), two Jobs will overlap and may corrupt shared state. Use Forbid, which skips the new run while the previous one is still active, unless you have actively designed for concurrency.

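Without ttlSecondsAfterFinished, a finished standalone Job and its pods stay around until someone deletes them. Jobs created by a CronJob are pruned by the history limits, but a TTL still helps if you raise those limits. Set it on every Job; one day is a reasonable default for debuggability.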

Production Tips

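Make the workload idempotent. A CronJob creates a Job approximately once per scheduled time, not exactly once: in rare cases the controller creates two Jobs for the same slot, or none at all. A Job can also start the same program more than once, for example when a replacement pod is created after a node failure. Your code must tolerate being run twice on the same input, and your monitoring must notice a run that never happened.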

Emit metrics. A Job that silently fails for a week is worse than one that loudly fails on day one. Scrape kube-state-metrics with Prometheus, use its kube_job_status_failed metric, and add an alert that fires when the most recent Job for a CronJob did not succeed.
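A minimal Prometheus rule file along these lines might look like the sketch below. It assumes kube-state-metrics is already being scraped; the group name, alert name, namespace, duration, and severity are hypothetical. This simple expression fires whenever a Job reports failed pods, so it is a starting point rather than the full "most recent Job did not succeed" check.

```yaml
# Prometheus rule file sketch; names and thresholds are hypothetical.
# Assumes kube-state-metrics is scraped, which exposes kube_job_status_failed.
groups:
  - name: batch-jobs
    rules:
      - alert: KubeJobFailed
        # Number of failed pods recorded for a Job, per namespace and job_name.
        expr: kube_job_status_failed{namespace="default"} > 0
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Job {{ $labels.job_name }} in {{ $labels.namespace }} has failed pods"
```

Because the metric disappears once the Job object is deleted, keeping failed Jobs around through failedJobsHistoryLimit or a TTL gives the alert time to fire.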

Pin activeDeadlineSeconds on long-running Jobs so that, under concurrencyPolicy: Forbid, a hung pod cannot block later scheduled runs forever. Combine it with startingDeadlineSeconds so that a run which misses its slot by more than that many seconds, for example during a control-plane outage, is skipped instead of starting late.
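As a sketch, the two deadlines sit at different levels of the CronJob: startingDeadlineSeconds on the CronJob spec, activeDeadlineSeconds on the Job template. The fragment below extends the orders-cleanup example; the two-hour limit is a hypothetical value.

```yaml
# Fragment of the orders-cleanup CronJob; the deadline values are hypothetical.
spec:
  concurrencyPolicy: Forbid
  startingDeadlineSeconds: 600      # skip a run that cannot start within 10 minutes of its slot
  jobTemplate:
    spec:
      activeDeadlineSeconds: 7200   # terminate the Job's pods and fail the Job after 2 hours
      backoffLimit: 2
      # template: unchanged from the example above
```

Pick an activeDeadlineSeconds value shorter than the schedule interval if you want every slot to get a chance to run.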

Use a unique name suffix for manually triggered Jobs (date, ticket id) so you can audit what ran and when.

Wrap-up

Reach for a Job when you have a finite task and you want Kubernetes to track its completion. Reach for a CronJob when you want that task on a schedule. Set TTLs, pick a concurrency policy on purpose, and make your code idempotent, and batch work on Kubernetes becomes boring in the best way.