Kubernetes Jobs vs CronJobs: Run Batch Work the Right Way
Compare Kubernetes Jobs and CronJobs with real YAML, retry semantics, and the gotchas that turn a simple nightly task into a 3 AM page.
What you'll learn
- What a Job guarantees and how completions work
- How CronJob scheduling and history limits behave
- How to handle retries and idempotency
- Pitfalls around timezones and concurrent runs
Prerequisites
- Basic kubectl and YAML familiarity
What and Why
Not every workload is a long-running web server. You also have database migrations, nightly report generation, queue draining, and one-off data backfills. Kubernetes models these with two controllers. A Job runs one or more pods to completion. A CronJob is a Job factory that creates a Job on a schedule.
The reason to use them instead of a Deployment with restartPolicy: Always is finality. A Job knows when its work is done and stops creating pods. A Deployment will happily restart your migration container forever.
Mental Model
A Job has a target number of completions and a parallelism. The Job controller creates pods until that many succeed. If a pod fails, the controller respects backoffLimit before giving up. A CronJob just schedules Job objects according to a cron expression, with knobs for what to do when runs overlap and how many old Jobs to keep around for debugging.
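The cleanup knobs matter more than people expect. A standalone Job is not garbage-collected by default: it and its pods stay in the cluster until you delete them or a TTL expires. A CronJob prunes the Jobs it creates according to successfulJobsHistoryLimit (default 3) and failedJobsHistoryLimit (default 1), so raising those limits on a frequent schedule leaves a lot of finished Jobs and pods sitting in etcd.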
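To make completions and parallelism concrete, here is a minimal sketch of a Job that needs five successful pods and runs at most two at a time. The Job name, image, and command are placeholders invented for illustration, not part of the examples later in this article.

```yaml
# Sketch only: name, image, and command are hypothetical.
apiVersion: batch/v1
kind: Job
metadata:
  name: backfill-example
spec:
  completions: 5        # the Job is complete once 5 pods have succeeded
  parallelism: 2        # at most 2 pods run at the same time
  backoffLimit: 4       # retries allowed before the Job is marked Failed
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: worker
          image: example/backfill:1.0             # hypothetical image
          command: ["./bin/backfill", "--batch"]  # hypothetical command
```

With these values the controller keeps two pods in flight until five have exited successfully, then stops creating pods.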
Hands-on Example
A one-off database migration as a Job:
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migrate-2026-06-28
spec:
  backoffLimit: 3
  ttlSecondsAfterFinished: 3600
  template:
    spec:
      restartPolicy: OnFailure
      containers:
        - name: migrate
          image: example/app:1.4.2
          command: ["./bin/migrate", "up"]
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef: { name: db, key: url }
A nightly cleanup as a CronJob:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: orders-cleanup
spec:
  schedule: "0 3 * * *"
  timeZone: "America/New_York"
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 5
  startingDeadlineSeconds: 600
  jobTemplate:
    spec:
      backoffLimit: 2
      ttlSecondsAfterFinished: 86400
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: cleanup
              image: example/cleanup:2.1
              args: ["--older-than=30d"]
schedule: "0 3 * * *" (timezone America/New_York)
03:00 -> [Job orders-cleanup-28392] -> Pod (success)
04:00 -> previous Job kept in history (limit=3)
03:00 next day -> [Job orders-cleanup-28401] -> Pod
oldest Job > limit -> deleted by controller
Common Pitfalls
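Timezones bite first. Without spec.timeZone, a CronJob schedule is interpreted in the kube-controller-manager's local time zone, which is usually UTC; the field only became stable in Kubernetes 1.27. Set spec.timeZone explicitly so the job keeps its local wall-clock time and a daylight-saving change does not silently shift your batch window.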
restartPolicy: Always is rejected on Jobs. Use OnFailure if you want the kubelet to restart the container in place, or Never if you want the Job controller to create a fresh pod with a new name after every failure. Never makes debugging easier because each failed attempt leaves its own pod and logs behind.
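As a rough illustration of the Never variant, the sketch below reuses the migration container from the hands-on example. The Job name is made up, and the retry count is only an example value.

```yaml
# Sketch only: the Job name is hypothetical.
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migrate-debug
spec:
  backoffLimit: 3            # up to 3 retries, each in a newly created pod
  template:
    spec:
      restartPolicy: Never   # a failed pod is kept as-is instead of being restarted in place
      containers:
        - name: migrate
          image: example/app:1.4.2
          command: ["./bin/migrate", "up"]
```

Each failed attempt stays visible as its own pod until the Job is deleted, so you can read the logs of every attempt separately.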
concurrencyPolicy: Allow is the default. If your job takes longer than the interval (think hourly cron, 90-minute job), two Jobs will run at once and probably corrupt shared state. Use Forbid unless you have actively designed for concurrency.
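The three accepted values are easiest to compare side by side. This is a minimal sketch of an hourly CronJob; the name and image are hypothetical.

```yaml
# Sketch only: name and image are hypothetical.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: hourly-sync
spec:
  schedule: "0 * * * *"   # top of every hour
  # Allow (default): start the new Job even if the previous one is still running
  # Forbid:          skip the new run while the previous Job is still running
  # Replace:         cancel the running Job and start the new one in its place
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: sync
              image: example/sync:1.0   # hypothetical image
```

Forbid trades a skipped run for safety; Replace suits work where only the newest run matters.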
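Without ttlSecondsAfterFinished, a finished standalone Job and its pods stay in the cluster until someone deletes them, and Jobs created by a CronJob are pruned only by its history limits. Set it on every Job; one day is a reasonable default for debuggability.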
Production Tips
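Make the workload idempotent. CronJobs do not give you exactly-once execution: the controller creates a Job approximately once per scheduled time, and in rare cases it creates two Jobs for the same slot or none at all. Inside a Job, a node or network failure during pod startup can also lead to a replacement pod for the same work. Your code must tolerate being run twice on the same input.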
Emit metrics. A Job that silently fails for a week is worse than one that loudly fails on day one. Scrape kube-state-metrics with Prometheus, use its kube_job_status_failed metric, and add an alert that fires when the most recent Job for a CronJob did not succeed.
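A simple starting point is a Prometheus rule file like the sketch below. It assumes kube-state-metrics is already being scraped and that the Jobs come from the orders-cleanup CronJob; the group name, alert name, and severity label are made up. Note that it is cruder than a true "most recent Job failed" check: it fires while any matching Job object with failed pods still exists.

```yaml
# Sketch only: group name, alert name, and severity are hypothetical.
groups:
  - name: batch-jobs
    rules:
      - alert: CronJobRunFailed
        # kube_job_status_failed is exposed by kube-state-metrics, one series per Job
        expr: kube_job_status_failed{job_name=~"orders-cleanup-.*"} > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Job {{ $labels.namespace }}/{{ $labels.job_name }} has failed pods"
```

Because failed Jobs linger until their TTL or the failed history limit removes them, expect the alert to stay active for that long unless you refine the expression.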
Pin activeDeadlineSeconds on long-running Jobs so a hung pod cannot block the next scheduled run forever. Combine it with startingDeadlineSeconds so a run missed during a control-plane outage is skipped, rather than started late, once its deadline has passed.
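The two deadlines live at different levels of the manifest, which is easy to get wrong. The sketch below extends the orders-cleanup CronJob; the two-hour cap is an assumed value, not a recommendation.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: orders-cleanup
spec:
  schedule: "0 3 * * *"
  concurrencyPolicy: Forbid
  startingDeadlineSeconds: 600       # CronJob level: skip a run that cannot start within 10 minutes
  jobTemplate:
    spec:
      activeDeadlineSeconds: 7200    # Job level (assumed 2 h): pods are terminated and the Job is marked Failed
      backoffLimit: 2
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: cleanup
              image: example/cleanup:2.1
              args: ["--older-than=30d"]
```

Once activeDeadlineSeconds is reached the Job fails regardless of how many retries backoffLimit still allows, which frees the next scheduled run under Forbid.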
Use a unique name suffix for manually triggered Jobs (date, ticket id) so you can audit what ran and when.
Wrap-up
Reach for a Job when you have a finite task and you want Kubernetes to track its completion. Reach for a CronJob when you want that task on a schedule. Set TTLs, pick a concurrency policy on purpose, and make your code idempotent, and batch work on Kubernetes becomes boring in the best way.