Kubernetes Operators and CRDs: A Practical Introduction
How CustomResourceDefinitions extend the Kubernetes API and how operators encode operational knowledge as controllers reconciling desired state.
What you'll learn
- What a CRD is and is not
- The control loop pattern
- Operators vs Helm charts
- Common operator failure modes
- When to write your own
Prerequisites
- Familiar with terminals and YAML
What and Why
A CustomResourceDefinition (CRD) lets you add new object kinds to the Kubernetes API. Once registered, your Database or Cluster object behaves like a built-in: kubectl get, kubectl describe, RBAC, and watch all work for free. An operator is a controller that watches those objects and drives the world toward what they describe.
This pattern is how Kubernetes scales beyond Pod, Deployment, and Service. cert-manager, Argo CD, the Prometheus Operator, and database operators all use it. Knowing how the pattern works makes you both a better consumer and an occasional author.
Mental Model
Every Kubernetes controller runs the same loop:
      desired state (spec)
               |
               v
  +-> [ Controller observes ] --+
  |            |                |
  |            v                |
  |   compare to actual state   |
  |            |                |
  |            v                |
  |   create/update/delete      |
  |            |                |
  +------------+                |
               |                |
               v                |
      actual state (status)     |
               \----------------+

The built-in Deployment controller does this for ReplicaSets. An operator does the same for your CRD. The contract: the user writes the spec; the operator writes the status.
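The loop can be sketched in plain Go with no Kubernetes dependencies. This is a minimal illustration, not how any real controller is implemented: `World` and `reconcile` are hypothetical names, and a replica count stands in for whatever the controller manages.

```go
package main

import "fmt"

// World is a hypothetical stand-in for actual cluster state:
// here, just the number of replicas that currently exist.
type World struct {
	Actual int
}

// reconcile compares the desired count to the actual one and acts until
// they match. It returns the actions taken; an empty result means the
// world had already converged.
func reconcile(desired int, w *World) []string {
	var actions []string
	for w.Actual < desired {
		w.Actual++
		actions = append(actions, "create")
	}
	for w.Actual > desired {
		w.Actual--
		actions = append(actions, "delete")
	}
	return actions
}

func main() {
	w := &World{Actual: 1}
	fmt.Println(reconcile(3, w)) // [create create]
	fmt.Println(reconcile(3, w)) // [] (nothing to do the second time)
	fmt.Println(reconcile(2, w)) // [delete]
}
```

The second call doing nothing is the property that matters: the function describes a target, not a sequence of steps.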
Hands-on Example
Define a tiny CRD for a Greeting resource:
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata: { name: greetings.example.com }
spec:
  group: example.com
  scope: Namespaced
  names: { plural: greetings, singular: greeting, kind: Greeting, shortNames: [gr] }
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              required: [name]
              properties:
                name: { type: string }
                language: { type: string, enum: [en, es, fr], default: en }
            status:
              type: object
              properties:
                message: { type: string }
      subresources: { status: {} }
Now users can write:
apiVersion: example.com/v1
kind: Greeting
metadata: { name: hi-alice }
spec: { name: Alice, language: fr }
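On the Go side, the schema above corresponds to a handful of structs. The sketch below uses only the standard library and is a simplification: types scaffolded by kubebuilder embed `metav1.TypeMeta` and `metav1.ObjectMeta` rather than the hand-written fields shown here, and `decodeGreeting` is a hypothetical helper. The object is given as JSON, which is what the API server stores and serves.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// GreetingSpec mirrors .spec in the CRD schema: user intent.
type GreetingSpec struct {
	Name     string `json:"name"`
	Language string `json:"language,omitempty"`
}

// GreetingStatus mirrors .status: written by the operator, not the user.
type GreetingStatus struct {
	Message string `json:"message,omitempty"`
}

// Greeting is a simplified stand-in for the generated API type.
type Greeting struct {
	APIVersion string `json:"apiVersion"`
	Kind       string `json:"kind"`
	Metadata   struct {
		Name string `json:"name"`
	} `json:"metadata"`
	Spec   GreetingSpec   `json:"spec"`
	Status GreetingStatus `json:"status,omitempty"`
}

// decodeGreeting parses a JSON-encoded Greeting. It does not apply the
// schema default for language; the API server does that on admission.
func decodeGreeting(data []byte) (Greeting, error) {
	var g Greeting
	err := json.Unmarshal(data, &g)
	return g, err
}

func main() {
	raw := []byte(`{"apiVersion":"example.com/v1","kind":"Greeting",` +
		`"metadata":{"name":"hi-alice"},"spec":{"name":"Alice","language":"fr"}}`)
	g, err := decodeGreeting(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(g.Metadata.Name, g.Spec.Name, g.Spec.Language) // hi-alice Alice fr
}
```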
The operator (written with kubebuilder or Operator SDK) watches Greeting and writes the rendered message into .status.message. The reconcile loop, in pseudo-Go:
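func (r *Reconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    // Fetch the Greeting named in the request.
    var g examplev1.Greeting
    if err := r.Get(ctx, req.NamespacedName, &g); err != nil {
        // The object may already be gone; a not-found error is not a failure.
        return ctrl.Result{}, client.IgnoreNotFound(err)
    }
    // Compute the desired status from the spec.
    msg := render(g.Spec.Language, g.Spec.Name)
    // Write only when something changed, so repeated reconciles are no-ops.
    if g.Status.Message != msg {
        g.Status.Message = msg
        if err := r.Status().Update(ctx, &g); err != nil {
            return ctrl.Result{}, err
        }
    }
    return ctrl.Result{}, nil
}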
Real operators do more: create Deployments, manage Secrets, run jobs, call external APIs. The shape stays the same.
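The reconcile sketch calls a `render` helper that the article does not define. One plausible version, assuming the three languages allowed by the CRD enum and greeting strings chosen purely for illustration:

```go
package main

import "fmt"

// render builds the greeting message for a name in the given language.
// The phrases are illustrative assumptions; anything outside the CRD
// enum falls back to English, matching the schema default.
func render(language, name string) string {
	switch language {
	case "es":
		return fmt.Sprintf("Hola, %s!", name)
	case "fr":
		return fmt.Sprintf("Bonjour, %s!", name)
	default:
		return fmt.Sprintf("Hello, %s!", name)
	}
}

func main() {
	fmt.Println(render("fr", "Alice")) // Bonjour, Alice!
	fmt.Println(render("", "Bob"))     // Hello, Bob!
}
```

Because `render` is a pure function of the spec, the reconcile loop can call it on every pass and compare the result with the stored status.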
Common Pitfalls
- Treating reconcile as a one-shot script. Reconciles run many times. Make them idempotent - safe to call when nothing changed.
- Putting derived data in spec. Spec is user intent; status is observed reality. Mixing them confuses both humans and the operator.
- Tight requeue loops. Returning Requeue: true on every call wastes CPU and creates API server load. Use event watches instead.
- No finalizers for cleanup. Without a finalizer, deletion of a CR may orphan child resources or external state. Add one when external side effects matter.
- Skipping schema validation. A CRD whose openAPIV3Schema is missing or overly permissive accepts garbage. Define types and required fields.
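Two of these pitfalls, idempotency and finalizers, can be illustrated together with an in-memory sketch. Everything here is hypothetical: `Object` stands in for a custom resource, `Deleting` for a set `metadata.deletionTimestamp`, `External` for some outside system, and `example.com/cleanup` is an invented finalizer name. A real operator would perform these steps through the API client instead.

```go
package main

import "fmt"

// finalizerName is a hypothetical finalizer key.
const finalizerName = "example.com/cleanup"

// Object is a stand-in for a custom resource's relevant metadata.
type Object struct {
	Finalizers []string
	Deleting   bool // stands in for a non-zero deletionTimestamp
}

// External is a stand-in for state the operator manages outside the cluster.
type External struct {
	Provisioned bool
	Cleanups    int
}

func has(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}

func remove(list []string, s string) []string {
	var out []string
	for _, v := range list {
		if v != s {
			out = append(out, v)
		}
	}
	return out
}

// reconcile is safe to call any number of times in any state.
func reconcile(o *Object, ext *External) {
	if o.Deleting {
		// Clean up once, then release the object by dropping the finalizer.
		if has(o.Finalizers, finalizerName) {
			ext.Provisioned = false
			ext.Cleanups++
			o.Finalizers = remove(o.Finalizers, finalizerName)
		}
		return
	}
	// Register the finalizer before creating anything that needs cleanup.
	if !has(o.Finalizers, finalizerName) {
		o.Finalizers = append(o.Finalizers, finalizerName)
	}
	ext.Provisioned = true
}

func main() {
	o, ext := &Object{}, &External{}
	reconcile(o, ext)
	reconcile(o, ext) // repeat call changes nothing
	fmt.Println(len(o.Finalizers), ext.Provisioned) // 1 true

	o.Deleting = true
	reconcile(o, ext)
	reconcile(o, ext) // cleanup is not repeated
	fmt.Println(len(o.Finalizers), ext.Provisioned, ext.Cleanups) // 0 false 1
}
```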
Production Tips
- Prefer adopting a community operator over writing your own. Cert-manager, Strimzi (Kafka), CloudNativePG, External Secrets, ArgoCD - all production grade.
- Use kubebuilder for new operators. It scaffolds the manager, RBAC, and CRDs and stays current with controller-runtime.
- Bound blast radius with namespaced operators where possible. Cluster-scoped operators are powerful but a bug can break everything.
- Emit Events and structured logs from reconcile. kubectl describe showing why a CR is stuck is the difference between a 5-minute and a 5-hour debug.
- Version CRDs with conversion webhooks when you change the schema. Never silently break old YAML.
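To make the last tip concrete, here is a sketch of the conversion logic such a webhook would wrap. It assumes a purely hypothetical v2 of Greeting that renames `language` to `locale`; the article defines no v2, and the webhook plumbing itself is omitted. The point is that conversion should round-trip so that old YAML keeps working.

```go
package main

import "fmt"

// SpecV1 matches the v1 schema defined earlier in this article.
type SpecV1 struct {
	Name     string
	Language string
}

// SpecV2 is a hypothetical later version that renames language to locale.
type SpecV2 struct {
	Name   string
	Locale string
}

// v1ToV2 upgrades a v1 spec, applying the v1 default when language is unset.
func v1ToV2(in SpecV1) SpecV2 {
	lang := in.Language
	if lang == "" {
		lang = "en" // mirrors "default: en" in the v1 schema
	}
	return SpecV2{Name: in.Name, Locale: lang}
}

// v2ToV1 downgrades a v2 spec so clients still using v1 keep working.
func v2ToV1(in SpecV2) SpecV1 {
	return SpecV1{Name: in.Name, Language: in.Locale}
}

func main() {
	old := SpecV1{Name: "Alice", Language: "fr"}
	back := v2ToV1(v1ToV2(old))
	fmt.Println(back == old) // true
}
```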
Wrap-up
CRDs plus controllers turn Kubernetes into a platform you can extend. Define a resource that captures user intent, write a reconcile loop that drives reality toward it, and let the API server, RBAC, and watch infrastructure do the heavy lifting. Most of the time, the right move is adopting an existing operator. Occasionally, you will write one - and now the pattern is no longer mysterious.