Kubernetes Operators and CRDs: A Practical Introduction

How CustomResourceDefinitions extend the Kubernetes API and how operators encode operational knowledge as controllers reconciling desired state.

·4 min read · By Codeloom
Intermediate 10 min read

What you'll learn

  • What a CRD is and is not
  • The control loop pattern
  • Operators vs Helm charts
  • Common operator failure modes
  • When to write your own

Prerequisites

  • Familiar with terminals and YAML

What and Why

A CustomResourceDefinition (CRD) lets you add new object kinds to the Kubernetes API. Once registered, your Database or Cluster object behaves like a built-in: kubectl get, kubectl describe, RBAC, and watch all work for free. An operator is a controller that watches those objects and drives the world toward what they describe.

This pattern is how Kubernetes scales beyond Pod, Deployment, and Service. Projects such as cert-manager, Argo CD, the Prometheus Operator, and the various database operators all use it. Knowing how the pattern works helps you both as a consumer of operators and as an occasional author.

Mental Model

Every Kubernetes controller runs the same loop:

        desired state (spec)
                 |
                 v
+--> [ controller observes ]
|                |
|                v
|     compare to actual state
|                |
|                v
|      create/update/delete
|                |
|                v
|      actual state (status)
|                |
+----------------+
The reconcile loop

The built-in Deployment controller runs this loop for ReplicaSets. An operator runs it for your CRD. The contract: the user writes the spec, and the operator writes the status.
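
As a rough illustration of the loop above, here is a toy sketch in plain Go with no Kubernetes dependencies. The State type, reconcileOnce, and converge are invented for this example; a real controller is driven by watch events rather than a for loop.

```go
package main

import "fmt"

// State is a toy stand-in for a resource: Spec is what the user asked for,
// Status is what the controller last observed. Both fields are illustrative.
type State struct {
	Spec   int // desired replicas
	Status int // observed replicas
}

// reconcileOnce compares desired and actual state and takes one corrective
// step. It returns the action taken; "noop" means the states already match.
func reconcileOnce(s *State) string {
	switch {
	case s.Status < s.Spec:
		s.Status++ // pretend one replica was created
		return "create"
	case s.Status > s.Spec:
		s.Status-- // pretend one replica was deleted
		return "delete"
	default:
		return "noop"
	}
}

// converge repeats the loop until nothing is left to do and returns the
// actions that were taken along the way.
func converge(s *State) []string {
	var actions []string
	for {
		a := reconcileOnce(s)
		if a == "noop" {
			return actions
		}
		actions = append(actions, a)
	}
}

func main() {
	s := &State{Spec: 3, Status: 1}
	actions := converge(s)
	fmt.Println(actions, s.Status) // prints "[create create] 3"
	fmt.Println(reconcileOnce(s))  // prints "noop"
}
```

Calling reconcileOnce again after convergence does nothing, which is the property every real reconcile function should have.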

Hands-on Example

Define a tiny CRD for a Greeting resource:

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata: { name: greetings.example.com }
spec:
  group: example.com
  scope: Namespaced
  names: { plural: greetings, singular: greeting, kind: Greeting, shortNames: [gr] }
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              required: [name]
              properties:
                name: { type: string }
                language: { type: string, enum: [en, es, fr], default: en }
            status:
              type: object
              properties:
                message: { type: string }
      subresources: { status: {} }
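
Once a CRD like this is registered, the API server serves namespaced custom resources under a path built from the group, version, and plural name. The helper below is a small sketch of that layout; resourcePath is a name made up for this example, and the namespace and object name are sample inputs.

```go
package main

import "fmt"

// resourcePath builds the REST path the API server serves for a namespaced
// custom resource: /apis/<group>/<version>/namespaces/<ns>/<plural>/<name>.
func resourcePath(group, version, namespace, plural, name string) string {
	return fmt.Sprintf("/apis/%s/%s/namespaces/%s/%s/%s", group, version, namespace, plural, name)
}

func main() {
	// Group, version, and plural come from the CRD; the rest are sample values.
	fmt.Println(resourcePath("example.com", "v1", "default", "greetings", "hi-alice"))
	// prints "/apis/example.com/v1/namespaces/default/greetings/hi-alice"
}
```

Because the CRD enables the status subresource, status is written through the same path with /status appended, separately from spec.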

Now users can write:

apiVersion: example.com/v1
kind: Greeting
metadata: { name: hi-alice }
spec: { name: Alice, language: fr }
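
To make the schema's effect concrete, the sketch below imitates in plain Go what the CRD asks the API server to do with a Greeting spec: apply the default, then enforce required and enum. It is an illustration only, not the API server's code path. The admit function and its error strings are invented, and Go's empty string stands in for "field omitted".

```go
package main

import (
	"errors"
	"fmt"
)

// GreetingSpec mirrors the spec fields declared in the CRD schema.
type GreetingSpec struct {
	Name     string
	Language string
}

// allowedLanguages mirrors the enum in the schema.
var allowedLanguages = map[string]bool{"en": true, "es": true, "fr": true}

// admit applies the schema's default and then checks required and enum.
// An empty string is treated as "field omitted" for the sake of the sketch.
func admit(spec GreetingSpec) (GreetingSpec, error) {
	if spec.Language == "" {
		spec.Language = "en" // default: en
	}
	if spec.Name == "" {
		return spec, errors.New("spec.name is required")
	}
	if !allowedLanguages[spec.Language] {
		return spec, fmt.Errorf("spec.language: unsupported value %q", spec.Language)
	}
	return spec, nil
}

func main() {
	ok, err := admit(GreetingSpec{Name: "Alice"})
	fmt.Println(ok.Language, err) // prints "en <nil>"

	_, err = admit(GreetingSpec{Name: "Bob", Language: "de"})
	fmt.Println(err) // prints: spec.language: unsupported value "de"
}
```

The operator never sees the rejected object, so the reconcile code can assume the spec is well formed.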

The operator (written with kubebuilder or Operator SDK) watches Greeting and writes the rendered message into .status.message. The reconcile loop, in pseudo-Go:

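import (
    "context"

    ctrl "sigs.k8s.io/controller-runtime"
    "sigs.k8s.io/controller-runtime/pkg/client"

    examplev1 "example.com/greeting/api/v1" // hypothetical module path
)

// Reconciler embeds client.Client, which supplies Get and Status().
type Reconciler struct{ client.Client }

// Reconcile copies the rendered greeting into status; render is a helper defined elsewhere.
func (r *Reconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    var g examplev1.Greeting
    if err := r.Get(ctx, req.NamespacedName, &g); err != nil {
        // A NotFound error means the object was deleted; there is nothing to do.
        return ctrl.Result{}, client.IgnoreNotFound(err)
    }
    msg := render(g.Spec.Language, g.Spec.Name)
    if g.Status.Message != msg { // write only on change so repeat runs are no-ops
        g.Status.Message = msg
        if err := r.Status().Update(ctx, &g); err != nil {
            return ctrl.Result{}, err // returning an error requeues with backoff
        }
    }
    return ctrl.Result{}, nil
}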

Real operators do more: create Deployments, manage Secrets, run jobs, call external APIs. The shape stays the same.
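
The render helper called in the reconcile sketch is not shown in the article. One possible version, assuming simple greeting strings that are invented here, might look like this:

```go
package main

import "fmt"

// render is a hypothetical implementation of the helper used in Reconcile.
// It is a pure function: the same spec always yields the same message,
// which is what keeps the reconcile idempotent.
func render(language, name string) string {
	switch language {
	case "es":
		return fmt.Sprintf("Hola, %s!", name)
	case "fr":
		return fmt.Sprintf("Bonjour, %s!", name)
	default: // "en", which the schema also applies as the default
		return fmt.Sprintf("Hello, %s!", name)
	}
}

func main() {
	fmt.Println(render("fr", "Alice")) // prints "Bonjour, Alice!"
	fmt.Println(render("en", "Alice")) // prints "Hello, Alice!"
}
```

Keeping this logic free of API calls makes it easy to unit test without a cluster.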

Common Pitfalls

  • Treating reconcile as a one-shot script. Reconciles run many times. Make them idempotent - safe to call when nothing changed.
  • Putting derived data in spec. Spec is user intent; status is observed reality. Mixing them confuses both humans and the operator.
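  • Tight requeue loops. Returning Requeue: true on every call wastes CPU and adds API server load. Rely on event watches, and use RequeueAfter with a sensible interval when you genuinely need to poll.
  • No finalizers for cleanup. Owner references let the garbage collector remove in-cluster children, but without a finalizer, deleting a CR can still orphan external state or children that have no owner reference. Add one when external side effects matter.
  • Skipping schema validation. apiextensions.k8s.io/v1 requires a schema, but a permissive one (for example, x-kubernetes-preserve-unknown-fields: true on spec) still accepts garbage. Define types, enums, and required fields.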
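
The finalizer pitfall is easier to see in code. The sketch below models the usual flow in plain Go, without controller-runtime. The Object type, the finalizer name, and the step strings are all invented for illustration. In a real cluster the API server sets metadata.deletionTimestamp and only removes the object once its finalizer list is empty.

```go
package main

import "fmt"

// finalizerName is a hypothetical finalizer key for this sketch.
const finalizerName = "example.com/cleanup"

// Object is a toy stand-in for the metadata of a custom resource.
type Object struct {
	Finalizers []string
	Deleting   bool // stands in for a non-nil metadata.deletionTimestamp
}

func hasFinalizer(fs []string, name string) bool {
	for _, f := range fs {
		if f == name {
			return true
		}
	}
	return false
}

func removeFinalizer(fs []string, name string) []string {
	var out []string
	for _, f := range fs {
		if f != name {
			out = append(out, f)
		}
	}
	return out
}

// reconcile shows the finalizer flow. cleanup releases external state and
// must itself be idempotent, because it can run more than once.
func reconcile(obj *Object, cleanup func() error) (string, error) {
	has := hasFinalizer(obj.Finalizers, finalizerName)
	if !obj.Deleting {
		if !has {
			obj.Finalizers = append(obj.Finalizers, finalizerName)
			return "finalizer added", nil
		}
		return "in sync", nil
	}
	if !has {
		return "nothing to clean up", nil
	}
	if err := cleanup(); err != nil {
		return "", err // keep the finalizer so cleanup is retried
	}
	obj.Finalizers = removeFinalizer(obj.Finalizers, finalizerName)
	return "finalizer removed", nil
}

func main() {
	obj := &Object{}
	cleaned := 0
	cleanup := func() error { cleaned++; return nil }

	step, _ := reconcile(obj, cleanup)
	fmt.Println(step) // prints "finalizer added"

	obj.Deleting = true // the user ran kubectl delete
	step, _ = reconcile(obj, cleanup)
	fmt.Println(step, cleaned) // prints "finalizer removed 1"
}
```

If cleanup fails, the finalizer stays in place and the object remains visible, which is what lets the next reconcile retry.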

Production Tips

  • Prefer adopting a community operator over writing your own. Cert-manager, Strimzi (Kafka), CloudNativePG, External Secrets, ArgoCD - all production grade.
  • Use kubebuilder for new operators. It scaffolds the manager, RBAC, and CRDs and stays current with controller-runtime.
  • Bound blast radius with namespaced operators where possible. Cluster-scoped operators are powerful but a bug can break everything.
  • Emit Events and structured logs from reconcile. kubectl describe showing why a CR is stuck is the difference between a 5-minute and a 5-hour debug.
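  • Version CRDs deliberately. Add a new version when the schema changes incompatibly, and use a conversion webhook so objects written against the old version keep working. Never silently break old YAML.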
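
To illustrate the versioning tip, here is a minimal sketch of the two functions a conversion webhook would need, assuming a hypothetical v2 of Greeting that renames language to locale. That v2 schema is invented for this example and is not part of the article's CRD.

```go
package main

import "fmt"

// SpecV1 matches the v1 schema from the hands-on example.
type SpecV1 struct {
	Name     string
	Language string
}

// SpecV2 is a hypothetical next version that renames language to locale.
type SpecV2 struct {
	Name   string
	Locale string
}

// toV2 and toV1 are the two directions a conversion webhook would implement.
// Every field must survive a round trip, or old YAML silently breaks.
func toV2(in SpecV1) SpecV2 { return SpecV2{Name: in.Name, Locale: in.Language} }
func toV1(in SpecV2) SpecV1 { return SpecV1{Name: in.Name, Language: in.Locale} }

func main() {
	old := SpecV1{Name: "Alice", Language: "fr"}
	fmt.Println(toV1(toV2(old)) == old) // prints "true"
}
```

A round-trip check like the one in main is a cheap unit test to keep next to any real conversion code.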

Wrap-up

CRDs plus controllers turn Kubernetes into a platform you can extend. Define a resource that captures user intent, write a reconcile loop that drives reality toward it, and let the API server, RBAC, and watch infrastructure do the heavy lifting. Most of the time, the right move is adopting an existing operator. Occasionally, you will write one - and now the pattern is no longer mysterious.