Architecture

What K8s Taught Us About Governing Agents

The industry solved fleet governance once. The control plane pattern applies to AI agents.

Mar 23, 2026 · 10 min read · By Zvi
TL;DR

AI agent orchestration follows the same trajectory as container orchestration. Docker worked. Docker at scale without Kubernetes was chaos. Agents work. Agents at scale without a control plane is the same chaos. The primitives K8s established map directly to what agent fleets need.

- Admission controllers map to Safety Kernels. RBAC maps to capability restrictions. Resource quotas map to budget limits. The primitives are the same.
- K8s got three things right that agent governance needs: declarative state (policy-as-code), fail-closed defaults, and workload/infrastructure separation.
- Agents are non-deterministic. The same input can produce different actions. This makes pre-dispatch policy evaluation more important, not less.
- 82% of container users run K8s in production (CNCF 2025). The industry already solved fleet governance once. The pattern applies to agents.
Context

The CNCF 2025 Annual Survey reports that 82% of container users run Kubernetes in production. 96% of organizations that evaluated K8s adopted it. And 66% now use K8s for AI inference workloads. The cloud-native community has already begun mapping these patterns to agents: the kube-agentic-networking SIG is defining agent identity, auth, and policy primitives within the K8s ecosystem.

2015: containers without orchestration

Docker made containers easy to build and run. On a single machine, it worked. On ten machines, you needed scripts. On fifty machines, the scripts broke. Networking was ad-hoc. Secrets were environment variables. Resource limits were suggestions. Deployments were SSH scripts or Ansible playbooks that worked until they did not.

Kubernetes solved this by adding a control plane. Not by replacing Docker. By adding the governance, scheduling, and observability layer that made containers manageable at scale. You still ran containers. K8s just made sure they ran safely, within resource bounds, with proper identity, and with an audit trail.

AI agents are at the same inflection point. Individual agents work fine. Running 50 agents in production without governance is the same chaos as running 50 containers without K8s. Different workload, same problem, same solution pattern.

AI agent orchestration: K8s primitives mapped

- Admission Controllers (validate/mutate resources before persistence) -> Safety Kernel (evaluate every job against policy before dispatch)
- RBAC (role-based access to API resources) -> Capability Restrictions (per-agent capability scoping: read/write/admin)
- Resource Quotas (CPU/memory limits per namespace) -> Budget Limits (token spend and rate limits per agent/fleet)
- Namespaces (workload isolation boundaries) -> Multi-Tenancy (tenant-isolated agent environments)
- Audit Logging (structured record of every API call) -> Audit Trail (structured record of every agent decision)
- Liveness Probes (detect and restart unhealthy pods) -> Heartbeats (detect stale workers, reassign jobs)
- Helm Charts (declarative app packaging and updates) -> Pack System (declarative governance bundle installation)

Seven primitives. Seven direct mappings. This is not a forced analogy. These are the same governance problems applied to a different workload type.

What Kubernetes got right for agent governance

Three K8s design decisions apply directly to agent governance.

Declarative desired state. K8s does not tell containers what to do step by step. You declare the desired state (3 replicas, 512MB memory, port 8080) and the control plane reconciles reality to match. Agent governance works the same way. You declare policy as code (reads allowed, writes need approval, destructive operations blocked) and the Safety Kernel enforces it on every action.

K8s admission policy vs Agent safety policy
# K8s: OPA/Gatekeeper admission policy
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sBlockPrivilegedContainers
metadata:
  name: block-privileged
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]

# Agent equivalent: Safety Kernel policy
# safety.yaml
version: v1
rules:
  - id: block-destructive
    match:
      topics: ["job.*.delete", "job.*.drop"]
      risk_tags: ["destructive"]
    decision: deny
    reason: "Destructive operations blocked by policy"

Fail-closed by default. If a K8s admission controller cannot reach its webhook, it rejects the request. Not because it knows the request is bad, but because it cannot confirm it is safe. Cordum's Safety Kernel follows the same principle. If policy evaluation fails for any reason, the job is blocked. The safe default is deny, not allow.
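
A minimal Go sketch of the fail-closed rule, assuming a hypothetical `policyLookup` backend that can error out (rule set missing, evaluation service unreachable, timeout); the topics are invented for the example:

```go
package main

import (
	"errors"
	"fmt"
)

// policyLookup stands in for the real policy engine. It can fail
// independently of whether the request itself is safe.
func policyLookup(topic string) (bool, error) {
	switch topic {
	case "job.orders.read":
		return true, nil
	case "job.db.drop":
		return false, nil
	default:
		return false, errors.New("no rule matched and no rule set loaded")
	}
}

// allowed is fail-closed: any error during policy evaluation blocks
// the job, mirroring an admission webhook that cannot be reached.
func allowed(topic string) bool {
	ok, err := policyLookup(topic)
	if err != nil {
		return false // cannot confirm safety -> deny
	}
	return ok
}

func main() {
	fmt.Println(allowed("job.orders.read")) // true
	fmt.Println(allowed("job.db.drop"))     // false: denied by rule
	fmt.Println(allowed("job.unknown"))     // false: evaluation failed, fail closed
}
```

The third case is the one that matters: the evaluator knows nothing about the request, and the safe answer to "I don't know" is no.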

Workload/infrastructure separation. K8s does not care what runs inside a container. Application developers write code; platform engineers manage the control plane. Agent governance works the same way. Agent developers write agent logic; platform teams manage the governance layer. The two concerns are separate, and mixing them is how both containers and agents get into trouble.

Where the Kubernetes analogy breaks down

Honesty matters more than cleverness here. The analogy has limits.

Containers are deterministic. Agents are not. Run the same container image with the same input and you get the same output. Run the same agent with the same prompt and you might get a completely different sequence of actions. Temperature, context window state, and model updates all introduce variance. This makes pre-dispatch policy evaluation more important for agents than for containers, not less. You cannot predict what an agent will do from its configuration alone.
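
One consequence in code: the gate has to sit on each emitted action, not on the agent's configuration. A sketch under assumptions (the `Action` type and `gate` rules are invented here, loosely echoing the earlier safety.yaml example: deny `destructive` risk tags and `job.*.delete` / `job.*.drop` topics):

```go
package main

import (
	"fmt"
	"strings"
)

// Action is one step an agent decided to take at runtime. Because the
// agent is non-deterministic, the set of actions cannot be derived
// from its configuration; each one is checked at dispatch time.
type Action struct {
	Topic    string
	RiskTags []string
}

// gate evaluates a single action just before dispatch.
func gate(a Action) bool {
	for _, tag := range a.RiskTags {
		if tag == "destructive" {
			return false
		}
	}
	if strings.HasSuffix(a.Topic, ".delete") || strings.HasSuffix(a.Topic, ".drop") {
		return false
	}
	return true
}

func main() {
	// Two runs of the "same" agent produced different action sequences;
	// both sequences pass through the same per-action gate.
	runA := []Action{{Topic: "job.orders.read"}, {Topic: "job.report.write"}}
	runB := []Action{{Topic: "job.orders.read"}, {Topic: "job.orders.delete", RiskTags: []string{"destructive"}}}
	for _, a := range append(runA, runB...) {
		fmt.Println(a.Topic, gate(a))
	}
}
```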

Containers do not delegate. A container does not autonomously decide to spin up other containers and delegate work to them. Agents do. A research agent can decide to spawn a data access agent that spawns an API caller. This delegation chain does not exist in the container model, and it creates governance challenges (policy inheritance, approval escalation) that K8s never had to solve.
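
One plausible rule for policy inheritance down such a chain is intersection: a spawned agent can never hold a capability its parent lacks. A sketch of that rule (the `inherit` function and capability names are assumptions, not Cordum's API):

```go
package main

import "fmt"

// inherit computes a child agent's effective capabilities: the
// intersection of what it requested and what its parent holds.
// Delegation can narrow privileges but never escalate them.
func inherit(parent, requested map[string]bool) map[string]bool {
	effective := map[string]bool{}
	for c := range requested {
		if parent[c] {
			effective[c] = true
		}
	}
	return effective
}

func main() {
	research := map[string]bool{"read": true, "write": true}
	// The research agent spawns an API caller that asks for read+admin;
	// admin is dropped because the parent never held it.
	apiCaller := inherit(research, map[string]bool{"read": true, "admin": true})
	fmt.Println(apiCaller["read"], apiCaller["admin"]) // true false
}
```

Applied transitively, the rule means privileges can only shrink as the delegation chain deepens.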

Resource consumption is unpredictable. A container's resource usage is bounded by its limits. An agent's token consumption depends on the model's reasoning path, which varies per request. Budget enforcement for agents requires runtime monitoring and circuit breakers, not just static limits.
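The runtime-enforcement point can be sketched as a token circuit breaker. This is an illustration of the idea, not Cordum's budget enforcer; the `TokenBreaker` type and the numbers are invented:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// TokenBreaker enforces a per-agent token budget at runtime. Unlike a
// static container limit, spend is only known as responses come back,
// so the breaker has to trip mid-run once the budget is exhausted.
type TokenBreaker struct {
	mu     sync.Mutex
	budget int
	spent  int
}

var ErrBudgetExhausted = errors.New("token budget exhausted")

// Spend records tokens consumed by one model call, refusing any call
// that would push cumulative spend past the budget.
func (b *TokenBreaker) Spend(tokens int) error {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.spent+tokens > b.budget {
		return ErrBudgetExhausted
	}
	b.spent += tokens
	return nil
}

func main() {
	b := &TokenBreaker{budget: 1000}
	for _, call := range []int{400, 400, 400} { // per-call spend is not known in advance
		if err := b.Spend(call); err != nil {
			fmt.Println("tripped:", err) // tripped: token budget exhausted
			break
		}
	}
}
```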

These differences do not invalidate the control plane pattern. They reinforce why agents need one even more than containers did.

The control plane pattern applied to agents

Cordum's architecture maps directly to the K8s control plane. This is not an accident. We built it this way because the pattern works. Read more about the workflow orchestration architecture.

K8s control plane to agent control plane mapping
# Kubernetes Control Plane          Agent Control Plane (Cordum)
# -------------------------         ---------------------------
# API Server                    ->  API Gateway
#   Single entrypoint                 Single entrypoint
#   Authn/authz on every request      X-API-Key + tenant isolation
#
# Admission Controllers         ->  Safety Kernel
#   Validate/mutate before persist    Evaluate before dispatch
#   OPA/Gatekeeper policies           safety.yaml rules
#   Fail-closed by default            Fail-closed by default
#
# kube-scheduler                ->  Scheduler
#   Bin-pack pods to nodes            Route jobs to worker pools
#   Resource-aware placement          Capability-based routing
#
# controller-manager            ->  Workflow Engine
#   Reconcile desired state           DAG step orchestration
#   Watch + act loop                  Event-driven progression
#
# etcd                          ->  NATS + Redis
#   Durable state store               Durable messaging + state
#   Watch streams                     JetStream subscriptions
#
# Audit Logging                 ->  Audit Trail
#   Every API call recorded           Every decision recorded
#   Structured JSON events            Structured JSON events

API Gateway handles authentication, routing, and the single entrypoint for all operations, like the K8s API server. Safety Kernel evaluates every job before dispatch, like admission controllers evaluate every resource before persistence. Scheduler routes jobs to worker pools based on capabilities and load, like kube-scheduler places pods on nodes. Workflow Engine orchestrates multi-step processes, like controller-manager reconciles desired state.

If you have operated K8s at scale, you already understand how these components interact. The workloads changed from containers to agents. The governance pattern did not.

For platform engineers

If you think in control plane patterns, you already understand agent governance. Admission controllers are Safety Kernels. RBAC is capability scoping. Resource quotas are budget limits. Audit logs are audit trails. The vocabulary is different. The architecture is the same.

Platform engineering teams are already treating agents as first-class platform citizens, applying the same RBAC, quota, and governance primitives they manage for microservices. The kube-agentic-networking SIG is building agent identity and policy primitives directly into the K8s ecosystem.

We built Cordum for this community. Source-available (BUSL-1.1), built in Go on NATS and Redis, with sub-5ms policy evaluation. If you have opinions about admission controllers and API server design, look at our framework comparison and architecture docs. The primitives will feel familiar.

By Zvi, CTO & Co-founder, Cordum

Previously at Checkpoint and Fireblocks, building security infrastructure. Now building the governance layer for autonomous AI agents.

Apply the control plane pattern

Admission controllers for agents. RBAC for capabilities. Quotas for budgets. Five-minute quickstart.
