2015: containers without orchestration
Docker made containers easy to build and run. On a single machine, it worked. On ten machines, you needed scripts. On fifty machines, the scripts broke. Networking was ad-hoc. Secrets were environment variables. Resource limits were suggestions. Deployments were SSH scripts or Ansible playbooks that worked until they did not.
Kubernetes solved this by adding a control plane. Not by replacing Docker. By adding the governance, scheduling, and observability layer that made containers manageable at scale. You still ran containers. K8s just made sure they ran safely, within resource bounds, with proper identity, and with an audit trail.
AI agents are at the same inflection point. Individual agents work fine. Running 50 agents in production without governance is the same chaos as running 50 containers without K8s. Different workload, same problem, same solution pattern.
AI agent orchestration: K8s primitives mapped
Admission controllers: validate/mutate resources before persistence
  -> Safety Kernel: evaluate every job against policy before dispatch
RBAC: role-based access to API resources
  -> Capability scoping: per-agent read/write/admin grants
Resource quotas: CPU/memory limits per namespace
  -> Budget limits: token spend and rate limits per agent/fleet
Namespaces: workload isolation boundaries
  -> Tenant-isolated agent environments
Audit logging: structured record of every API call
  -> Audit trail: structured record of every agent decision
Liveness probes: detect and restart unhealthy pods
  -> Health monitoring: detect stale workers, reassign jobs
Helm: declarative app packaging and updates
  -> Declarative governance bundle installation
Seven primitives. Seven direct mappings. This is not a forced analogy. These are the same governance problems applied to a different workload type.
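To make the RBAC-to-capability mapping concrete, here is a minimal Go sketch of per-agent capability scoping. The `Capability`, `AgentGrant`, and `Allowed` names are illustrative assumptions for this post, not Cordum's actual API:

```go
package main

import "fmt"

// Capability levels for an agent, analogous to K8s RBAC verbs.
type Capability int

const (
	Read Capability = iota
	Write
	Admin
)

// AgentGrant scopes an agent to a capability level within one tenant
// namespace, the way a RoleBinding scopes verbs to a K8s namespace.
type AgentGrant struct {
	Agent     string
	Namespace string
	Level     Capability
}

// Allowed reports whether a grant covers the requested capability in the
// requested namespace. Higher levels subsume lower ones (admin > write > read).
func Allowed(g AgentGrant, ns string, want Capability) bool {
	return g.Namespace == ns && g.Level >= want
}

func main() {
	g := AgentGrant{Agent: "research-agent", Namespace: "tenant-a", Level: Read}
	fmt.Println(Allowed(g, "tenant-a", Read))  // read in own namespace: true
	fmt.Println(Allowed(g, "tenant-a", Write)) // write exceeds grant: false
	fmt.Println(Allowed(g, "tenant-b", Read))  // wrong namespace: false
}
```

The point of the sketch is the shape of the check, not the three-level ladder: the decision is made per request, against a grant scoped to both a tenant and a capability level, exactly as RBAC decisions are.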
What Kubernetes got right for agent governance
Three K8s design decisions apply directly to agent governance.
Declarative desired state. K8s does not tell containers what to do step by step. You declare the desired state (3 replicas, 512MB memory, port 8080) and the control plane reconciles reality to match. Agent governance works the same way. You declare the policy-as-code (reads allowed, writes need approval, destructive blocked) and the Safety Kernel enforces it on every action.
# K8s: OPA/Gatekeeper admission policy
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sBlockPrivilegedContainers
metadata:
  name: block-privileged
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]

# Agent equivalent: Safety Kernel policy
# safety.yaml
version: v1
rules:
  - id: block-destructive
    match:
      topics: ["job.*.delete", "job.*.drop"]
      risk_tags: ["destructive"]
    decision: deny
    reason: "Destructive operations blocked by policy"

Fail-closed by default. If a K8s admission controller cannot reach its webhook, it rejects the request. Not because it knows the request is bad, but because it cannot confirm it is safe. Cordum's Safety Kernel follows the same principle. If policy evaluation fails for any reason, the job is blocked. The safe default is deny, not allow.
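The fail-closed rule reduces to a few lines of Go. In this sketch, `evaluate` is a hypothetical stand-in for the policy engine (a real one might time out or hit an unreachable rule store); none of the names are Cordum's actual API:

```go
package main

import (
	"errors"
	"fmt"
)

// Decision mirrors the safety.yaml outcome: allow or deny.
type Decision string

const (
	Allow Decision = "allow"
	Deny  Decision = "deny"
)

// evaluate is an illustrative stand-in for the policy engine. It can fail
// (timeout, unreachable store, malformed rule), which is the case that matters.
func evaluate(topic string) (Decision, error) {
	if topic == "" {
		return "", errors.New("policy engine unavailable")
	}
	if topic == "job.db.delete" {
		return Deny, nil // matched a destructive-topic rule
	}
	return Allow, nil
}

// Dispatch applies the fail-closed rule: any evaluation error is treated as a
// deny, the same way a K8s webhook with failurePolicy: Fail rejects requests
// it cannot validate.
func Dispatch(topic string) Decision {
	d, err := evaluate(topic)
	if err != nil {
		return Deny // safe default: block when we cannot confirm safety
	}
	return d
}

func main() {
	fmt.Println(Dispatch("job.report.read")) // allow
	fmt.Println(Dispatch("job.db.delete"))   // deny by rule
	fmt.Println(Dispatch(""))                // deny: evaluation itself failed
}
```

The single `if err != nil { return Deny }` branch is the whole design decision: the error path and the policy-deny path converge on the same outcome.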
Workload/infrastructure separation. K8s does not care what runs inside a container. Application developers write code; platform engineers manage the control plane. Agent governance works the same way. Agent developers write agent logic; platform teams manage the governance layer. The two concerns are separate, and mixing them is how both containers and agents get into trouble.
Where the Kubernetes analogy breaks down
Honesty matters more than cleverness here. The analogy has limits.
Containers are deterministic. Agents are not. Run the same container image with the same input and you get the same output. Run the same agent with the same prompt and you might get a completely different sequence of actions. Temperature, context window state, and model updates all introduce variance. This makes pre-dispatch policy evaluation more important for agents than for containers, not less. You cannot predict what an agent will do from its configuration alone.
Containers do not delegate. A container does not autonomously decide to spin up other containers and delegate work to them. Agents do. A research agent can decide to spawn a data access agent that spawns an API caller. This delegation chain does not exist in the container model, and it creates governance challenges (policy inheritance, approval escalation) that K8s never had to solve.
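One way to reason about policy along a delegation chain is intersection: a spawned agent's effective policy can never exceed that of the agent which delegated to it, however deep the chain goes. A hedged Go sketch of that idea; the `PolicySet` type and `Inherit` helper are illustrative, not an actual Cordum interface:

```go
package main

import "fmt"

// PolicySet is the set of topics an agent may act on. Real delegation
// governance also covers approvals and escalation; this models only scope.
type PolicySet map[string]bool

// Inherit computes a spawned agent's effective policy as the intersection of
// its own grant with its parent's effective policy: the child keeps only the
// topics both sides allow.
func Inherit(parent, child PolicySet) PolicySet {
	eff := PolicySet{}
	for topic := range child {
		if parent[topic] {
			eff[topic] = true
		}
	}
	return eff
}

func main() {
	research := PolicySet{"data.read": true, "api.call": true}
	dataAgent := PolicySet{"data.read": true, "data.write": true}
	apiCaller := PolicySet{"api.call": true, "data.read": true}

	// research -> dataAgent -> apiCaller delegation chain.
	step1 := Inherit(research, dataAgent) // data.write dropped: research lacks it
	step2 := Inherit(step1, apiCaller)    // api.call dropped too: step1 lacks it
	fmt.Println(step1, step2)
}
```

Applying `Inherit` at each spawn makes privilege escalation through delegation structurally impossible, which is the property the container model never had to provide.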
Resource consumption is unpredictable. A container's resource usage is bounded by its limits. An agent's token consumption depends on the model's reasoning path, which varies per request. Budget enforcement for agents requires runtime monitoring and circuit breakers, not just static limits.
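A runtime circuit breaker for token spend might look like the following Go sketch. The `Breaker` type is a hypothetical illustration of the pattern, not Cordum's implementation:

```go
package main

import (
	"fmt"
	"sync"
)

// Breaker enforces a runtime token budget for one agent. Unlike a static
// container memory limit, it is checked as consumption accrues, because an
// agent's token use per request cannot be known in advance.
type Breaker struct {
	mu     sync.Mutex
	budget int64
	spent  int64
	open   bool
}

func NewBreaker(budget int64) *Breaker { return &Breaker{budget: budget} }

// Record adds observed token spend and trips the breaker once the budget is
// exhausted; further work is refused until the breaker is reset.
func (b *Breaker) Record(tokens int64) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.spent += tokens
	if b.spent >= b.budget {
		b.open = true
	}
}

// Allow reports whether the agent may start another model call.
func (b *Breaker) Allow() bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	return !b.open
}

func main() {
	b := NewBreaker(10000)
	for _, spend := range []int64{3000, 4000, 5000} {
		if !b.Allow() {
			fmt.Println("circuit open: job blocked")
			break
		}
		b.Record(spend) // spend is only known after the model responds
	}
	fmt.Println("allowed:", b.Allow())
}
```

The budget here plays the role of a K8s resource limit, but enforcement happens after each observed spend rather than at admission time, which is exactly the runtime-monitoring point above.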
These differences do not invalidate the control plane pattern. They reinforce why agents need one even more than containers did.
The control plane pattern applied to agents
Cordum's architecture maps directly to the K8s control plane. This is not an accident. We built it this way because the pattern works. Read more about the workflow orchestration architecture.
# Kubernetes Control Plane          Agent Control Plane (Cordum)
# -------------------------         ---------------------------
# API Server                    ->  API Gateway
#   Single entrypoint                 Single entrypoint
#   Authn/authz on every request      X-API-Key + tenant isolation
#
# Admission Controllers         ->  Safety Kernel
#   Validate/mutate before persist    Evaluate before dispatch
#   OPA/Gatekeeper policies           safety.yaml rules
#   Fail-closed by default            Fail-closed by default
#
# kube-scheduler                ->  Scheduler
#   Bin-pack pods to nodes            Route jobs to worker pools
#   Resource-aware placement          Capability-based routing
#
# controller-manager            ->  Workflow Engine
#   Reconcile desired state           DAG step orchestration
#   Watch + act loop                  Event-driven progression
#
# etcd                          ->  NATS + Redis
#   Durable state store               Durable messaging + state
#   Watch streams                     JetStream subscriptions
#
# Audit Logging                 ->  Audit Trail
#   Every API call recorded           Every decision recorded
#   Structured JSON events            Structured JSON events
API Gateway handles authentication, routing, and the single entrypoint for all operations, like the K8s API server. Safety Kernel evaluates every job before dispatch, like admission controllers evaluate every resource before persistence. Scheduler routes jobs to worker pools based on capabilities and load, like kube-scheduler places pods on nodes. Workflow Engine orchestrates multi-step processes, like controller-manager reconciles desired state.
If you have operated K8s at scale, you already understand how these components interact. The workloads changed from containers to agents. The governance pattern did not.
For platform engineers
If you think in control plane patterns, you already understand agent governance. Admission controllers are Safety Kernels. RBAC is capability scoping. Resource quotas are budget limits. Audit logs are audit trails. The vocabulary is different. The architecture is the same.
Platform engineering teams are already treating agents as first-class platform citizens, applying the same RBAC, quota, and governance primitives they manage for microservices. The kube-agentic-networking SIG is building agent identity and policy primitives directly into the K8s ecosystem.
We built Cordum for this community. Source-available (BUSL-1.1), built in Go on NATS and Redis, with sub-5ms policy evaluation. If you have opinions about admission controllers and API server design, look at our framework comparison and architecture docs. The primitives will feel familiar.