Definition
An AI agent control plane is the infrastructure layer that evaluates policy, manages approvals, enforces constraints, and records audit evidence for autonomous AI agent actions. It operates outside the agent's reasoning loop.
The control plane does not decide how an agent thinks, which tools it selects, or how it reasons about a problem. Those are framework responsibilities. The control plane decides whether the agent's chosen action is allowed to execute, who needs to approve it, what constraints apply, and what evidence to record.
The term borrows directly from network and infrastructure engineering. In a Kubernetes cluster, the control plane (API server, scheduler, controller manager) makes decisions about where to place workloads. The data plane (kubelets, pods) executes them. The same separation applies to AI agent systems: the framework is the data plane, the control plane governs it.
Why agents need a control plane
What frameworks do well
Agent frameworks like LangChain, CrewAI, AutoGen, and LlamaIndex are excellent at agent behavior: reasoning loops, tool selection, memory management, multi-agent coordination, and LLM integration. This is the data plane. It handles how agents accomplish tasks.
What frameworks do not handle
None of the major frameworks include pre-dispatch policy enforcement. None require mandatory approval workflows with audit binding. None provide fleet-level governance across different frameworks. None define fail-mode behavior (what happens when the governance layer is down). None implement output quarantine for unsafe content. These are control plane functions.
The gap in practice
When Agent A delegates to Agent B, which delegates to Agent C, and Agent C exports customer PII to an external API: who approved it? What policy applied? Where is the audit trail? What was the risk assessment? Which human was responsible?
Frameworks cannot answer these questions because they operate at the individual agent level. A control plane operates at the fleet level. It sees every action from every agent, evaluates each against a shared policy, and records every decision with evidence.
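A fleet-level submission can be sketched as a plain payload describing the action's properties, which is all the control plane sees. The field names and `make_action` helper below are illustrative, not any particular product's API:

```python
# Hypothetical action submission: the control plane evaluates these
# properties, not the agent's internal reasoning that produced them.
def make_action(agent_id, topic, capability, risk_tags, metadata=None):
    """Build a framework-agnostic action payload for pre-dispatch review."""
    return {
        "agent_id": agent_id,          # which agent (or delegation chain) acts
        "topic": topic,                # what the action concerns
        "capability": capability,      # the tool/permission being exercised
        "risk_tags": list(risk_tags),  # tags the policy engine matches on
        "metadata": metadata or {},
    }

# The Agent A -> B -> C delegation example becomes an auditable payload:
action = make_action(
    agent_id="agent-c",
    topic="customer-data",
    capability="http.post",
    risk_tags=["write", "pii", "external"],
    metadata={"delegated_from": ["agent-a", "agent-b"]},
)
```

Because every agent submits the same shape regardless of framework, one policy can govern the whole fleet.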
Architecture
A control plane has four core components. Each can be implemented independently, but production systems need all four.
```
Agent Framework (LangChain, CrewAI, AutoGen, custom)
        |
        v
[Submit API] --> [Policy Engine / Safety Kernel]
        |
        +-- ALLOW --> [Dispatcher] --> [Worker Pool] --> Execution
        |
        +-- DENY --> Action blocked, agent notified
        |
        +-- REQUIRE_APPROVAL --> [Approval Queue]
                                      |
                                 Human reviews
                                      |
                                      +-- Approve --> Dispatcher executes
                                      |
                                      +-- Reject --> Action blocked

[Audit Trail] <-- Every decision recorded with evidence
```

Policy engine (Safety Kernel)
Evaluates every action against versioned policy rules before dispatch. Rules are defined in YAML or code, not in natural language prompts. The engine returns a deterministic decision: allow, deny, require approval, or throttle. The decision is based on action properties (topic, capability, risk tags, metadata), not on the agent's reasoning.
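A minimal sketch of deterministic, pre-dispatch evaluation, assuming rules match on risk tags and are checked in priority order (the rule and action shapes are illustrative, not a real engine's API):

```python
# Rules are ordered: the first match wins, so precedence is explicit
# and the same action always produces the same decision.
RULES = [
    {"id": "deny-destructive", "match_tags": {"destructive"}, "decision": "deny"},
    {"id": "approve-writes", "match_tags": {"write"}, "decision": "require_approval"},
    {"id": "allow-reads", "match_tags": {"read"}, "decision": "allow"},
]

def evaluate(action, rules=RULES):
    """Return the first matching rule's decision; default-deny if none match."""
    tags = set(action.get("risk_tags", []))
    for rule in rules:
        if rule["match_tags"] & tags:
            return rule["decision"], rule["id"]
    return "deny", "default-deny"
```

Note that nothing here inspects the agent's reasoning: the decision is a pure function of the action's declared properties and the policy version in force.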
Approval workflows
When a policy returns REQUIRE_APPROVAL, the action enters an approval queue. The queue routes to the right approver based on risk tier, team ownership, or action type. The approval is cryptographically bound to the policy snapshot and action hash at the time of submission. If the action or policy changes after submission, the approval is invalidated.
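The binding can be sketched with standard-library hashing: the approval stores digests of the exact action and policy snapshot the human reviewed, so any later change invalidates it. This is a conceptual sketch, not a specific product's scheme:

```python
import hashlib
import json

def digest(obj):
    """Canonical SHA-256 over a JSON-serializable object (sorted keys)."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()

def bind_approval(approver, action, policy_snapshot):
    """Bind an approval to the exact action and policy version reviewed."""
    return {
        "approver": approver,
        "action_hash": digest(action),
        "policy_hash": digest(policy_snapshot),
    }

def approval_valid(approval, action, policy_snapshot):
    """The approval dies if either the action or the policy has changed."""
    return (approval["action_hash"] == digest(action)
            and approval["policy_hash"] == digest(policy_snapshot))
```

This closes the time-of-check/time-of-use gap: an approval granted for one action cannot be replayed against a mutated one.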
Audit trail
Every decision is logged with the policy version that produced it, the timestamp, the actor identity, the action details, and the decision evidence. The audit trail is immutable and queryable. For compliance, this is the artifact that proves governance was enforced.
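One common way to make such a log tamper-evident is a hash chain, where each record includes a digest of its predecessor. A minimal sketch (the record fields mirror the ones listed above; the class itself is illustrative):

```python
import hashlib
import json

class AuditTrail:
    """Append-only log; each record hashes its predecessor, so editing
    any entry breaks the chain and is detectable on verification."""

    def __init__(self):
        self.records = []

    def append(self, decision, policy_version, actor, action, evidence):
        prev = self.records[-1]["hash"] if self.records else "genesis"
        body = {
            "decision": decision,
            "policy_version": policy_version,
            "actor": actor,
            "action": action,
            "evidence": evidence,
            "prev": prev,
        }
        # Hash is computed over the body *before* the hash field is added.
        body["hash"] = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()).hexdigest()
        self.records.append(body)

    def verify(self):
        """Recompute every hash and link; False means tampering."""
        prev = "genesis"
        for rec in self.records:
            body = {k: v for k, v in rec.items() if k != "hash"}
            expect = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()).hexdigest()
            if rec["prev"] != prev or rec["hash"] != expect:
                return False
            prev = rec["hash"]
        return True
```

A production system would also timestamp each record and anchor the chain in external storage, but the queryable, tamper-evident property is what auditors care about.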
Operational controls
Constraints (maximum runtime, network allowlists, tool capability scoping), fail-mode policy (fail-open vs fail-closed when the control plane is unavailable), and output safety (quarantine or redact agent output that violates content policies).
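The fail-mode choice in particular should be an explicit, reviewable setting rather than an accident of exception handling. A sketch, assuming a hypothetical `policy_engine` callable that raises on outage:

```python
# Illustrative fail-mode handling; FAIL_MODE would come from reviewed config.
FAIL_MODE = "closed"  # "closed": block on outage; "open": proceed unchecked

def decide(action, policy_engine):
    """Evaluate an action, falling back to the configured fail-mode
    when the policy engine is unreachable."""
    try:
        return policy_engine(action)
    except ConnectionError:
        if FAIL_MODE == "closed":
            return "deny"   # safer: no governance means no execution
        return "allow"      # availability over governance; use with care
```

Fail-closed is the conservative default; fail-open may be acceptable for low-risk, read-only fleets where availability dominates.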
The Kubernetes analogy
The control plane concept comes from infrastructure engineering. In Kubernetes, the control plane consists of the API server (receives requests), the scheduler (decides where to place workloads), and the controller manager (ensures desired state). The data plane consists of kubelets and pods that run the actual containers.
This separation is why Kubernetes can manage thousands of containers safely. No container decides where to run or how much CPU to claim. The control plane makes those decisions based on policy.
The same principle applies to AI agents. No agent should decide whether its own action is safe to execute. The control plane makes that decision based on policy. The agent framework handles execution. The control plane handles governance.
What current resources cover vs miss
| Source | Strong coverage | Missing piece |
|---|---|---|
| LangChain: Agent Architecture Docs | Excellent coverage of agent coordination, tool routing, and memory management. | No separation between agent behavior and fleet governance. No pre-dispatch policy enforcement. |
| Microsoft: Agentic AI Patterns | Strong architectural patterns for enterprise agent systems with Azure integration. | Mixes orchestration and governance concepts. No standalone control plane definition or reference architecture. |
| CNCF: Platform Engineering on Kubernetes | Established control plane / data plane separation for infrastructure workloads. | Infrastructure-focused. No application to AI agent governance, policy-as-code for LLM actions, or approval workflows. |
Control plane vs framework vs orchestrator
These three layers are complementary, not competing. Most production agent systems need at least a framework and a control plane. Some also need an orchestrator for durable workflow execution.
| Layer | Purpose | Examples | Decides |
|---|---|---|---|
| Framework | Agent behavior: reasoning, tool selection, memory, multi-agent coordination | LangChain, CrewAI, AutoGen, LlamaIndex | How agents think and act |
| Orchestrator | Workflow execution: task ordering, retries, durability, scheduling | Temporal, Prefect, Airflow | When and in what order tasks run |
| Control Plane | Governance: policy enforcement, approvals, constraints, audit evidence | Cordum | What agents are allowed to do |
Core capabilities checklist
What to look for when evaluating an AI agent control plane:

- Pre-dispatch policy evaluation with deterministic decisions (allow, deny, require approval, throttle)
- Policy defined as versioned YAML or code, not natural-language prompts
- Approval workflows cryptographically bound to the policy snapshot and action hash
- Immutable, queryable audit trail recording policy version, actor, action details, and evidence per decision
- Operational constraints: maximum runtime, network allowlists, tool capability scoping
- Explicit, configurable fail-mode behavior (fail-open vs fail-closed)
- Output safety: quarantine or redaction of content that violates policy
- Framework-agnostic action submission (LangChain, CrewAI, AutoGen, custom agents)
Getting started
Start with your highest-risk agent action, the one that would cause the most damage if it went wrong. Add a single policy rule that requires approval before that action executes.
```yaml
version: v1
rules:
  - id: allow-read-operations
    match:
      risk_tags:
        - read
    decision: allow
  - id: require-approval-for-writes
    match:
      risk_tags:
        - write
    decision: require_approval
    reason: "Write operations require human approval"
  - id: deny-destructive
    match:
      risk_tags:
        - destructive
    decision: deny
    reason: "Destructive operations are blocked"
```

Validate the policy in staging by simulating actions against it. Check that ALLOW, DENY, and REQUIRE_APPROVAL decisions match your expectations. Then deploy to production and expand to more actions.
The Cordum quickstart walks through the full setup in under 10 minutes: install, define a policy, submit your first governed action, and verify the decision in the dashboard.
Frequently asked questions
What is an AI agent control plane?

An AI agent control plane is the infrastructure layer that evaluates policy, manages approvals, enforces constraints, and records audit evidence for autonomous AI agent actions. It operates outside the agent's reasoning loop and governs what agents are allowed to do, not how they think.

How is a control plane different from an orchestrator?

An orchestrator (like Temporal or Prefect) manages task ordering, retries, and workflow durability. A control plane manages permissions: which actions are allowed, which require approval, which are blocked. An orchestrator decides when to run. A control plane decides whether to run.

Do I need a control plane if I already use LangChain or CrewAI?

If your agents perform any action with real-world consequences (sending emails, writing to databases, calling external APIs), yes. LangChain and CrewAI handle agent behavior but do not include pre-dispatch policy enforcement, mandatory approval workflows, or immutable audit trails. These are control plane functions.

What is a safety kernel?

A safety kernel is one component of a control plane. It evaluates policy rules and returns a decision (allow, deny, require approval). The control plane also includes the approval queue, the dispatcher, the audit system, and the constraint enforcement layer.

Can one control plane govern agents built on different frameworks?

Yes. A properly designed control plane is framework-agnostic. It evaluates actions by their properties (topic, capability, risk tags), not by which framework submitted them. LangChain, CrewAI, AutoGen, and custom agents can all submit actions to the same control plane.

How does a control plane relate to AI governance?

A control plane is the technical implementation of AI governance for agent systems. AI governance is the broader discipline that includes policy design, risk assessment, compliance strategy, and organizational processes. The control plane is the enforcement layer that makes governance policies operational.

What happens if the control plane goes down?

This is fail-mode policy. Fail-closed means all agent actions are blocked until the control plane recovers (safer but impacts availability). Fail-open means actions proceed without policy checks (maintains availability but loses governance). The choice depends on your risk tolerance and should be explicitly configured, not left to chance.

How do I get started?

Start with pre-dispatch policy evaluation on your highest-risk agent action. Add a single rule that requires approval before that action executes. Validate in staging, then expand to more actions and more sophisticated rules. The goal is to prove the pattern works before scaling it.
Next step
If you are evaluating control plane options, start with the evaluation guide for a structured checklist. If you want to understand the broader governance landscape, read What Is AI Agent Governance?.