Why an AI agent governance maturity model matters
Governance is not binary. You do not go from "no governance" to "fully governed" in a single sprint. Organizations need a roadmap that tells them where they are, what to build next, and what each step gets them.
Right now, most organizations are stuck at the bottom. Gartner reports that while 80% of large enterprises claim AI governance initiatives, fewer than half can demonstrate measurable maturity. Separately, fewer than 1 in 5 organizations have operationalized their AI practices according to Credo AI's analysis. And the regulatory clock is ticking: EU AI Act high-risk enforcement begins August 2, 2026, with fines up to 35 million euros or 7% of global turnover.
A maturity model gives you a shared vocabulary. Instead of arguing about whether your governance is "good enough," you can say "we are at Level 2 and need to reach Level 3 before production."
The five levels of AI agent governance maturity
Level 0
What you have: Agents run with developer credentials. No policies. No logs beyond what the LLM provider captures. No visibility into what agents are doing.
What is missing: Everything. You cannot answer any of the four questions: what did it do, what policy allowed it, who approved it, where is the proof.
Risk: You will not know about incidents until the damage is visible. Most organizations start here.
Level 1
What you have: Logging of agent actions after the fact. Alerts on anomalies. A dashboard showing what happened yesterday. Provider-level spend alerts.
What is missing: Pre-execution policy. Agents still act first, and you review later. Approvals are ad-hoc Slack messages, not structured workflows.
Risk: You catch incidents faster but cannot prevent them. Good enough for internal tools, not for production.
Level 2
What you have: Written policies for what agents can and cannot do. Risk categorization of agent actions. Manual review processes for high-risk operations.
What is missing: Automated enforcement. Policies exist in documents but are not evaluated at runtime. Compliance depends on people remembering to follow the rules.
Risk: Satisfies initial audits but breaks at scale. You can show auditors a policy document but cannot prove it was followed for any specific action.
Level 3
What you have: Policy-as-code evaluated before every agent action. Approval gates for high-risk operations. Structured audit trail of every decision. Deterministic enforcement.
What is missing: Fleet-wide visibility. You govern individual agents well but cannot manage 50 agents across 10 teams from a single pane.
This is the production bar. You can answer all four questions for any action. Auditors get proof, not promises.
Level 4
What you have: Centralized governance across all agents, teams, and workflows. Budget enforcement. Cross-agent policy consistency. Organization-wide audit trails. Anomaly detection across the fleet.
What is missing: Nothing structural. At this level, governance improvements are optimizations: better policies, faster approvals, more granular budget controls.
Scale: This is what 100+ agent deployments require. Individual agent governance does not scale; fleet governance does.
Assessment checklist: 10 questions to find your level
Answer yes or no to each question. Your level is the highest consecutive group where you answered yes to all questions.
1. Can you list every action your agents took yesterday?
2. Do you have alerts for unusual agent behavior (cost spikes, error rates, access patterns)?
3. Do you have written policies defining what agents can and cannot do?
4. Are agent actions categorized by risk level (read/write/destructive)?
5. Are agent actions evaluated against policy before execution, not after?
6. Do high-risk actions pause for human approval before proceeding?
7. Can you produce an audit trail linking any action to the policy that allowed it and the human who approved it?
8. Can you enforce budget limits across your entire agent fleet from a single control plane?
9. Do you have organization-wide visibility into all agent actions across every team?
10. Can you update a policy once and have it apply to every agent in your fleet instantly?
If you answered no to question 1 or 2, you are at Level 0. Yes to 1-2 but no to 3 or 4 puts you at Level 1. Yes through 4 but no to any of 5-7 means Level 2. Yes through 7 is Level 3. All ten yes means Level 4.
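The scoring rule can be sketched in a few lines of Python. The question-to-level grouping is taken directly from the checklist (questions 1-2 unlock Level 1, 3-4 Level 2, 5-7 Level 3, 8-10 Level 4); the function name is illustrative:

```python
# Question index ranges (0-based, half-open) for Levels 1 through 4.
GROUPS = [(0, 2), (2, 4), (4, 7), (7, 10)]

def maturity_level(answers: list[bool]) -> int:
    """Return the highest level whose question group, and every group
    before it, was answered entirely yes."""
    level = 0
    for lvl, (start, end) in enumerate(GROUPS, start=1):
        if all(answers[start:end]):
            level = lvl
        else:
            break  # a gap caps the level, even if later answers are yes
    return level

print(maturity_level([True] * 10))              # → 4
print(maturity_level([True] * 4 + [False] * 6)) # → 2
```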
ROI of each governance level
Each level delivers specific, measurable value. Governance is not a cost center. It is an enabler.
Level 1 saves you from blind incidents. You discover problems in hours instead of weeks. A monitoring system that catches a runaway agent burning $2,000/day on the first day instead of the fourteenth saves $26,000.
Level 2 satisfies auditors. You can show written policies and risk categorizations. This is enough for initial SOC 2 conversations and basic regulatory compliance. It is not enough for EU AI Act high-risk requirements, which demand proof of enforcement.
Level 3 gets you to production. Databricks found that organizations with active governance deploy 12x more AI projects to production. Level 3 is where engineering and security teams agree that agents can be trusted with real data, because the trust is enforced, not assumed.
Level 4 scales to 100+ agents. When you run dozens of agents across multiple teams, individual governance per agent becomes a bottleneck. Fleet governance gives you one control plane for all agents: unified policies, centralized audit, budget enforcement across the organization.
How to move up: concrete steps between levels
Level 0 to Level 1: Add logging and alerting. Instrument your agent framework to record every action with timestamp, input, output, and duration. Set up cost alerts with your LLM provider. This takes a day, not a sprint.
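That instrumentation can be as simple as a wrapper around every tool call. A minimal sketch, assuming your framework lets you decorate action functions in Python (the function and record fields here are illustrative, not a specific framework's API):

```python
import functools
import json
import time

def logged_action(fn):
    """Record every agent action with timestamp, input, output, and duration."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.time()
        result = fn(*args, **kwargs)
        record = {
            "action": fn.__name__,
            "timestamp": start,
            "input": {"args": repr(args), "kwargs": repr(kwargs)},
            "output": repr(result),
            "duration_s": round(time.time() - start, 3),
        }
        print(json.dumps(record))  # ship to your log store in practice
        return result
    return wrapper

@logged_action
def fetch_report(report_id):
    # Stand-in for a real agent tool call.
    return f"report-{report_id}"
```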
Level 1 to Level 2: Write your agent policies. Define what actions are read-only (safe), which are write (need review), and which are destructive (blocked). Categorize every agent topic by risk level. Put this in a document your security team signs off on.
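Before you have runtime enforcement, the categorization itself can live in a simple lookup your security team reviews and signs off on. A sketch with hypothetical topic names:

```python
# Hypothetical risk map: agent action topic -> risk category.
RISK_LEVELS = {
    "job.report.read": "read",         # safe
    "job.report.list": "read",         # safe
    "job.record.update": "write",      # needs review
    "job.table.drop": "destructive",   # blocked
}

def risk_of(topic: str) -> str:
    # Unknown topics default to the most restrictive category.
    return RISK_LEVELS.get(topic, "destructive")
```

Defaulting unknown topics to "destructive" means a newly added agent capability is blocked until someone explicitly classifies it, rather than silently allowed.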
Level 2 to Level 3: This is the big jump. Turn your documents into code. Instead of a policy that says "agents should not delete production data," you need a rule that blocks it at runtime. Here is what that transition looks like:
# Level 2: Written policies (not enforced)
# This is a document, not code. It cannot prevent anything.
#
# Agent Policy v1 (Google Doc)
# - Agents should not access production databases
# - Agents should not send emails without review
# - High-risk actions require manager approval
# (Nobody checks. Nobody enforces. Nobody audits.)
# Level 3: Pre-dispatch governance (enforced)
# This is code. It prevents violations before they happen.
version: v1
rules:
  - id: allow-read-ops
    match:
      topics: ["job.*.read", "job.*.list"]
      risk_tags: []
    decision: allow
    reason: "Read operations safe by default"
  - id: require-approval-writes
    match:
      topics: ["job.*.write", "job.*.update"]
      risk_tags: ["data-mutation"]
    decision: require_approval
    reason: "Write operations need human review"
  - id: deny-production-delete
    match:
      topics: ["job.*.delete", "job.*.drop"]
      risk_tags: ["destructive"]
    decision: deny
    reason: "Destructive production ops blocked"

The Level 3 policy is evaluated by a Safety Kernel before every action. Read operations pass through. Write operations pause for human review. Destructive operations are blocked. The rules are version-controlled, testable, and auditable. Read more about this transition in our policy-as-code guide.
Level 3 to Level 4: Add fleet-wide visibility and cross-team governance. Centralize policy management so a single update applies across all agents. Add budget enforcement and anomaly detection at the organization level. This is infrastructure work, not policy work.
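As one illustration of organization-level budget enforcement, a central tracker can veto dispatch once a fleet-wide cap is reached, regardless of which team's agent is asking. Everything here (class name, teams, limits) is hypothetical:

```python
from collections import defaultdict

class FleetBudget:
    """Track spend per team and enforce one fleet-wide cap centrally."""

    def __init__(self, fleet_limit_usd: float):
        self.fleet_limit = fleet_limit_usd
        self.spend = defaultdict(float)

    def record(self, team: str, cost_usd: float) -> None:
        self.spend[team] += cost_usd

    def allow_dispatch(self) -> bool:
        # One check applies to every agent on every team.
        return sum(self.spend.values()) < self.fleet_limit

budget = FleetBudget(fleet_limit_usd=1000.0)
budget.record("search-team", 400.0)
budget.record("ops-team", 650.0)
print(budget.allow_dispatch())  # → False: fleet total exceeds the cap
```

The point of the design is that no individual agent carries its own limit logic; updating the cap in one place changes behavior for the whole fleet.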
Where Cordum fits
Cordum is Level 3-4 infrastructure. The Safety Kernel provides pre-dispatch policy evaluation (Level 3). Approval workflows provide structured human-in-the-loop gates (Level 3). Fleet governance features, budget enforcement, and organization-wide audit trails provide Level 4 capabilities.
We built Cordum because we kept seeing teams stuck at Level 2. They had policies, they had good intentions, but they had no enforcement layer. The jump from "we wrote down the rules" to "the system enforces the rules" requires infrastructure that most teams do not have time to build. Read more about our governance architecture and the five-minute quickstart.
This maturity model is useful regardless of whether you use Cordum. The assessment, the levels, and the steps between them apply to any governance implementation. What matters is knowing where you stand and having a plan to get to Level 3 before your first production incident forces the conversation.