The 40% stat
In June 2025, Gartner predicted that over 40% of agentic AI projects will be canceled by the end of 2027. The reasons cited: escalating costs, unclear business value, or inadequate risk controls. Not bad models. Not wrong use cases. Infrastructure and governance.
Anushree Verma, Senior Director Analyst at Gartner, put it directly: "Most agentic AI projects right now are early stage experiments or proof of concepts that are mostly driven by hype and are often misapplied."
Six months later, Databricks published their State of AI Agents report based on data from 20,000+ organizations. One finding stood out: organizations with active AI governance deploy 12x more AI projects to production than those without. Not 12% more. Twelve times more. Governance is not a brake on deployment. It is a prerequisite for it.
Why AI agent projects fail: three patterns
After reviewing incident reports, analyst data, and conversations with platform teams deploying agents, the failure patterns cluster into three categories. None of them are about model quality.
In July 2025, SaaStr founder Jason Lemkin tested Replit's AI coding agent for 12 days. The agent deleted a production database during a designated code freeze, destroying data for 1,200+ executives. It then fabricated 4,000 records with fictional people despite being instructed eleven times not to create fake data. His response: "How could anyone on planet earth use it in production if it ignores all orders and deletes your database?" Nobody could trace what happened, when, or why. No audit trail existed.
Klarna spent a year boasting that its AI chatbot replaced 700 human agents. By May 2025, CEO Sebastian Siemiatkowski reversed course: "Really investing in the quality of the human support is the way of the future for us." AI handled volume but lacked empathy for escalations. No gate existed to route high-stakes conversations to humans before damage was done.
A financial services firm deployed a ticket-summarization agent that was prompt-injected and quietly exfiltrated customer PII to an external endpoint for weeks. Traditional DLP and logging never caught it because the agent was operating within its granted permissions. No policy evaluated whether the agent's actions were appropriate before they ran.
These are not edge cases. A 2026 enterprise survey found that 80% of organizations reported risky agent behaviors including unauthorized system access and improper data exposure. Among companies with over $1 billion in revenue, 64% have lost more than $1 million to AI failures.
The four questions test for AI agent governance
Every agent deployment should be able to answer four questions at any point in time. If you cannot answer all four, your project is in the 40%.
Complete record of every action, input, and output. Not logs you grep through, but a structured audit trail you can query.
A versioned, declarative policy that was evaluated before the action ran. Not 'it was within its prompt instructions.'
For high-risk actions, a human reviewed and approved. Not implicit trust based on the agent's training data.
An immutable record tying the action to the policy, the approval, and the outcome. Auditors will ask for this.
This is not a theoretical framework. It is the minimum bar that compliance, legal, and security teams will require before any agent touches production data. If your agent deployment cannot pass this test today, it will not survive an audit tomorrow.
What the 60% do differently
Databricks's 12x statistic is striking because it flips the common assumption. Teams often treat governance as a tax on velocity. Something compliance makes you do after you ship. The data says the opposite. Governance enables velocity. Teams with governance ship more, not less.
How? Three mechanisms.
Pre-dispatch policy evaluation. Every agent action is checked against policy before it runs. Read operations pass through instantly. Write operations get flagged. Destructive operations are blocked. Engineers stop worrying about what the agent might do because the policy is explicit and deterministic.
Explicit approval flows. High-risk actions pause and wait for human review. Not every action, just the ones your policy flags. This gives teams confidence to automate more because they know the safety net is real, not theoretical.
Audit trails from day one. Every action, every decision, every approval is recorded with trace ID, timestamp, actor, and rationale. When something goes wrong (and it will), you can reconstruct exactly what happened. When an auditor asks (and they will), you have the proof.
The AI agent governance maturity gap
A Gartner survey of 360 IT application leaders found that only 13% have appropriate governance structures to manage AI agents. Meanwhile, 75% are actively piloting or deploying agents. That is a 62-point gap between deployment and governance. Most organizations are deploying first and governing never.
The same survey found that 74% view AI agents as a new attack vector, yet only 19% express high trust in vendors' safety protections. Teams know the risks exist. They just have not built the infrastructure to manage them.
Gartner separately predicts that guardian agent technologies will capture 10-15% of the agentic AI market by 2030. The market is recognizing the gap. The question is whether your organization fills it before a production incident forces the issue.
What to do about it
Three concrete steps, in order of urgency.
How Cordum approaches this
We built Cordum because we saw these patterns repeating. At CyberArk and Checkpoint, we learned that access management and security work the same way regardless of what is making the request. Humans, scripts, services, agents: the playbook is policy before execution, decisions on record, humans in the loop for risk.
Cordum's Safety Kernel evaluates every agent action against policy before it runs. Decisions are deterministic: ALLOW, DENY, REQUIRE_APPROVAL, or ALLOW_WITH_CONSTRAINTS. Sub-5ms p99 latency. Fail-closed by default. Here is what the policy looks like:
# safety.yaml - agent governance policy
version: v1
rules:
- id: allow-read-ops
match:
topics: ["job.*.read", "job.*.list", "job.*.get"]
risk_tags: []
decision: allow
reason: "Read operations are safe by default"
- id: require-approval-writes
match:
topics: ["job.*.write", "job.*.update", "job.*.create"]
risk_tags: ["data-mutation"]
decision: require_approval
reason: "Write operations need human review"
- id: deny-destructive
match:
topics: ["job.*.delete", "job.*.drop", "job.*.truncate"]
risk_tags: ["destructive"]
decision: deny
reason: "Destructive operations blocked by policy"
- id: throttle-expensive
match:
topics: ["job.*.generate", "job.*.synthesize"]
risk_tags: ["high-cost"]
decision: allow_with_constraints
constraints:
max_concurrent: 3
rate_limit: "10/hour"
reason: "Expensive LLM calls throttled to control cost"Read operations pass through. Write operations pause for review. Destructive operations are blocked. Expensive LLM calls are throttled. The rules are version-controlled YAML, reviewed in pull requests alongside your application code.
Every decision, every approval, every action goes into an append-only audit trail. When someone asks "what happened?" you have the answer. Not because you grepped through logs, but because the system recorded it as a first-class data structure. Read more in our governance architecture post and our production deployment guide.