The 40% stat
In June 2025, Gartner predicted that over 40% of agentic AI projects will be canceled by the end of 2027. The reasons cited: escalating costs, unclear business value, or inadequate risk controls. Not bad models. Not wrong use cases. Infrastructure and governance.
Anushree Verma, Senior Director Analyst at Gartner, put it directly: "Most agentic AI projects right now are early stage experiments or proof of concepts that are mostly driven by hype and are often misapplied."
Gartner's September 2025 survey adds the execution signal. Only 15% of IT application leaders were considering, piloting, or deploying fully autonomous agents, while 75% reported work with some form of AI agents. The same survey found just 13% strongly agreed they had the right governance structures in place. Translation: experimentation is moving faster than controls.
What top governance guides miss
We reviewed three high-ranking governance references for this topic: the Gartner market outlook, TechTarget's strategy guide, and Palo Alto Networks' lifecycle model.
They do a solid job on principles. The gap is operational: very few teams publish target thresholds for policy latency, approval response time, rollback drills, and evidence completeness.
| Source | Covers well | Usually missing | Add in production |
|---|---|---|---|
| Gartner (Jun 2025) | Failure risk, ROI pressure, market signal | Deployment control design | Map each risk to a runtime control and owner |
| TechTarget | Permissions, privacy, lineage, compliance checks | Numeric acceptance criteria | Set SLOs: policy p99, approval SLA, evidence freshness |
| Palo Alto | Lifecycle governance and oversight points | Machine-readable evidence contract | Standardize a trace schema for every high-risk action |
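The table's last column points at numeric acceptance criteria, so here is a minimal sketch of what those targets could look like as a version-controlled config. The file name, field names, and specific values are illustrative assumptions, not figures from the three sources; the latency, SLA, and drill numbers echo the evidence record shown later in this post.

```yaml
# governance-slos.yaml - illustrative acceptance criteria for agent governance
# (hypothetical schema; tune the targets to your own risk tolerance)
policy_evaluation:
  p99_latency_ms: 5            # pre-dispatch policy decisions must not stall the agent
approval:
  response_p95_seconds: 120    # human review SLA for actions flagged require_approval
rollback:
  drill_interval_days: 30      # rehearse recovery on a schedule, do not just document it
  last_drill_must_pass: true
evidence:
  completeness_pct: 100        # every high-risk action produces a full trace record
  freshness_max_seconds: 60    # the record is written within a minute of execution
```

Whatever the exact numbers, the point is that they are written down, versioned, and checked against, not implied.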
Why AI agent projects fail: three patterns
The failure modes are operational. Models can be excellent and projects can still fail in production. Three patterns show up repeatedly.
Gartner's 2025 survey data shows broad adoption with weak control depth: 75% reporting some agent work, only 15% on fully autonomous deployments, and just 13% strongly agreeing governance structures are ready. This creates a dangerous middle state: enough autonomy to cause damage, not enough control to contain it.
Palo Alto Networks' agentic governance model names this shift directly: governance moves from output risk to action risk. Teams that only validate model responses but do not enforce runtime authority boundaries end up with agents that can execute high-impact actions without policy checkpoints.
TechTarget's governance guidance stresses machine-readable rules, explicit permissions, staged autonomy, and immutable audit trails. Most teams agree with these principles but do not encode them as enforceable runtime policy. The result is governance theater: good docs, weak controls.
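As one illustration of "encoded, not documented," here is a hedged sketch of a staged-autonomy ladder expressed as configuration rather than prose. The stage names and fields are assumptions for this example, not TechTarget's schema; the risk tags and decisions reuse the ones from the policy shown later in this post.

```yaml
# autonomy-stages.yaml - illustrative staged-autonomy ladder
stages:
  - name: shadow
    description: "Agent proposes actions; humans execute everything"
    allowed_decisions: []                 # no direct execution at this stage
  - name: supervised
    description: "Agent executes reads; writes pause for human approval"
    allowed_decisions: [allow]
    approval_required_for: [data-mutation, high-cost]
    blocked: [destructive]
  - name: bounded-autonomous
    description: "Agent executes low- and medium-risk actions within limits"
    allowed_decisions: [allow, allow_with_constraints]
    approval_required_for: [data-mutation]
    blocked: [destructive]
promotion_criteria:
  min_days_in_stage: 14
  max_policy_violations: 0
```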
Public incidents match these patterns. Tom's Hardware documented destructive database events in both Claude Code and Replit workflows. NVD documented CVE-2025-32711 as AI command injection with unauthorized disclosure risk. Different products, same control failure class.
The four questions test for AI agent governance
Every agent deployment should be able to answer four questions at any point in time. If you cannot answer all four, your project is in the 40%.
1. What did the agent do? A complete record of every action, input, and output. Not logs you grep through, but a structured audit trail you can query.
2. Was the action allowed? A versioned, declarative policy that was evaluated before the action ran. Not 'it was within its prompt instructions.'
3. Who approved it? For high-risk actions, a human reviewed and approved. Not implicit trust based on the agent's training data.
4. Can you prove it? An immutable record tying the action to the policy, the approval, and the outcome. Auditors will ask for this.
This is not a theoretical framework. It is the minimum bar that compliance, legal, and security teams will require before any agent touches production data. If your agent deployment cannot pass this test today, it will not survive an audit tomorrow.
The practical way to enforce the test is to require one structured evidence record per high-risk action. If the record is incomplete, the action should fail closed.
```json
{
"trace_id": "agt_01JQW9S8T9D1V6KQ2C4M8Y7N3P",
"agent": "refund-agent-v4",
"action": "payments.refund.create",
"request_amount_usd": 8700,
"policy_eval": {
"policy_id": "refunds.require-approval.v3",
"decision": "require_approval",
"risk_score": 0.91,
"p99_latency_ms": 4.2
},
"approval": {
"required": true,
"reviewer": "[email protected]",
"response_seconds": 43,
"sla_target_seconds": 120
},
"execution": {
"status": "completed",
"duration_ms": 812
},
"control_checks": {
"rollback_drill_frequency_days": 30,
"last_drill_passed": true
},
"evidence_hash": "sha256:8f5f9f9e8c88f75c3fc57af2b67e3f5c7d580f7f39f94ea0f14a19c9a6f7d8b1"
}
```

What the 60% do differently
The teams that graduate from pilot to production treat governance as part of deployment architecture, not a compliance appendix. They define authority boundaries before rollout, encode controls as executable policy, and measure governance like an SRE discipline.
Three mechanisms show up consistently.
Pre-dispatch policy evaluation. Every agent action is checked against policy before it runs. Read operations pass through instantly. Write operations get flagged. Destructive operations are blocked. Engineers stop worrying about what the agent might do because the policy is explicit and deterministic.
Explicit approval flows. High-risk actions pause and wait for human review. Not every action, just the ones your policy flags. This gives teams confidence to automate more because they know the safety net is real, not theoretical.
Audit trails from day one. Every action, every decision, every approval is recorded with trace ID, timestamp, actor, and rationale. When something goes wrong (and it will), you can reconstruct exactly what happened. When an auditor asks (and they will), you have the proof.
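A hedged sketch of how the second and third mechanisms could be wired up as configuration, sitting alongside policy rules like the ones shown later in this post. The file name and fields are assumptions for illustration, not any vendor's schema; the SLA and required fields mirror the evidence record above.

```yaml
# oversight.yaml - illustrative approval routing and audit settings
approvals:
  sla_seconds: 120                 # target time for a reviewer to respond
  escalate_after_seconds: 300      # notify a second reviewer if the first is silent
  reviewers:
    data-mutation: ["payments-oncall"]
    destructive: ["platform-leads"]
  on_timeout: deny                 # fail closed if nobody responds
audit:
  mode: append-only
  retention_days: 365
  required_fields:
    - trace_id
    - agent
    - action
    - policy_id
    - decision
    - reviewer
    - evidence_hash
  on_missing_field: reject_action  # incomplete evidence means the action does not run
```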
The AI agent governance maturity gap
A Gartner survey of 360 IT application leaders found that only 13% have appropriate governance structures to manage AI agents. Meanwhile, 75% are actively piloting or deploying agents. That is a 62-point gap between deployment and governance. Most organizations are deploying first and governing never.
The same survey found that 74% view AI agents as a new attack vector, yet only 19% express high trust in vendors' safety protections. Teams know the risks exist. They just have not built the infrastructure to manage them.
Gartner separately predicts that guardian agent technologies will capture 10-15% of the agentic AI market by 2030. The market is recognizing the gap. The question is whether your organization fills it before a production incident forces the issue.
What to do about it
Three concrete steps, in order of urgency.

1. Inventory every agent action and map it to a risk tier, a runtime control, and an owner (an example inventory follows this list).
2. Encode those controls as versioned, machine-readable policy that is evaluated before each action runs, with approval gates on high-risk actions and fail-closed defaults.
3. Instrument the evidence: a structured audit record for every action, plus SLOs for policy latency, approval response time, rollback drills, and evidence completeness.
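A minimal sketch of the first step: an inventory mapping each agent action to a risk tier, a runtime control, and an owner. The action names reuse examples from elsewhere in this post; the file format and owner names are illustrative assumptions.

```yaml
# agent-action-inventory.yaml - illustrative action-to-control mapping
- action: job.report.read
  risk_tier: low
  runtime_control: allow
  owner: data-platform
- action: job.record.update
  risk_tier: medium
  runtime_control: require_approval
  owner: data-platform
- action: payments.refund.create
  risk_tier: high
  runtime_control: require_approval
  owner: payments
- action: job.table.drop
  risk_tier: critical
  runtime_control: deny
  owner: platform
```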
How Cordum approaches this
We built Cordum because we saw these patterns repeating. At enterprise security companies, we learned that access management and security work the same way regardless of what is making the request. Humans, scripts, services, agents: the playbook is policy before execution, decisions on record, humans in the loop for risk.
Cordum's Safety Kernel evaluates every agent action against policy before it runs. Decisions are deterministic: ALLOW, DENY, REQUIRE_APPROVAL, or ALLOW_WITH_CONSTRAINTS. Sub-5ms p99 latency. Fail-closed by default. Here is what the policy looks like:
```yaml
# safety.yaml - agent governance policy
version: v1
rules:
- id: allow-read-ops
match:
topics: ["job.*.read", "job.*.list", "job.*.get"]
risk_tags: []
decision: allow
reason: "Read operations are safe by default"
- id: require-approval-writes
match:
topics: ["job.*.write", "job.*.update", "job.*.create"]
risk_tags: ["data-mutation"]
decision: require_approval
reason: "Write operations need human review"
- id: deny-destructive
match:
topics: ["job.*.delete", "job.*.drop", "job.*.truncate"]
risk_tags: ["destructive"]
decision: deny
reason: "Destructive operations blocked by policy"
- id: throttle-expensive
match:
topics: ["job.*.generate", "job.*.synthesize"]
risk_tags: ["high-cost"]
decision: allow_with_constraints
constraints:
max_concurrent: 3
rate_limit: "10/hour"
reason: "Expensive LLM calls throttled to control cost"Read operations pass through. Write operations pause for review. Destructive operations are blocked. Expensive LLM calls are throttled. The rules are version-controlled YAML, reviewed in pull requests alongside your application code.
Every decision, every approval, every action goes into an append-only audit trail. When someone asks "what happened?" you have the answer. Not because you grepped through logs, but because the system recorded it as a first-class data structure. Read more in our governance architecture post and our production deployment guide.