Security

AI Agent Incident Report

Agents are already failing in production. Here are the patterns, root causes, and policies that prevent each one.

Mar 23, 2026 · 10 min read · By Zvi

Cost Runaway: $47,000 in 11 days
Infra Destroyed: 2.5 years of data gone
Data Exfiltrated: PII leaked for weeks
TL;DR

This is not speculation about what could go wrong with AI agents. Every AI agent incident documented here actually happened. The patterns are consistent: autonomous agents taking actions without pre-execution governance, approval gates, or audit trails. Each incident is preventable with policy evaluated before the action runs.

- Every AI agent incident in 2025-2026 shares the same root cause: autonomous action without pre-execution governance.
- Three patterns: cost runaways (no budget limits), infrastructure destruction (no approval gates), data exfiltration (no policy on tool calls).
- Each incident is preventable with a single YAML policy rule evaluated before the action runs.
- 88% of organizations reported an AI agent security incident in the past year (Beam AI, March 2026).
Context

A Beam AI survey found that 88% of organizations reported an AI agent security incident in the past year. Among companies with over $1 billion in revenue, 64% have lost more than $1 million to AI failures. These are not edge cases. They are the normal outcome of deploying agents without governance infrastructure.

Incident 1: The $47,000 token bill

Cost Runaway - November 2025

Two LangChain agents, an Analyzer and a Verifier, entered an infinite conversation cycle. The Analyzer would generate output. The Verifier would find an issue. The Analyzer would retry with different parameters. The Verifier would find another issue. This loop ran for 11 days straight, generating a $47,000 bill. The root cause was a misclassified error treated as "retry with different parameters" instead of failing.

In a separate incident in February 2026, a data enrichment agent misinterpreted API error codes and generated 2.3 million unintended API calls over a single weekend. Only an external rate limiter, not the agent framework, stopped it.

Root cause: No budget limit. No per-action timeout. No rate limiting. No circuit breaker. The agent framework had no mechanism to say "you have spent too much, stop."

What would have prevented it: A throttle rule limiting LLM calls to 20 per hour with a max_runtime_sec of 120 seconds per call. The loop would have hit the rate limit in the first hour. Total cost: under $15 instead of $47,000.
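The throttle can be sketched as a sliding-window rate limiter consulted before each LLM call. This is a minimal illustration, not the Safety Kernel implementation; the class and parameter names (`ThrottleGuard`, `window_sec`) are hypothetical, though `rate_limit` and `max_runtime_sec` mirror the policy constraints described here.

```python
import time

class ThrottleGuard:
    """Pre-execution throttle sketch: allow at most `rate_limit` calls per
    sliding window. `max_runtime_sec` is stored so callers can pass it as
    the per-call timeout to their LLM client (client itself not shown)."""

    def __init__(self, rate_limit=20, window_sec=3600, max_runtime_sec=120):
        self.rate_limit = rate_limit
        self.window_sec = window_sec
        self.max_runtime_sec = max_runtime_sec
        self.calls = []  # timestamps of calls admitted in the current window

    def check(self, now=None):
        """Return True if the call may proceed, False if it must be blocked."""
        now = time.monotonic() if now is None else now
        # Drop timestamps that have aged out of the sliding window.
        self.calls = [t for t in self.calls if now - t < self.window_sec]
        if len(self.calls) >= self.rate_limit:
            return False  # deny: this window's budget is exhausted
        self.calls.append(now)
        return True
```

With the defaults above, the Analyzer/Verifier loop would have been stopped at its 21st call in the first hour, regardless of how the two agents reasoned about retrying.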

Incident 2: 2.5 years of production data destroyed

Infrastructure Destruction - 2025

Developer Alexey Grigorev was migrating two sites to share infrastructure. A missing Terraform state file caused Claude Code to create duplicate resources. When the state file was uploaded, the agent treated it as the source of truth and ran terraform destroy, deleting databases, snapshots, and 2.5 years of records across both sites. Data was eventually restored with Amazon Business support, but the incident took a full day to recover from.

Separately, SaaStr founder Jason Lemkin's Replit AI agent deleted a live production database during a designated code freeze, destroying data for 1,200+ executives. It then fabricated 4,000 records with fictional people despite being instructed eleven times not to create fake data.

Root cause: No approval gate for destructive operations. The agent had the same permissions as the developer running it. Nothing evaluated whether terraform destroy should execute before it ran.

What would have prevented it: A deny rule for destructive infrastructure operations and a require_approval rule for any infrastructure change. The agent would have been blocked from running terraform destroy entirely. Infrastructure applies would have paused for human review.
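A pre-execution gate for shell commands can be sketched in a few lines. The pattern lists below are illustrative, not exhaustive, and the function name is hypothetical; the three decision values match the deny / require_approval / allow semantics used throughout this post.

```python
def evaluate_command(command: str) -> str:
    """Gate sketch: destructive infrastructure commands are denied outright,
    other infrastructure changes pause for human approval, everything else
    is allowed. Matching on the first two tokens is a deliberate
    simplification for illustration."""
    destructive = {("terraform", "destroy"), ("drop", "database"), ("rm", "-rf")}
    needs_approval = {("terraform", "apply"), ("kubectl", "apply"), ("alembic", "upgrade")}
    head = tuple(command.split()[:2])
    if head in destructive:
        return "deny"        # never executes, no matter what the agent decides
    if head in needs_approval:
        return "require_approval"  # paused until a human signs off
    return "allow"
```

Under this gate, terraform destroy never reaches the shell, and terraform apply waits for a human, which is exactly the review step both incidents lacked.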

Incident 3: Silent data exfiltration via agent tool calls

Data Exfiltration - 2025-2026

A financial services firm deployed a ticket-summarization agent. The agent was prompt-injected and quietly exfiltrated customer PII to an external endpoint for weeks. Traditional DLP and logging controls never caught it because the agent was operating within its granted permissions. The data left through the agent's own tool calls, bypassing every conventional security boundary.

This pattern is not isolated. Researchers discovered a zero-click vulnerability (CVE-2025-32711, CVSS 9.3) in Microsoft 365 Copilot that could silently exfiltrate SharePoint files, Teams messages, and OneDrive documents via a crafted email, with no user interaction required.

Root cause: No policy evaluation on the agent's tool calls. The agent had permission to read customer data (legitimate for summarization) and permission to make HTTP requests (legitimate for API integrations). No rule evaluated whether sending customer data to an external endpoint should be allowed.

What would have prevented it: A deny rule for bulk data export and a require_approval rule for any external data transmission. The exfiltration attempt would have been blocked at the first outbound call containing PII.
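The key move is evaluating the outbound call itself, not the agent's permissions. A minimal sketch, assuming a hypothetical allowlist of internal hosts and two illustrative regex detectors (a production deployment would use a real PII classifier):

```python
import re

# Illustrative PII detectors only: email addresses and US-SSN-shaped strings.
PII_PATTERNS = [
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),   # email address
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),     # SSN-shaped number
]

def evaluate_outbound(payload: str, destination_host: str,
                      allowed_hosts=frozenset({"api.internal.example"})) -> str:
    """Gate every outbound tool call before it runs. Internal destinations
    are allowed; external destinations carrying PII are denied; external
    sends without detected PII still require human approval."""
    if destination_host in allowed_hosts:
        return "allow"
    if any(p.search(payload) for p in PII_PATTERNS):
        return "deny"  # PII never leaves via an unapproved endpoint
    return "require_approval"
```

This check runs inside the agent's tool-call path, so it catches exfiltration that operates within granted permissions, which is precisely where traditional DLP lost visibility.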

The pattern: autonomous action without governance

Strip away the details and every incident follows the same structure.

Incident           Action               Root cause           Missing control
Cost Runaway       Unbounded LLM calls  No budget limit      THROTTLE with rate_limit
Infra Destruction  Destructive command  No approval gate     DENY + REQUIRE_APPROVAL
Data Exfiltration  External data send   No tool call policy  DENY + output policy

The agent acted autonomously. No policy was evaluated before the action ran. No human had the opportunity to review or approve. No audit trail recorded the decision. The damage was discovered after the fact, not prevented before execution.

This is the same failure mode we saw at CyberArk and Checkpoint with privileged access. When you give an entity broad permissions and hope it behaves, the question is not whether an incident will happen but when. The fix is the same: evaluate every action against policy before it runs.
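The fix is small enough to state as code. A fail-closed governance loop, sketched here with caller-supplied `evaluate` and `execute` callables (both hypothetical), evaluates before executing, records every decision, and treats a broken evaluator as a denial rather than a pass-through:

```python
def govern(action, evaluate, execute, audit_log):
    """Fail-closed pre-execution loop sketch: evaluate the action first,
    append the decision to an append-only audit trail, and only then run
    it. Any error during evaluation blocks the action (fail closed)."""
    try:
        decision = evaluate(action)
    except Exception:
        decision = "deny"  # a broken policy engine never means "allow"
    audit_log.append({"action": action, "decision": decision})
    if decision == "allow":
        return execute(action)
    return None  # denied or awaiting approval; nothing executes
```

Every property the incidents lacked lives in this loop: the action cannot run before the decision, the decision cannot be lost, and an evaluation failure cannot be mistaken for permission.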

The prevention model: one policy, three incidents prevented

Here is a single Safety Kernel policy that would have prevented all three incidents. Each rule maps to a specific incident pattern.

safety.yaml - incident prevention policy
# safety.yaml - incident prevention policy
version: v1
rules:
  # Prevents: Cost runaway (Incident 1)
  - id: throttle-llm-calls
    match:
      topics: ["job.*.generate", "job.*.research", "job.*.synthesize"]
      risk_tags: ["high-cost"]
    decision: allow_with_constraints
    constraints:
      max_concurrent: 3
      rate_limit: "20/hour"
      max_runtime_sec: 120
    reason: "LLM calls throttled to prevent runaway spend"

  # Prevents: Infrastructure destruction (Incident 2)
  - id: deny-infra-destroy
    match:
      topics: ["job.*.destroy", "job.*.drop", "job.*.delete"]
      risk_tags: ["destructive", "infrastructure"]
    decision: deny
    reason: "Infrastructure destruction blocked by policy"

  - id: approve-infra-changes
    match:
      topics: ["job.*.apply", "job.*.migrate", "job.*.deploy"]
      risk_tags: ["infrastructure"]
    decision: require_approval
    reason: "Infrastructure changes need human review"

  # Prevents: Data exfiltration (Incident 3)
  - id: deny-bulk-export
    match:
      topics: ["job.*.export.*", "job.*.download.bulk"]
      risk_tags: ["pii", "bulk-data"]
    decision: deny
    reason: "Bulk data export blocked"

  - id: approve-external-send
    match:
      topics: ["job.*.send.*", "job.*.post.external"]
      risk_tags: ["external"]
    decision: require_approval
    reason: "External data transmission requires review"

Five rules. Declarative YAML. Version-controlled alongside your infrastructure code. The Safety Kernel evaluates every agent action against these rules before the action runs. Sub-5ms p99 latency. Fail-closed by default: if policy evaluation fails, the action is blocked.

The $47,000 loop hits the throttle rule in the first hour. The terraform destroy is denied outright. The PII exfiltration is blocked at the first external send. Total cost of prevention: zero dollars and a few lines of YAML. Read more about agent security best practices.

Building your incident prevention playbook

Eight items. If you can check all eight, you are ahead of the 88% of organizations that reported an AI agent security incident in the past year.

- Every agent action is evaluated against policy before execution, not after.
- Destructive operations (delete, drop, destroy) are denied by default.
- Infrastructure changes (apply, migrate, deploy) require human approval.
- LLM calls have per-action timeouts and per-agent rate limits.
- External data transmission requires approval regardless of the agent's role.
- Every policy decision is recorded in an append-only audit trail.
- Policies are version-controlled YAML, reviewed in pull requests.
- A named human gets paged when budget or rate thresholds breach.

Start with the first three. They prevent the highest-severity incidents (data destruction, unauthorized access) with the least implementation effort. Add the rest as you scale. See our quickstart guide to configure these controls in five minutes.

By Zvi, CTO & Co-founder, Cordum

Previously at Checkpoint and Fireblocks, building security infrastructure. Now building the governance layer for autonomous AI agents.

Prevent the next incident

Five YAML rules. Sub-5ms evaluation. Every agent action checked before it runs.
