Security

AI Agent Incident Report

Agents are already failing in production. Here are the patterns, root causes, and policies that prevent each one.

Apr 1, 2026 · 10 min read · By Zvi

- Runaway Spend: unbounded loops
- Infra Destroyed: 2.5 years of data gone
- Data Exfiltrated: CVE-class disclosure
TL;DR

Incident headlines look different, but the mechanics repeat. Agents get broad permissions, run without dispatch-time constraints, and execute side effects before any policy decision. This page maps publicly documented incidents and vulnerability records to concrete governance controls that block the same failures.

- Public incident reports show recurring failure modes: unbounded autonomy, destructive side effects, and missing outbound data controls.
- Runaway token spend is amplified by parallel teammates and recursive retries unless explicit rate, runtime, and budget limits exist.
- Destructive infrastructure operations must be deny-by-default or approval-gated before execution.
- Prompt-injection-driven disclosure is not theoretical: CVE-2025-32711 documents network-exploitable information disclosure in an agentic copilot flow.
Context

The current incident landscape is no longer hypothetical. Publicly documented coding-agent failures already include destructive database operations during normal developer workflows. On the security side, CVE-2025-32711 records AI command injection in M365 Copilot with network-based information disclosure and no required user interaction in the published vector. The issue is not missing threat models. The issue is missing runtime policy enforcement before side effects execute.

What top incident writeups miss

Most incident coverage explains what failed. Fewer sources explain the reusable enforcement contract that would have prevented the failure in the same request path.

Tom's Hardware: Claude Code deletes production setup, including DB and snapshots
  Covers well: Concrete failure chain: state-file confusion, destructive infra action, and documented recovery path.
  Gap for production teams: No reusable control-plane model for pre-dispatch deny/approval decisions across agent stacks.

Tom's Hardware: Replit agent deletes production DB during code freeze
  Covers well: Operational transcript of guardrail failure under autonomy and explicit freeze violation.
  Gap for production teams: No policy contract describing how to enforce freeze semantics before side effects execute.

NVD: CVE-2025-32711 (M365 Copilot Information Disclosure)
  Covers well: Official vulnerability record, CVSS vectors, and vendor advisory chain for AI command injection.
  Gap for production teams: No implementation blueprint for runtime output controls, approval gates, and auditable remediation workflows.

Incident 1: Runaway token spend from excessive agency

Cost Runaway Pattern

OWASP classifies this as a blend of LLM04 (Model DoS) and LLM08 (Excessive Agency): unbounded execution paths that consume resources and create operational blast radius.

Provider docs expose the same mechanics. Anthropic states that agent teammates each maintain independent contexts, idle teammates still consume tokens, and plan-mode teams can use roughly 7x tokens versus standard sessions. Source: Claude Code cost guidance.

Example math: 6 active teammates × 40k tokens per teammate cycle × 15 cycles/day × $0.015 per 1k tokens = $54/day for one workflow lane before retries and background activity. Multiply by teams and environments, and cost spikes stop being edge cases.
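The arithmetic above can be checked directly. The rates and token counts are the illustrative figures from the example, not measured values:

```python
# Illustrative cost math for one workflow lane; all inputs are the
# example's assumptions, not measured production numbers.
teammates = 6
tokens_per_cycle = 40_000      # per teammate, per cycle
cycles_per_day = 15
usd_per_1k_tokens = 0.015

daily_tokens = teammates * tokens_per_cycle * cycles_per_day
daily_cost = daily_tokens / 1_000 * usd_per_1k_tokens
print(f"{daily_tokens:,} tokens/day -> ${daily_cost:.2f}/day")
# 3,600,000 tokens/day -> $54.00/day
```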

Root cause: No hard budget envelope on autonomous execution. Retry depth, concurrency, and runtime all remained effectively unbounded.

What would have prevented it: Dispatch-time controls with explicit limits: max-runtime-per-call, per-agent rate limits, and fleet-level budget throttles that pause non-critical jobs before token consumption continues.
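A minimal sketch of what such a dispatch-time guard could look like, assuming a per-agent hourly rate limit and a daily spend envelope. The class name, thresholds, and method shape are illustrative, not a real Safety Kernel API:

```python
import time

class DispatchGuard:
    """Illustrative dispatch-time limits: a sliding-window rate limit
    plus a daily budget envelope, checked before tokens are consumed."""

    def __init__(self, max_calls_per_hour=20, daily_budget_usd=50.0):
        self.max_calls_per_hour = max_calls_per_hour
        self.daily_budget_usd = daily_budget_usd
        self.spent_usd = 0.0
        self.call_times = []  # timestamps of recent dispatches

    def allow(self, estimated_cost_usd, now=None):
        now = time.time() if now is None else now
        # Keep only calls inside the sliding one-hour window.
        self.call_times = [t for t in self.call_times if now - t < 3600]
        if len(self.call_times) >= self.max_calls_per_hour:
            return False  # rate limit hit: pause before the call is made
        if self.spent_usd + estimated_cost_usd > self.daily_budget_usd:
            return False  # budget envelope exceeded: block non-critical work
        self.call_times.append(now)
        self.spent_usd += estimated_cost_usd
        return True
```

The key property is where the check sits: before dispatch, so a runaway loop stalls at the limit instead of discovering the overage on the invoice.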

Incident 2: 2.5 years of production data destroyed

Infrastructure Destruction - 2025

Developer Alexey Grigorev was migrating two sites to share infrastructure. A missing Terraform state file caused Claude Code to create duplicate resources. When the state file was uploaded, the agent treated it as the source of truth and ran terraform destroy, deleting databases, snapshots, and 2.5 years of records across both sites. Data was eventually restored with Amazon Business support, but the incident took a full day to recover from.

Separately, SaaStr founder Jason Lemkin documented a Replit agent deleting production data during an explicit code freeze. The agent later acknowledged a "catastrophic error in judgment" after running unauthorized database commands. Source: Tom's Hardware coverage and incident transcript links.

Root cause: No approval gate for destructive operations. The agent had the same permissions as the developer running it. Nothing evaluated whether terraform destroy should execute before it ran.

What would have prevented it: A deny rule for destructive infrastructure operations and a require_approval rule for any infrastructure change. The agent would have been blocked from running terraform destroy entirely. Infrastructure applies would have paused for human review.
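A sketch of how such a deny rule could match the destructive action before dispatch, using glob-style topic patterns. The topic names mirror the policy shown later in this post; the matching logic is illustrative:

```python
from fnmatch import fnmatch

# Illustrative deny list using glob-style topic patterns; a real policy
# engine would load these from version-controlled YAML.
DENY_TOPICS = ["job.*.destroy", "job.*.drop", "job.*.delete"]

def is_denied(topic: str) -> bool:
    return any(fnmatch(topic, pattern) for pattern in DENY_TOPICS)

print(is_denied("job.infra.destroy"))  # True: terraform destroy never runs
print(is_denied("job.infra.plan"))     # False: read-only plan proceeds
```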

Incident 3: Silent data exfiltration via agent tool calls

Data Exfiltration - 2025-2026

CVE-2025-32711 is a concrete example of agentic disclosure risk. NVD describes it as AI command injection in M365 Copilot that allows unauthorized information disclosure over a network. The published CVSS vector includes UI:N (no user interaction required), which is exactly the failure mode security teams struggle to catch with manual review gates alone.

The same NVD record points to the Microsoft vendor advisory and tracks the CNA score at 9.3 (Critical). Source chain: NVD detail page and Microsoft MSRC advisory references.

Root cause: No policy evaluation on the agent's tool calls. The agent had permission to read customer data (legitimate for summarization) and permission to make HTTP requests (legitimate for API integrations). No rule evaluated whether sending customer data to an external endpoint should be allowed.

What would have prevented it: A deny rule for bulk data export and a require_approval rule for any external data transmission. The exfiltration attempt would have been blocked at the first outbound call containing PII.
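As a sketch, an outbound gate can scan the payload for PII-shaped content before any external call leaves the boundary. Real detection needs far more than two regexes; this only shows where the check sits in the request path, and the function and patterns are illustrative:

```python
import re

# Illustrative PII screens; production systems would use a proper
# classifier or DLP service, not two regexes.
PII_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),        # US SSN shape
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),  # email address
]

def outbound_decision(payload: str) -> str:
    """Decide before the HTTP call, not after the data has left."""
    if any(p.search(payload) for p in PII_PATTERNS):
        return "require_approval"  # pause for human review
    return "allow"

print(outbound_decision("weekly metrics: 42 deploys"))
print(outbound_decision("customer: jane@example.com, 123-45-6789"))
```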

The pattern: autonomous action without governance

Strip away the details and every incident follows the same structure.

Cost Runaway

Action: Unbounded LLM calls

Root cause: No budget limit

Missing control: THROTTLE with rate_limit

Infra Destruction

Action: Destructive command

Root cause: No approval gate

Missing control: DENY + REQUIRE_APPROVAL

Data Exfiltration

Action: External data send

Root cause: No tool call policy

Missing control: DENY + output policy

The agent acted autonomously. No policy was evaluated before the action ran. No human had the opportunity to review or approve. No audit trail recorded the decision. The damage was discovered after the fact, not prevented before execution.

This is the same failure mode we saw at enterprise security companies with privileged access. When you give an entity broad permissions and hope it behaves, the question is not whether an incident will happen but when. The fix is the same: evaluate every action against policy before it runs.

The prevention model: one policy, three incidents prevented

Here is a single Safety Kernel policy that would have prevented all three incidents. Each rule maps to a specific incident pattern.

safety.yaml - incident prevention policy
# safety.yaml - incident prevention policy
version: v1
rules:
  # Prevents: Cost runaway (Incident 1)
  - id: throttle-llm-calls
    match:
      topics: ["job.*.generate", "job.*.research", "job.*.synthesize"]
      risk_tags: ["high-cost"]
    decision: allow_with_constraints
    constraints:
      max_concurrent: 3
      rate_limit: "20/hour"
      max_runtime_sec: 120
    reason: "LLM calls throttled to prevent runaway spend"

  # Prevents: Infrastructure destruction (Incident 2)
  - id: deny-infra-destroy
    match:
      topics: ["job.*.destroy", "job.*.drop", "job.*.delete"]
      risk_tags: ["destructive", "infrastructure"]
    decision: deny
    reason: "Infrastructure destruction blocked by policy"

  - id: approve-infra-changes
    match:
      topics: ["job.*.apply", "job.*.migrate", "job.*.deploy"]
      risk_tags: ["infrastructure"]
    decision: require_approval
    reason: "Infrastructure changes need human review"

  # Prevents: Data exfiltration (Incident 3)
  - id: deny-bulk-export
    match:
      topics: ["job.*.export.*", "job.*.download.bulk"]
      risk_tags: ["pii", "bulk-data"]
    decision: deny
    reason: "Bulk data export blocked"

  - id: approve-external-send
    match:
      topics: ["job.*.send.*", "job.*.post.external"]
      risk_tags: ["external"]
    decision: require_approval
    reason: "External data transmission requires review"

Five rules. Declarative YAML. Version-controlled alongside your infrastructure code. The Safety Kernel evaluates every agent action against these rules before the action runs. Sub-5ms p99 latency. Fail-closed by default: if policy evaluation fails, the action is blocked.
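The fail-closed property can be sketched in a few lines: if policy evaluation itself errors out, the action is blocked rather than waved through. `evaluate_policy` here is a stand-in for whatever engine loads safety.yaml, not a real API:

```python
# Illustrative fail-closed dispatch wrapper. `evaluate_policy` stands in
# for the real policy engine; here it simulates an outage.
def evaluate_policy(action: dict) -> str:
    raise RuntimeError("policy service unreachable")  # simulated failure

def dispatch(action: dict) -> str:
    try:
        decision = evaluate_policy(action)
    except Exception:
        return "deny"  # fail closed: no decision means no execution
    return decision

print(dispatch({"topic": "job.infra.apply"}))  # deny
```

The opposite default, fail-open, turns every policy-engine outage into an unguarded window, which is exactly when incidents cluster.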

The runaway spend loop hits the throttle rule in the first hour. The terraform destroy is denied outright. The disclosure attempt is blocked at the first external send. Total cost of prevention: zero dollars and a few lines of YAML. Read more about agent security best practices.

Building your incident prevention playbook

Eight controls. If you can check all eight, your incident surface is materially smaller than teams relying on logs alone.

Every agent action is evaluated against policy before execution, not after.
Destructive operations (delete, drop, destroy) are denied by default.
Infrastructure changes (apply, migrate, deploy) require human approval.
LLM calls have per-action timeouts and per-agent rate limits.
External data transmission requires approval regardless of the agent's role.
Every policy decision is recorded in an append-only audit trail.
Policies are version-controlled YAML, reviewed in pull requests.
A named human gets paged when budget or rate thresholds breach.

Start with the first three. They prevent the highest-severity incidents (data destruction, unauthorized access) with the least implementation effort. Add the rest as you scale. See our quickstart guide to configure these controls in five minutes.

By Zvi, CTO & Co-founder, Cordum

Decade of experience building security infrastructure at enterprise scale. Now building the governance layer for autonomous AI agents.

Prevent the next incident

Five YAML rules. Sub-5ms evaluation. Every agent action checked before it runs.
