The $10K wake-up call
Here is how $10,000 disappears in less than a week. Three agents run in parallel, each researching a batch of target companies. Each research call fans out to 5 sub-calls: company scrape, people lookup, tech stack analysis, signal detection, and brief generation. Each sub-call consumes roughly 50,000 tokens at $0.015 per 1,000 tokens.
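The arithmetic behind that figure, as a quick sanity check (the per-day volume is whatever your agents actually process; the unit costs come from the scenario above):

```python
# Unit economics of the fan-out scenario described above.
TOKENS_PER_SUBCALL = 50_000     # tokens consumed per sub-call
PRICE_PER_1K_TOKENS = 0.015     # dollars per 1,000 tokens
SUBCALLS_PER_COMPANY = 5        # scrape, people, tech stack, signals, brief

cost_per_subcall = TOKENS_PER_SUBCALL / 1_000 * PRICE_PER_1K_TOKENS  # $0.75
cost_per_company = cost_per_subcall * SUBCALLS_PER_COMPANY           # $3.75
companies_to_10k = 10_000 / cost_per_company                         # ~2,667

print(f"${cost_per_company:.2f} per company; "
      f"{companies_to_10k:,.0f} companies burns $10,000")
```

Three agents working batches in parallel clear a few thousand companies in well under a week, so the budget is gone before the first invoice arrives.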
This is not a contrived scenario. In November 2025, two LangChain agents (an Analyzer and a Verifier) entered an infinite conversation cycle that ran for 11 days, generating a $47,000 bill. The root cause was a misclassified error treated as "retry with different parameters." In a separate incident, a data enrichment agent generated 2.3 million unintended API calls over a weekend. Only an external rate limiter stopped it.
IDC found that 96% of enterprises report AI costs exceeding initial projections. AnalyticsWeek estimates a $400 million collective leak in unbudgeted AI cloud spend across the Fortune 500. A single AI agent caught in a recursive reasoning cycle can, as one analyst put it, "rack up thousands of dollars in compute costs in a single afternoon."
Why traditional monitoring fails for AI agents
Cloud FinOps works because workloads are predictable. A Kubernetes cluster runs N pods, each consuming roughly the same resources. Dashboards show trailing metrics, and trailing metrics work when next month looks like last month.
Agents break this model. They are autonomous and concurrent. One prompt can trigger a chain of tool calls that fans out exponentially. By the time your dashboard updates, the spend has already happened. You are reading the bill, not preventing it.
Provider-level controls help but are too coarse. OpenAI offers monthly project-level budget caps. Anthropic provides per-organization spend limits. Neither operates at the granularity agents need: per-action, per-agent, or per-workflow cost enforcement. And no major agent framework ships a native dollar-denominated budget cap. LangChain, CrewAI, and AutoGen all provide iteration limits and observability hooks, but actual budget enforcement must be built externally.
You would not wait for the AWS bill to discover your Lambda costs tripled. Agent cost governance requires the same shift: from trailing dashboards to leading controls. Evaluate cost before execution, not after.
Three layers of AI agent cost governance
Effective cost governance needs three layers. Each catches failures the others miss.
Layer 1: Per-call limits. Cap tokens per individual LLM call. Set max_runtime_sec on every job. Kill calls that exceed thresholds before they finish. This catches the single expensive call.
Layer 2: Per-agent budgets. Cap total spend per agent per time window. A research agent gets $50/hour. A drafting agent gets $20/hour. When the budget is exhausted, the agent pauses. This catches agent loops.
Layer 3: Fleet-level budget. Cap total spend across all agents, with graceful degradation. When the fleet budget hits 80%, throttle non-critical agents. At 95%, pause everything except approved workflows. This catches fan-out explosions.
Policy-as-code for agent cost control
Cost governance belongs in your policy-as-code alongside security and compliance rules. Version-controlled YAML, reviewed in pull requests, enforced before execution. Not a dashboard setting that someone forgets to update.
Cordum's Safety Kernel evaluates cost policy on every job before it runs. Here is what agent cost governance looks like as code:
```yaml
# safety.yaml - agent cost governance
version: v1
rules:
  - id: throttle-llm-calls
    match:
      topics: ["job.*.generate", "job.*.synthesize", "job.*.research"]
      risk_tags: ["high-cost"]
    decision: allow_with_constraints
    constraints:
      max_concurrent: 3
      rate_limit: "20/hour"
      max_runtime_sec: 120
    reason: "LLM calls throttled to prevent runaway spend"

  - id: approve-expensive-batch
    match:
      topics: ["job.*.batch-generate", "job.*.bulk-enrich"]
      risk_tags: ["high-cost", "batch"]
    decision: require_approval
    constraints:
      max_runtime_sec: 600
    reason: "Batch LLM operations above $50 estimated cost need review"

  - id: deny-unbounded-loop
    match:
      topics: ["job.*.recursive-search", "job.*.agent-loop"]
      risk_tags: ["unbounded"]
    decision: deny
    reason: "Unbounded recursive agent loops blocked by policy"

  - id: allow-cached-reads
    match:
      topics: ["job.*.read", "job.*.get", "job.*.list"]
      risk_tags: []
    decision: allow
    reason: "Read operations and cached results pass through"
```

LLM calls are throttled to 20 per hour with a maximum of 3 concurrent. Batch operations above an estimated cost threshold pause for human review. Unbounded recursive loops are blocked outright. Read operations and cached results pass through with no overhead.
Per-topic timeouts provide a second layer of defense. If a call exceeds its timeout, the Safety Kernel kills it regardless of what the agent thinks it is doing:
```yaml
# overlays/timeouts.patch.yaml
topics:
  "job.research.company":
    timeout_seconds: 120
    max_retries: 1
  "job.draft.email":
    timeout_seconds: 60
    max_retries: 1
  "job.enrich.contacts":
    timeout_seconds: 30
    max_retries: 0
  "job.generate.report":
    timeout_seconds: 180
    max_retries: 1
```

Approval gates for expensive agent actions
Some agent actions should not run without a human reviewing the cost implication. A batch enrichment job that will process 10,000 contacts at $0.02 each costs $200. That is worth a 30-second review before execution.
The approval gate pattern extends naturally to cost governance. When an agent submits a job with estimated cost above a threshold, the Safety Kernel returns REQUIRE_APPROVAL. The job pauses. A human sees the estimated cost, the number of sub-calls, and the policy that triggered the gate. They approve, modify, or deny. Total review time: under a minute. Cost of not reviewing: potentially thousands of dollars.
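The gate itself is a small pre-execution check. A minimal sketch, assuming a simple records-times-unit-cost estimate; the function and constant names are hypothetical, not a real Cordum API:

```python
# Hypothetical estimated-cost approval gate (illustrative names).
APPROVAL_THRESHOLD_USD = 50.0

def evaluate_job(records: int, cost_per_record: float) -> str:
    """Return the policy decision for a batch job before it runs."""
    estimated = records * cost_per_record
    if estimated > APPROVAL_THRESHOLD_USD:
        # Pause the job and surface the estimate to a human reviewer.
        return "REQUIRE_APPROVAL"
    return "ALLOW"

print(evaluate_job(10_000, 0.02))  # the $200 enrichment batch from the text
```

The 10,000-contact enrichment job above estimates at $200, crosses the $50 threshold, and pauses for review; a 100-contact job passes straight through.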
We built this pattern at Cordum after watching a customer's research agent trigger $800 in API calls during a single overnight batch. The agent was working correctly from a logic perspective. It just processed more records than anyone expected. An approval gate on jobs exceeding $50 estimated cost would have caught it in the first batch.
The cost audit trail
Every agent action should record its cost alongside the action itself. Not in a separate billing system that requires a join query. In the same audit trail entry: what the agent did, what policy was evaluated, what the decision was, how many tokens were consumed, and what the cost was.
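As a concrete shape for such an entry, one record per action with the cost fields inline. This schema is illustrative, not Cordum's actual format:

```python
from dataclasses import dataclass, field
import time

@dataclass
class AuditEntry:
    """One audit-trail record: action and cost live in the same entry."""
    agent_id: str       # which agent acted
    topic: str          # e.g. "job.research.company"
    policy_id: str      # rule that was evaluated
    decision: str       # allow / allow_with_constraints / require_approval / deny
    tokens: int         # tokens consumed by the action
    cost_usd: float     # dollar cost of the action
    ts: float = field(default_factory=time.time)

entry = AuditEntry(
    agent_id="research-1",
    topic="job.research.company",
    policy_id="throttle-llm-calls",
    decision="allow_with_constraints",
    tokens=50_000,
    cost_usd=0.75,
)
```

Because cost is a first-class field on every record, per-agent totals are a single aggregation over the trail rather than a join against a billing export.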
This enables three things traditional monitoring cannot:
Chargeback per agent. Which agent spent the most this week? Which workflow has the highest cost per run? Which team's agents are most efficient? These answers come from the audit trail, not from parsing provider invoices.
Cost anomaly detection. When an agent's cost per action spikes 3x from its baseline, the system flags it. Not the next billing cycle. Immediately. Because the cost data is in the event stream, not in a monthly invoice.
Budget forecasting. If you know each agent's average cost per action and how many actions it runs per day, you can forecast spend with real data. Not estimates from a spreadsheet, but numbers from production.
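The anomaly and forecasting checks reduce to a few lines once cost lives in the event stream. A minimal sketch; the 3x factor matches the example above, and the function names are illustrative:

```python
def is_cost_anomaly(cost_per_action: float, baseline: float,
                    factor: float = 3.0) -> bool:
    """Flag when an agent's cost per action exceeds `factor` times its baseline."""
    return baseline > 0 and cost_per_action > factor * baseline

def forecast_spend(avg_cost_per_action: float, actions_per_day: float,
                   days: int = 30) -> float:
    """Project spend from measured per-action cost and daily volume."""
    return avg_cost_per_action * actions_per_day * days

# An agent whose baseline is $0.75/action trips the flag at > $2.25.
print(is_cost_anomaly(2.40, baseline=0.75))
# 400 actions/day at $0.75 each projects to $9,000/month.
print(forecast_spend(0.75, 400))
```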
Getting started with agent FinOps
Three steps, in order of impact.