Operations

Agent FinOps: Stop AI Agents from Burning $10K

When agents autonomously chain API calls, costs compound faster than dashboards can show. The fix is policy-level budget enforcement.

Apr 1, 2026 · 11 min read · By Zvi
- Per-Action: cap tokens per call
- Per-Agent: budget per agent per window
- Fleet-Level: total spend with throttling
TL;DR

AI agent cost governance is the new FinOps. Agents that autonomously fan out, chain tool calls, and spawn sub-agents can burn through budgets in hours. Traditional monitoring shows you the damage after the fact. Pre-execution policy enforcement catches it before tokens are spent.

- Agentic AI cost is not only inference. Orchestration, memory, retries, and oversight become major spend drivers in production.
- Practical cost control starts with explicit autonomy limits: retries, recursion depth, tool-call caps, and token budgets per task.
- Anthropic reports average Claude Code usage around $6/developer/day, with 90% of users below $12/day, while agent-team plan mode can use roughly 7x the tokens.
- Three governance layers work together: per-action limits, per-agent budgets, and fleet-level throttling before execution.
Context

The FinOps Foundation treats AI spending as a distinct operating problem and recommends dedicated governance and cost-accountability practices. Their AI technology category guidance also highlights that AI cost data needs a separate lens from traditional cloud metrics. The gap is runtime: most teams still discover bad cost behavior after execution, not before dispatch.

The $10K wake-up call

Here is how $10,000 disappears in less than a week. Three agents run in parallel, each researching a batch of target companies. Each research call fans out to 5 sub-calls: company scrape, people lookup, tech stack analysis, signal detection, and brief generation. Each sub-call consumes roughly 50,000 tokens at $0.015 per 1,000 tokens.

3 agents x 5 sub-calls = 15 LLM calls per round
15 calls x 50K tokens x $0.015/1K = $11.25 per round
20 rounds of fan-out per batch = $225 per batch run
10 batch runs per day = $2,250 per day
$10,000 in 4.5 days
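The arithmetic above is worth encoding as a quick estimator you can run against your own workload. A minimal sketch, using the example's numbers; every parameter (agent count, fan-out, token volume, price) is one you would swap for your own:

```python
def fanout_cost_per_day(agents: int, fanout: int, tokens_per_call: int,
                        usd_per_1k_tokens: float, rounds_per_batch: int,
                        batches_per_day: int) -> float:
    """Estimate daily spend for a fan-out agent workload."""
    calls_per_round = agents * fanout
    cost_per_round = calls_per_round * (tokens_per_call / 1000) * usd_per_1k_tokens
    return cost_per_round * rounds_per_batch * batches_per_day

# The example from the text: 3 agents, 5 sub-calls, 50K tokens at $0.015/1K
daily = fanout_cost_per_day(3, 5, 50_000, 0.015, rounds_per_batch=20, batches_per_day=10)
print(f"${daily:,.2f}/day")                   # $2,250.00/day
print(f"{10_000 / daily:.1f} days to $10K")   # 4.4 days
```

Running the estimator before launching a new batch workflow turns "how bad could it be?" into a number.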

This failure mode is multiplicative: fan-out plus retries plus long contexts plus concurrency. The technical bug can be small. The invoice impact is not.

This scaling behavior is visible in provider docs too. Anthropic notes that each agent teammate keeps its own context window, teammates can continue consuming tokens when left active, and plan-mode teams can use roughly 7x tokens versus standard sessions. Source: Claude Code cost documentation. Exact multipliers vary by prompt and model, but the direction is predictable: autonomy amplifies spend.

What top FinOps guides still miss for agents

Current FinOps guidance is strong at organizational process design. The blind spot is runtime enforcement in autonomous agent pipelines, where spend explodes in minutes instead of billing cycles.

| Source | What it covers well | Gap for production agent fleets |
| --- | --- | --- |
| FinOps Foundation: FinOps for AI Overview | Strong business-finance alignment model, KPI framing, and organizational ownership patterns. | No dispatch-time policy decision model for autonomous agents before token spend occurs. |
| FinOps Foundation: FinOps for AI Technology Category | Clear taxonomy for AI cost/usage data (including token-oriented metering complexity). | No runtime enforcement pattern for per-action deny/approve/throttle decisions in agent workflows. |
| TechTarget: 7 practical tips for agentic AI cost optimization | Actionable operating advice: scenario-based TCO, model right-sizing, autonomy limits, and explicit cost/error budgets. | No concrete pre-dispatch control-plane contract for immutable cost evidence and approval-bound execution. |

Practical minimum: every expensive action must emit one immutable cost-evidence record with decision and reviewer. Without that, post-incident cost reviews degrade into guesswork.

Cost evidence record for expensive agent actions
{
  "run_id": "run_01JTS1A4J1V4G6WVE3AX4QYHXF",
  "agent_id": "research-agent",
  "action": "job.batch-generate",
  "estimated_cost_usd": 78.40,
  "decision": "require_approval",
  "policy_version": "v1.9.0",
  "reviewer": "[email protected]",
  "outcome": "approved"
}
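Emitting such a record can be as simple as appending one JSON line per expensive action to an append-only log. A sketch under stated assumptions: `emit_cost_evidence` and the log path are illustrative names, not a real Cordum API, and the field names simply mirror the record above:

```python
import json
from datetime import datetime, timezone

def emit_cost_evidence(log_path: str, record: dict) -> dict:
    """Append one cost-evidence record as a JSON line; never rewrite past lines."""
    record = dict(record, recorded_at=datetime.now(timezone.utc).isoformat())
    with open(log_path, "a") as log:  # append-only keeps the trail tamper-evident
        log.write(json.dumps(record, sort_keys=True) + "\n")
    return record

evidence = emit_cost_evidence("cost_evidence.jsonl", {
    "run_id": "run_01JTS1A4J1V4G6WVE3AX4QYHXF",
    "agent_id": "research-agent",
    "action": "job.batch-generate",
    "estimated_cost_usd": 78.40,
    "decision": "require_approval",
    "policy_version": "v1.9.0",
})
```

A JSONL file is the simplest shape; production systems would write to an append-only store, but the contract is the same: one record per expensive action, written at decision time.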

Why traditional monitoring fails for AI agents

Cloud FinOps works because workloads are predictable. A Kubernetes cluster runs N pods, each consuming roughly the same resources. Dashboards show trailing metrics, and trailing metrics work when next month looks like last month.

Agents break this model. They are autonomous and concurrent. One prompt can trigger a chain of tool calls that fans out exponentially. By the time your dashboard updates, the spend has already happened. You are reading the bill, not preventing it.

Provider-level limits help, but they are usually scoped at account, project, or workspace level. They do not know your business semantics: whether this action is a safe read, a high-cost batch mutation, or a risky external side effect. Agent frameworks provide iteration controls and observability, but production dollar governance is still typically implemented outside the framework.

You would not wait for the AWS bill to discover your Lambda costs tripled. Agent cost governance requires the same shift: from trailing dashboards to leading controls. Evaluate cost before execution, not after.

Three layers of AI agent cost governance

Effective cost governance needs three layers. Each catches failures the others miss.

Layer 1: Per-Action

Cap tokens per individual LLM call. Set max_runtime_sec on every job. Kill calls that exceed thresholds before they finish. This catches the single expensive call.

Layer 2: Per-Agent

Total spend per agent per time window. A research agent gets $50/hour. A drafting agent gets $20/hour. When the budget is exhausted, the agent pauses. This catches agent loops.

Layer 3: Fleet-Level

Total spend across all agents with graceful degradation. When fleet budget hits 80%, throttle non-critical agents. At 95%, pause everything except approved workflows. This catches fan-out explosions.
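The three layers compose naturally into a single pre-dispatch check. A minimal sketch, assuming the thresholds from this section; the class and decision names are illustrative, not Cordum's actual API:

```python
from dataclasses import dataclass, field

@dataclass
class CostGovernor:
    """Pre-dispatch checks: per-action cap, per-agent budget, fleet budget."""
    max_tokens_per_call: int       # Layer 1: per-action
    agent_budgets_usd: dict        # Layer 2: budget per agent per window
    fleet_budget_usd: float        # Layer 3: total across all agents
    agent_spend: dict = field(default_factory=dict)
    fleet_spend: float = 0.0

    def decide(self, agent_id: str, est_tokens: int, est_cost_usd: float) -> str:
        if est_tokens > self.max_tokens_per_call:
            return "deny"                            # the single expensive call
        spent = self.agent_spend.get(agent_id, 0.0)
        if spent + est_cost_usd > self.agent_budgets_usd.get(agent_id, 0.0):
            return "pause_agent"                     # the agent loop
        projected = self.fleet_spend + est_cost_usd
        if projected > self.fleet_budget_usd * 0.95:
            return "pause_fleet"                     # 95%: approved workflows only
        if projected > self.fleet_budget_usd * 0.80:
            return "throttle"                        # 80%: slow non-critical agents
        return "allow"

    def record(self, agent_id: str, cost_usd: float) -> None:
        """Book actual spend after the action completes."""
        self.agent_spend[agent_id] = self.agent_spend.get(agent_id, 0.0) + cost_usd
        self.fleet_spend += cost_usd
```

The key property: `decide` runs before dispatch, against projected spend, so the fan-out explosion is refused rather than reported.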

Policy-as-code for agent cost control

Cost governance belongs in your policy-as-code alongside security and compliance rules. Version-controlled YAML, reviewed in pull requests, enforced before execution. Not a dashboard setting that someone forgets to update.

Cordum's Safety Kernel evaluates cost policy on every job before it runs. Here is what agent cost governance looks like as code:

safety.yaml - agent cost governance
# safety.yaml - agent cost governance
version: v1
rules:
  - id: throttle-llm-calls
    match:
      topics: ["job.*.generate", "job.*.synthesize", "job.*.research"]
      risk_tags: ["high-cost"]
    decision: allow_with_constraints
    constraints:
      max_concurrent: 3
      rate_limit: "20/hour"
      max_runtime_sec: 120
    reason: "LLM calls throttled to prevent runaway spend"

  - id: approve-expensive-batch
    match:
      topics: ["job.*.batch-generate", "job.*.bulk-enrich"]
      risk_tags: ["high-cost", "batch"]
    decision: require_approval
    constraints:
      max_runtime_sec: 600
    reason: "Batch LLM operations above $50 estimated cost need review"

  - id: deny-unbounded-loop
    match:
      topics: ["job.*.recursive-search", "job.*.agent-loop"]
      risk_tags: ["unbounded"]
    decision: deny
    reason: "Unbounded recursive agent loops blocked by policy"

  - id: allow-cached-reads
    match:
      topics: ["job.*.read", "job.*.get", "job.*.list"]
      risk_tags: []
    decision: allow
    reason: "Read operations and cached results pass through"

LLM calls are throttled to 20 per hour with a maximum of 3 concurrent. Batch operations above an estimated cost threshold pause for human review. Unbounded recursive loops are blocked outright. Read operations and cached results pass through with no overhead.
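A first-match evaluator for this kind of rule file fits in a few lines. This is an assumption-laden illustration, not the Safety Kernel's implementation; it treats topic patterns as `fnmatch`-style globs and defaults to deny when nothing matches:

```python
from fnmatch import fnmatch

def evaluate(rules: list, topic: str, risk_tags: set) -> dict:
    """Return the first rule whose topic glob and risk tags both match."""
    for rule in rules:
        topics_ok = any(fnmatch(topic, pat) for pat in rule["match"]["topics"])
        tags_ok = set(rule["match"].get("risk_tags", [])) <= risk_tags
        if topics_ok and tags_ok:
            return {"decision": rule["decision"], "rule_id": rule["id"],
                    "constraints": rule.get("constraints", {})}
    return {"decision": "deny", "rule_id": None, "constraints": {}}  # default-deny

# Two rules mirroring the safety.yaml above, as parsed dicts
rules = [
    {"id": "throttle-llm-calls",
     "match": {"topics": ["job.*.generate", "job.*.research"],
               "risk_tags": ["high-cost"]},
     "decision": "allow_with_constraints",
     "constraints": {"max_concurrent": 3, "rate_limit": "20/hour"}},
    {"id": "allow-cached-reads",
     "match": {"topics": ["job.*.read", "job.*.list"], "risk_tags": []},
     "decision": "allow"},
]
print(evaluate(rules, "job.company.research", {"high-cost"})["decision"])
# allow_with_constraints
```

Because rules evaluate in order and unmatched topics fall through to deny, adding a new agent capability requires an explicit, reviewable policy change before it can spend anything.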

Per-topic timeouts provide a second layer of defense. If a call exceeds its timeout, the Safety Kernel kills it regardless of what the agent thinks it is doing:

Per-topic timeout configuration
# overlays/timeouts.patch.yaml
topics:
  "job.research.company":
    timeout_seconds: 120
    max_retries: 1
  "job.draft.email":
    timeout_seconds: 60
    max_retries: 1
  "job.enrich.contacts":
    timeout_seconds: 30
    max_retries: 0
  "job.generate.report":
    timeout_seconds: 180
    max_retries: 1
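Enforcing those per-topic deadlines from outside the agent can be sketched with a worker pool and a future timeout. The function names are illustrative; note that a thread-based cancel is best-effort, which is why a real kernel would run the call in a killable worker process:

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

TIMEOUTS = {  # mirrors overlays/timeouts.patch.yaml
    "job.research.company": {"timeout_seconds": 120, "max_retries": 1},
    "job.enrich.contacts":  {"timeout_seconds": 30,  "max_retries": 0},
}

def run_with_deadline(topic: str, fn, *args):
    """Run fn under the topic's timeout, retrying up to max_retries."""
    cfg = TIMEOUTS[topic]
    attempts = cfg["max_retries"] + 1
    with ThreadPoolExecutor(max_workers=1) as pool:
        for _ in range(attempts):
            future = pool.submit(fn, *args)
            try:
                return future.result(timeout=cfg["timeout_seconds"])
            except FutureTimeout:
                future.cancel()  # best-effort; a real kernel kills the worker
    raise RuntimeError(f"{topic}: exceeded {cfg['timeout_seconds']}s x{attempts}")
```

The enforcement lives in the dispatcher, not the agent, so a confused agent cannot opt out of its own deadline.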

Approval gates for expensive agent actions

Some agent actions should not run without a human reviewing the cost implication. A batch enrichment job that will process 10,000 contacts at $0.02 each costs $200. That is worth a 30-second review before execution.

The approval gate pattern extends naturally to cost governance. When an agent submits a job with estimated cost above a threshold, the Safety Kernel returns REQUIRE_APPROVAL. The job pauses. A human sees the estimated cost, the number of sub-calls, and the policy that triggered the gate. They approve, modify, or deny. Total review time: under a minute. Cost of not reviewing: potentially thousands of dollars.

This pattern is most useful for volume-sensitive workflows. An agent can behave correctly and still overspend if the batch size is larger than expected. Cost-based approval gates catch that class of failure before money is spent.
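The gate itself reduces to a small state machine: estimate, compare, pause, decide. A sketch with hypothetical names (`submit_job`, `review`, and the $50 threshold come from this article's example, not a real Cordum interface):

```python
APPROVAL_THRESHOLD_USD = 50.0  # illustrative threshold from the text

def submit_job(job: dict, estimated_cost_usd: float, review_queue: list) -> str:
    """Gate expensive jobs: cheap ones dispatch, costly ones wait for a human."""
    if estimated_cost_usd <= APPROVAL_THRESHOLD_USD:
        return "dispatched"
    review_queue.append({"job": job, "estimated_cost_usd": estimated_cost_usd,
                         "status": "pending"})
    return "require_approval"

def review(entry: dict, approved: bool) -> str:
    """Human decision: approve releases the job, deny drops it."""
    entry["status"] = "approved" if approved else "denied"
    return "dispatched" if approved else "blocked"

queue = []
# 10,000 contacts x $0.02 each = $200 estimated: pauses for review
status = submit_job({"topic": "job.enrich.contacts", "items": 10_000},
                    10_000 * 0.02, queue)
```

Everything the reviewer needs (estimated cost, item count, triggering policy) is already in the queued entry, which is what keeps the review under a minute.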

The cost audit trail

Every agent action should record its cost alongside the action itself. Not in a separate billing system that requires a join query. In the same audit trail entry: what the agent did, what policy was evaluated, what the decision was, how many tokens were consumed, and what the cost was.

This enables three things traditional monitoring cannot:

Chargeback per agent. Which agent spent the most this week? Which workflow has the highest cost per run? Which team's agents are most efficient? These answers come from the audit trail, not from parsing provider invoices.

Cost anomaly detection. When an agent's cost per action spikes 3x from its baseline, the system flags it. Not the next billing cycle. Immediately. Because the cost data is in the event stream, not in a monthly invoice.

Budget forecasting. If you know each agent's average cost per action and how many actions it runs per day, you can forecast spend with real data. Not estimates from a spreadsheet, but numbers from production.
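With per-action cost in the event stream, the anomaly check and the forecast are each a few lines. A sketch: the 3x baseline multiplier comes from the text above; the sample numbers and function names are illustrative:

```python
from statistics import mean

def is_cost_anomaly(history_usd: list, latest_usd: float,
                    multiplier: float = 3.0) -> bool:
    """Flag an action whose cost spikes past multiplier x the agent's baseline."""
    baseline = mean(history_usd)
    return latest_usd > multiplier * baseline

def forecast_monthly_spend(avg_cost_per_action_usd: float,
                           actions_per_day: float) -> float:
    """Project spend from production averages, not spreadsheet estimates."""
    return avg_cost_per_action_usd * actions_per_day * 30

print(is_cost_anomaly([0.40, 0.45, 0.38, 0.42], 1.50))  # True: ~3.6x baseline
print(round(forecast_monthly_spend(0.42, 500), 2))       # 6300.0
```

Because both read from the same audit trail entries, the spike is flagged on the next event, not on the next invoice.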

Getting started with agent FinOps

Three steps, in order of impact.

1
Set per-topic timeouts today. Every LLM-calling job should have a max_runtime_sec. Research calls get 120 seconds. Drafting calls get 60. Simple enrichments get 30. If a call cannot complete in its window, something is wrong. Kill it. This alone prevents the worst runaway scenarios.
2
Add throttle rules for high-cost topics. Rate-limit your most expensive agent actions. 20 research calls per hour, 3 concurrent max. This prevents fan-out explosions while still allowing agents to work. Review and adjust limits weekly based on actual usage from the audit trail.
3
Require approval for batch operations. Any job that will process more than 100 items or exceed $50 estimated cost should pause for human review. A 30-second approval decision prevents a $500 surprise. Set up a weekly cost review using audit trail data to identify optimization opportunities. Read our quickstart guide to configure these controls in five minutes.
By Zvi, CTO & Co-founder, Cordum

Decade of experience building security infrastructure at enterprise scale. Now building the governance layer for autonomous AI agents.

Stop the bleed before it starts

Add cost governance to your agent stack. Timeouts, throttles, and approval gates in five minutes.
