
AI Agent Rate Limiting and Overload Control

Unbounded autonomy is just unbounded pressure with better branding.

Guide · 10 min read · Apr 2026
TL;DR
  • Rate limiting is a safety control, not only a cost control.
  • Token buckets need topic-level and actor-level dimensions in agent systems.
  • Throttle decisions should be explicit and observable, not hidden in generic retry noise.
Topic budgets: throttle high-risk actions independently from low-risk reads.

Policy throttle: return deterministic throttling decisions at submit time.

Overload path: requeue with bounded delay, then escalate.

Scope

This guide covers runtime throttling for autonomous agent actions that trigger external side effects and internal control-plane load.

The production problem

Autonomous agents can multiply request volume faster than humans can observe dashboards. One feedback loop bug can flood APIs in seconds.

If your only control is “retry later,” overload becomes a self-amplifying loop across workers, queues, and dependencies.

What top results miss

AWS API Gateway HTTP throttling
  Strong coverage: token bucket semantics, account-level vs route-level limits, and 429 behavior.
  Missing piece: treats limits as API throughput controls, not governance decisions for autonomous agent actions.

Envoy local rate limit filter
  Strong coverage: per-route token-bucket controls, descriptor overrides, and configurable 429 signaling.
  Missing piece: focuses on proxy-level enforcement, not policy-aware scheduler outcomes across agent fleets.

Apigee quota policy
  Strong coverage: dynamic quotas, identifier-based counters, and weighted counting for token-cost style traffic.
  Missing piece: no direct guidance for pre-dispatch throttle decisions tied to autonomous workflow risk tiers.

Overload control model

Global cap
  Required design: protect shared infrastructure with a platform-wide request ceiling.
  Failure if missing: hot topics starve the entire control plane.

Topic cap
  Required design: assign stricter limits to risky side-effecting topics.
  Failure if missing: low-value high-rate traffic crowds out critical operations.

Actor cap
  Required design: apply per-agent or per-tenant quotas for fairness.
  Failure if missing: one runaway agent consumes the full fleet budget.

Escalation path
  Required design: define when repeated throttles trigger approval or manual intervention.
  Failure if missing: systems oscillate between retry and throttle with no resolution.

Cordum throttle behavior

Submit-time throttle
  Current behavior: policy throttle returns HTTP 429 / gRPC ResourceExhausted.
  Why it matters: stops overload before job persistence and dispatch fan-out.

Dispatch-time throttle
  Current behavior: scheduler evaluates allow/deny/approve/throttle before worker routing.
  Why it matters: catches runtime overload conditions that appear after submission.

Throttle delay
  Current behavior: scheduler applies a `safetyThrottleDelay` of 5s on throttle conditions.
  Why it matters: creates bounded requeue pressure rather than immediate hammering.

Fail-mode separation
  Current behavior: gateway and scheduler have separate fail-mode controls.
  Why it matters: lets teams choose availability/safety tradeoffs per control point.

Implementation examples

Token bucket primitive (Go)

bucket.go
Go
import "time"

type Bucket struct {
  Tokens        int
  MaxTokens     int
  TokensPerFill int
  FillInterval  time.Duration
  LastFill      time.Time
}

// Allow refills the bucket, then consumes one token if available.
func Allow(b *Bucket) bool {
  refill(b)
  if b.Tokens <= 0 {
    return false
  }
  b.Tokens--
  return true
}

// refill credits TokensPerFill per elapsed FillInterval, capped at
// MaxTokens. A zero LastFill means the bucket starts full.
func refill(b *Bucket) {
  n := int(time.Since(b.LastFill) / b.FillInterval)
  if n <= 0 {
    return
  }
  b.Tokens = min(b.MaxTokens, b.Tokens+n*b.TokensPerFill)
  b.LastFill = b.LastFill.Add(time.Duration(n) * b.FillInterval)
}

Topic throttle policy (YAML)

rate-limits.yaml
YAML
rate_limits:
  global:
    max_rps: 200
    burst: 400
  topics:
    infra.delete:
      max_rps: 2
      burst: 4
    ticket.read:
      max_rps: 50
      burst: 100
throttle_action:
  on_limit: requeue
  delay: 5s
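One way a dispatcher might resolve the effective limit from such a policy: the `Policy` and `Limit` types below mirror the YAML fields, but the resolution rule (topic override if present, otherwise the global ceiling) is an assumption for illustration, not documented behavior:

```go
package main

import "fmt"

// Limit mirrors the max_rps/burst pairs in rate-limits.yaml.
type Limit struct {
	MaxRPS int
	Burst  int
}

// Policy mirrors the rate_limits section of the YAML above.
type Policy struct {
	Global Limit
	Topics map[string]Limit
}

// limitFor returns the effective limit for a topic: the topic-specific
// override when one exists, otherwise the global ceiling.
func (p Policy) limitFor(topic string) Limit {
	if l, ok := p.Topics[topic]; ok {
		return l
	}
	return p.Global
}

func main() {
	p := Policy{
		Global: Limit{MaxRPS: 200, Burst: 400},
		Topics: map[string]Limit{
			"infra.delete": {MaxRPS: 2, Burst: 4},
			"ticket.read":  {MaxRPS: 50, Burst: 100},
		},
	}
	fmt.Println(p.limitFor("infra.delete")) // {2 4}
	fmt.Println(p.limitFor("unknown.topic"))
}
```

In production the global cap should still be enforced alongside the topic cap (a topic override narrows the budget; it should not bypass the platform ceiling).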

Throttle decision event (JSON)

throttle-event.json
JSON
{
  "ts": "2026-04-01T18:04:11Z",
  "topic": "infra.delete",
  "decision": "throttle",
  "http_status": 429,
  "retry_after_ms": 5000,
  "actor": "ops-agent",
  "tenant": "prod"
}

Limitations and tradeoffs

  • Strict limits protect systems but can delay legitimate urgent actions.
  • Loose burst settings improve latency but can hide runaway behavior until it is too late.
  • Global caps are simple but can penalize critical topics during low-value spikes.
  • Per-actor quotas improve fairness but increase policy complexity.

Next step

Run this in one sprint:

  1. Define topic risk tiers and assign base/burst limits per tier.
  2. Add per-actor quotas for the top three high-volume agent identities.
  3. Alert on throttle ratio and retry-after volume, not only error count.
  4. Run one overload drill and verify the throttle path prevents queue explosion.
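The throttle ratio from step 3 can be computed from decision counts per window; the 20% alert threshold below is purely illustrative and should be tuned to your fleet:

```go
package main

import "fmt"

// throttleRatio is the share of decisions in a window that were throttles.
func throttleRatio(throttled, total int) float64 {
	if total == 0 {
		return 0
	}
	return float64(throttled) / float64(total)
}

// shouldAlert fires when the throttle ratio crosses a threshold.
// 0.2 (20%) is an illustrative value, not a recommendation.
func shouldAlert(throttled, total int) bool {
	return throttleRatio(throttled, total) > 0.2
}

func main() {
	fmt.Println(shouldAlert(5, 100))  // false: 5% throttled
	fmt.Println(shouldAlert(30, 100)) // true: 30% throttled
}
```

A ratio catches what a raw error count misses: a fleet that is mostly throttled at low volume is a policy problem even when the absolute 429 count looks small.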

Continue with AI Agent Timeouts, Retries, and Backoff and AI Agent Circuit Breaker Pattern.

Throttle on purpose

If overload behavior is undefined, production will define it at the worst possible moment.