The production problem
Autonomous agents can multiply request volume faster than humans can observe dashboards. One feedback loop bug can flood APIs in seconds.
If your only control is “retry later,” overload becomes a self-amplifying loop across workers, queues, and dependencies.
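The amplification is easy to quantify: under blind retry-until-success, a request that fails with probability f is attempted an expected 1/(1-f) times, a geometric series that explodes as f approaches 1. A minimal illustrative sketch (not from any particular system):

```go
package main

import "fmt"

// effectiveLoad estimates actual traffic under blind retry-until-success:
// expected attempts per logical request sum to 1/(1 - failureRate).
func effectiveLoad(baseRPS, failureRate float64) float64 {
	return baseRPS / (1 - failureRate)
}

func main() {
	// 100 rps of intended work at a 90% failure rate becomes ~1000 rps of
	// real traffic, which drives the failure rate even higher.
	fmt.Printf("%.0f rps\n", effectiveLoad(100, 0.9))
}
```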
What top results miss
| Source | Strong coverage | Missing piece |
|---|---|---|
| AWS API Gateway HTTP throttling | Token bucket semantics, account-level vs route-level limits, and 429 behavior. | Treats limits as API throughput controls, not governance decisions for autonomous agent actions. |
| Envoy local rate limit filter | Per-route token-bucket controls, descriptor overrides, and configurable 429 signaling. | Focuses on proxy-level enforcement, not policy-aware scheduler outcomes across agent fleets. |
| Apigee quota policy | Dynamic quotas, identifier-based counters, and weighted counting for token-cost style traffic. | No direct guidance for pre-dispatch throttle decisions tied to autonomous workflow risk tiers. |
Overload control model
| Layer | Required design | Failure if missing |
|---|---|---|
| Global cap | Protect shared infrastructure with a platform-wide request ceiling. | Hot topics starve the entire control plane. |
| Topic cap | Assign stricter limits to risky side-effecting topics. | Low-value high-rate traffic crowds out critical operations. |
| Actor cap | Apply per-agent or per-tenant quotas for fairness. | One runaway agent consumes the full fleet budget. |
| Escalation path | Define when repeated throttles trigger approval or manual intervention. | Systems oscillate between retry and throttle with no resolution. |
Cordum throttle behavior
| Control | Current behavior | Why it matters |
|---|---|---|
| Submit-time throttle | Policy throttle returns HTTP 429 / gRPC ResourceExhausted | Stops overload before job persistence and dispatch fan-out. |
| Dispatch-time throttle | Scheduler evaluates allow/deny/approve/throttle before worker routing | Catches runtime overload conditions that appear after submission. |
| Throttle delay | Scheduler uses `safetyThrottleDelay` of 5s on throttle conditions | Creates bounded requeue pressure rather than immediate hammering. |
| Fail-mode separation | Gateway and scheduler have separate fail-mode controls | Lets teams choose availability/safety tradeoffs per control point. |
Implementation examples
Token bucket primitive (Go)
```go
package ratelimit

import "time"

// Bucket is a minimal token bucket; synchronize externally (e.g. with a
// mutex) if Allow is called from multiple goroutines.
type Bucket struct {
	Tokens        int
	MaxTokens     int
	TokensPerFill int
	FillInterval  time.Duration
	lastFill      time.Time
}

// refill credits TokensPerFill per elapsed FillInterval, capped at MaxTokens.
func refill(b *Bucket) {
	if b.lastFill.IsZero() {
		b.lastFill = time.Now()
	}
	for time.Since(b.lastFill) >= b.FillInterval {
		b.lastFill = b.lastFill.Add(b.FillInterval)
		if b.Tokens += b.TokensPerFill; b.Tokens > b.MaxTokens {
			b.Tokens = b.MaxTokens
		}
	}
}

// Allow consumes one token, returning false when the bucket is empty.
func Allow(b *Bucket) bool {
	refill(b)
	if b.Tokens <= 0 {
		return false
	}
	b.Tokens--
	return true
}
```
Topic throttle policy (YAML)
```yaml
rate_limits:
  global:
    max_rps: 200
    burst: 400
  topics:
    infra.delete:
      max_rps: 2
      burst: 4
    ticket.read:
      max_rps: 50
      burst: 100
  throttle_action:
    on_limit: requeue
    delay: 5s
```
Throttle decision event (JSON)
```json
{
  "ts": "2026-04-01T18:04:11Z",
  "topic": "infra.delete",
  "decision": "throttle",
  "http_status": 429,
  "retry_after_ms": 5000,
  "actor": "ops-agent",
  "tenant": "prod"
}
```
Limitations and tradeoffs
- Strict limits protect systems but can delay legitimate urgent actions.
- Loose burst settings improve latency but can hide runaway behavior until it is too late.
- Global caps are simple but can penalize critical topics during low-value spikes.
- Per-actor quotas improve fairness but increase policy complexity.
Next step
Run this in one sprint:
1. Define topic risk tiers and assign base/burst limits per tier.
2. Add per-actor quotas for the top three high-volume agent identities.
3. Alert on throttle ratio and retry-after volume, not only error count.
4. Run one overload drill and verify the throttle path prevents queue explosion.
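For the alerting step, the signal worth watching is the ratio of throttled decisions to total decisions over a window: hard error counts can stay flat while throttles climb. A minimal sketch (the 25% threshold is an illustrative starting point, not a recommendation):

```go
package main

import "fmt"

// throttleRatioAlert fires when throttled decisions exceed a fraction of
// total decisions in the observation window.
func throttleRatioAlert(throttledCount, total int, threshold float64) bool {
	if total == 0 {
		return false
	}
	return float64(throttledCount)/float64(total) > threshold
}

func main() {
	fmt.Println(throttleRatioAlert(40, 100, 0.25)) // true: 40% throttled
	fmt.Println(throttleRatioAlert(5, 100, 0.25))  // false: within budget
}
```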
Continue with *AI Agent Timeouts, Retries, and Backoff* and *AI Agent Circuit Breaker Pattern*.