## The production problem
Autonomous agents can multiply request volume faster than humans can watch a dashboard. A single feedback-loop bug can flood downstream APIs in seconds.
If your only control is "retry later," overload becomes a self-amplifying loop across workers, queues, and dependencies.
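The self-amplifying part is multiplicative: if each of `d` dependency layers retries a failure `r` times, one original request can fan out into `(r+1)^d` attempts during an outage. A back-of-envelope sketch (the function and numbers are illustrative, not from any specific system):

```go
package main

import "fmt"

// amplification estimates retry-storm fan-out: with r immediate retries
// per failure at each of d dependency layers, one request can become
// (r+1)^d attempts while the outage lasts. Standard worst-case estimate;
// real systems with jittered backoff sit below this bound.
func amplification(retries, layers int) int {
	total := 1
	for i := 0; i < layers; i++ {
		total *= retries + 1
	}
	return total
}

func main() {
	// 3 retries across 3 layers: 4 * 4 * 4 attempts per original request.
	fmt.Println(amplification(3, 3)) // 64
}
```

Throttling exists to break exactly this multiplication before it reaches shared infrastructure.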
## What top results miss
| Source | Strong coverage | Missing piece |
|---|---|---|
| AWS API Gateway throttling | Clear token-bucket throttling model and burst vs steady-state semantics. | No autonomous-agent context where throttling decisions must map to policy outcomes. |
| Envoy local rate limit filter | Concrete local token-bucket behavior, 429 handling, and per-route descriptors. | Does not address multi-agent control planes with governance-aware requeue paths. |
| Apigee quota policy | Quota policy patterns for API traffic governance at product boundaries. | Limited guidance for per-action risk-tier throttling in autonomous workflows. |
## Overload control model
| Layer | Required design | Failure if missing |
|---|---|---|
| Global cap | Protect shared infrastructure with a platform-wide request ceiling. | Hot topics starve the entire control plane. |
| Topic cap | Assign stricter limits to risky side-effecting topics. | Low-value high-rate traffic crowds out critical operations. |
| Actor cap | Apply per-agent or per-tenant quotas for fairness. | One runaway agent consumes the full fleet budget. |
| Escalation path | Define when repeated throttles trigger approval or manual intervention. | Systems oscillate between retry and throttle with no resolution. |
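The first three layers above can be sketched as one check that evaluates caps in order: a request is admitted only if the global, topic, and actor budgets all have headroom. This is a minimal per-window sketch (counter reset and the escalation path are left out for brevity); `LayeredLimiter` and its field names are illustrative, not a Cordum API.

```go
package main

import "fmt"

// LayeredLimiter applies caps in order: global, then per-topic, then
// per-actor. Counters cover one fixed window; resetting them each
// interval is omitted here. Not safe for concurrent use as written.
type LayeredLimiter struct {
	GlobalCap  int
	TopicCaps  map[string]int
	ActorCaps  map[string]int
	globalUsed int
	topicUsed  map[string]int
	actorUsed  map[string]int
}

func NewLayeredLimiter(global int, topics, actors map[string]int) *LayeredLimiter {
	return &LayeredLimiter{
		GlobalCap: global, TopicCaps: topics, ActorCaps: actors,
		topicUsed: map[string]int{}, actorUsed: map[string]int{},
	}
}

// Allow admits the request only if no layer is exhausted.
func (l *LayeredLimiter) Allow(topic, actor string) bool {
	if l.globalUsed >= l.GlobalCap {
		return false // global cap protects shared infrastructure
	}
	if limit, ok := l.TopicCaps[topic]; ok && l.topicUsed[topic] >= limit {
		return false // risky topics get stricter budgets
	}
	if limit, ok := l.ActorCaps[actor]; ok && l.actorUsed[actor] >= limit {
		return false // one runaway agent cannot drain the fleet
	}
	l.globalUsed++
	l.topicUsed[topic]++
	l.actorUsed[actor]++
	return true
}

func main() {
	l := NewLayeredLimiter(200, map[string]int{"infra.delete": 2}, map[string]int{"ops-agent": 3})
	fmt.Println(l.Allow("infra.delete", "ops-agent")) // true
	fmt.Println(l.Allow("infra.delete", "ops-agent")) // true
	fmt.Println(l.Allow("infra.delete", "ops-agent")) // false: topic cap hit
}
```

Note the ordering: the cheapest, most protective check (global) runs first, and a deny at any layer leaves all counters untouched.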
## Cordum throttle behavior
| Control | Current behavior | Why it matters |
|---|---|---|
| Submit-time throttle | Policy throttle returns HTTP 429 / gRPC ResourceExhausted | Stops overload before job persistence and dispatch fan-out. |
| Dispatch-time throttle | Scheduler evaluates allow/deny/approve/throttle before worker routing | Catches runtime overload conditions that appear after submission. |
| Throttle delay | Scheduler uses `safetyThrottleDelay` of 5s on throttle conditions | Creates bounded requeue pressure rather than immediate hammering. |
| Fail-mode separation | Gateway and scheduler have separate fail-mode controls | Lets teams choose availability/safety tradeoffs per control point. |
## Implementation examples
### Token bucket primitive (Go)
```go
package ratelimit

import "time"

// Bucket is a simple token bucket; not safe for concurrent use.
type Bucket struct {
	Tokens        int
	MaxTokens     int
	TokensPerFill int
	FillInterval  time.Duration
	lastFill      time.Time
}

// refill credits TokensPerFill for each FillInterval elapsed since the
// last fill, capped at MaxTokens.
func refill(b *Bucket) {
	now := time.Now()
	if b.lastFill.IsZero() {
		b.lastFill = now
	}
	for now.Sub(b.lastFill) >= b.FillInterval {
		b.Tokens += b.TokensPerFill
		b.lastFill = b.lastFill.Add(b.FillInterval)
	}
	if b.Tokens > b.MaxTokens {
		b.Tokens = b.MaxTokens
	}
}

// Allow consumes one token if any are available.
func Allow(b *Bucket) bool {
	refill(b)
	if b.Tokens <= 0 {
		return false
	}
	b.Tokens--
	return true
}
```

### Topic throttle policy (YAML)
```yaml
rate_limits:
  global:
    max_rps: 200
    burst: 400
  topics:
    infra.delete:
      max_rps: 2
      burst: 4
    ticket.read:
      max_rps: 50
      burst: 100
throttle_action:
  on_limit: requeue
  delay: 5s
```

### Throttle decision event (JSON)
```json
{
  "ts": "2026-03-31T18:04:11Z",
  "topic": "infra.delete",
  "decision": "throttle",
  "http_status": 429,
  "retry_after_ms": 5000,
  "actor": "ops-agent",
  "tenant": "prod"
}
```

## Limitations and tradeoffs
- Strict limits protect systems but can delay legitimate urgent actions.
- Loose burst settings improve latency but can hide runaway behavior until it is too late.
- Global caps are simple but can penalize critical topics during low-value spikes.
- Per-actor quotas improve fairness but increase policy complexity.
## Next step
Run this in one sprint:
1. Define topic risk tiers and assign base/burst limits per tier.
2. Add per-actor quotas for the top three high-volume agent identities.
3. Alert on throttle ratio and retry-after volume, not only error count.
4. Run one overload drill and verify the throttle path prevents queue explosion.
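Step 3's throttle ratio can be computed as a simple window metric: the fraction of decisions that were throttles. Unlike a raw error count, the ratio stays meaningful when total volume spikes. `throttleRatio` here is an illustrative helper, not part of any specific stack.

```go
package main

import "fmt"

// throttleRatio returns the fraction of decisions in a window that were
// throttles. Alerting on this ratio catches overload even when total
// volume (and therefore raw error count) also spikes.
func throttleRatio(decisions []string) float64 {
	if len(decisions) == 0 {
		return 0
	}
	throttled := 0
	for _, d := range decisions {
		if d == "throttle" {
			throttled++
		}
	}
	return float64(throttled) / float64(len(decisions))
}

func main() {
	window := []string{"allow", "allow", "throttle", "allow", "throttle"}
	fmt.Printf("throttle ratio: %.2f\n", throttleRatio(window)) // throttle ratio: 0.40
}
```

In practice the decision strings would come from the throttle decision events shown earlier, bucketed into fixed time windows.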
Continue with *AI Agent Timeouts, Retries, and Backoff* and *AI Agent Circuit Breaker Pattern*.