
AI Agent Capacity Planning Model

Size worker pools with formulas and reliability guardrails, not intuition.

Guide · 11 min read · Mar 2026
TL;DR
  • Capacity planning for agents fails when teams size for average load and ignore policy-path degradation.
  • Use explicit utilization targets and queueing math before touching autoscaling settings.
  • Treat retries and replay as capacity consumers, not background noise.
  • Headroom should be validated against incidents, not only synthetic benchmarks.
Sizing math

Estimate required workers from arrival rate, service time, and utilization target.

Headroom policy

Keep enough reserve to absorb burst traffic and policy dependency hiccups.

Operational ownership

Tie capacity decisions to measurable SLO and incident thresholds.

Scope

This guide focuses on autonomous AI agent control planes that dispatch queued work and enforce governance checks before execution.

The production problem

Capacity planning failures in agent systems rarely look like one big crash. They show up as rising dispatch latency, retry churn, and deferred work that never catches up.

Most teams still size for average traffic and rely on autoscaling to save them. That works until retries, policy outages, or long-tail job durations break the scaling signal.

You need an explicit model that links worker count to reliability metrics and recovery behavior.

What top results miss

| Source | Strong coverage | Missing piece |
| --- | --- | --- |
| Google SRE Book: Demand forecasting and capacity planning | Strong requirement for demand forecast, load testing, and provisioning ownership. | No guidance for policy-gated autonomous workflows with retries and replay behavior. |
| Google SRE Workbook: Data processing capacity planning | Concrete example: provision around 50% CPU at peak and beware runaway autoscaling. | No generic model for agent pipeline stages (dispatch, policy, output checks). |
| AWS Well-Architected Analytics Lens BP 11.2 | Practical right-sizing and autoscaling guidance for predictable and spiky workloads. | No reliability budgeting link between scaling behavior and autonomous side-effect safety. |

Capacity model

Use queueing math as a baseline, then add retry and safety-path headroom. Do not jump straight to autoscaler tuning.

| Model layer | Formula | Target value | Why it matters |
| --- | --- | --- | --- |
| Ingress rate | jobs_per_second (lambda) | Use p95 traffic, not daily average | Average hides burst pressure. |
| Service time | avg_execution_seconds (W) | Use p90 service time for conservative sizing | Long-tail jobs distort capacity fast. |
| Worker count | ceil((lambda * W) / target_utilization) | Target utilization 0.60-0.75 | Lower target gives burst headroom. |
| Retry overhead | base_workers * retry_multiplier | Start with a 1.10-1.30 multiplier | Retry storms are real capacity demand. |

Sample worker sizing outcomes:

| Ingress rate | Service time | Utilization target | Required workers |
| --- | --- | --- | --- |
| 80 jobs/s | 0.35s | 0.65 | 44 |
| 120 jobs/s | 0.40s | 0.65 | 74 |
| 200 jobs/s | 0.55s | 0.70 | 158 |
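
The rows above can be verified with the same ceil((lambda * W) / target_utilization) formula; a quick sanity-check sketch (plain arithmetic, no retry multiplier applied):

```typescript
// Reproduce the sample sizing table from the base formula.
const rows = [
  { lambda: 80,  w: 0.35, u: 0.65 },  // -> 44 workers
  { lambda: 120, w: 0.40, u: 0.65 },  // -> 74 workers
  { lambda: 200, w: 0.55, u: 0.70 },  // -> 158 workers
];

for (const { lambda, w, u } of rows) {
  // Offered load (lambda * W) divided by the utilization target,
  // rounded up to whole workers.
  const workers = Math.ceil((lambda * w) / u);
  console.log(`${lambda} jobs/s, ${w}s service, target ${u} -> ${workers} workers`);
}
```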

Cordum runtime implications

| Implication | Current behavior | Why it matters |
| --- | --- | --- |
| Failure-rate guardrail | Existing alert threshold uses failed ratio > 10% over 5m | Capacity decisions should reduce this sustained risk signal, not only reduce queue depth. |
| Latency guardrail | Dispatch p99 warning threshold is > 1s | A useful early signal that worker pools are under-provisioned. |
| Retry budget pressure | Max scheduling retries = 50, backoff 1s-30s, `retryDelayNoWorkers` = 2s | Retry mechanics directly affect effective throughput and backlog shape. |
| Policy dependency capacity impact | `POLICY_CHECK_FAIL_MODE=closed` defaults to requeue on policy outage | Policy outages can consume capacity through safe requeue loops. |
| Recovery debt tracking | `cordum_scheduler_stale_jobs` + `cordum_scheduler_orphan_replayed_total` | Capacity planning should include the post-incident recovery window, not only steady-state traffic. |
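
One way to turn retry pressure into a concrete multiplier: if a fraction p of attempts fails and is requeued, and retried attempts can fail again, expected attempts per job form a geometric series, 1 + p + p² + … = 1 / (1 − p). A sketch under that assumption (the function name and the geometric model are illustrative, not part of Cordum):

```typescript
// Effective capacity multiplier when a fraction `retryProb` of attempts
// fails and is requeued, and retries can themselves fail again.
// Expected attempts per job: 1 + p + p^2 + ... = 1 / (1 - p).
export function retryCapacityMultiplier(retryProb: number): number {
  if (retryProb < 0 || retryProb >= 1) {
    throw new RangeError("retryProb must be in [0, 1)");
  }
  return 1 / (1 - retryProb);
}
```

Under this model, the suggested 1.10-1.30 starting range corresponds to a sustained attempt-failure rate of roughly 9-23%; measure your own rate under incident-like load before trusting the estimate.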

Implementation examples

Worker sizing helper (TypeScript)

capacity-sizing.ts
TypeScript
type SizingInput = {
  ingressPerSecond: number;    // lambda
  avgServiceSeconds: number;   // W
  targetUtilization: number;   // e.g. 0.65
  retryMultiplier?: number;    // e.g. 1.2
};

export function requiredWorkers(input: SizingInput): number {
  const base = (input.ingressPerSecond * input.avgServiceSeconds) / input.targetUtilization;
  const retries = input.retryMultiplier ?? 1.0;
  return Math.ceil(base * retries);
}

// Example:
// 120 jobs/s * 0.40s / 0.65 ≈ 73.85 -> 74 workers
// retry multiplier 1.2 -> 89 workers

Capacity planning policy config (YAML)

capacity-plan.yaml
YAML
capacity_planning:
  target_utilization: 0.65
  retry_multiplier: 1.2
  guardrails:
    dispatch_p99_seconds_warn: 1
    failed_ratio_5m_warn: 0.10
    stale_jobs_warn: 50
  policy_dependency:
    fail_mode: closed
    max_tolerated_safety_unavailable_rate_5m: 0.05
  headroom:
    minimum_spare_workers_percent: 20
    burst_window_minutes: 15
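
The headroom block above implies a simple acceptance check: the provisioned pool must cover required workers plus the minimum spare percentage. A minimal sketch of that check (names are illustrative, not Cordum APIs):

```typescript
// True when the provisioned pool satisfies the headroom policy:
// provisioned >= required * (1 + minSparePercent / 100), rounded up.
export function meetsHeadroomPolicy(
  provisionedWorkers: number,
  requiredWorkers: number,
  minSparePercent: number,
): boolean {
  const floor = Math.ceil(requiredWorkers * (1 + minSparePercent / 100));
  return provisionedWorkers >= floor;
}

// Example: 89 required workers with 20% spare need at least 107 provisioned.
```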

Core capacity validation queries (PromQL)

capacity-signals.promql
PromQL
# Dispatch p99
histogram_quantile(0.99, rate(cordum_scheduler_dispatch_latency_seconds_bucket[5m]))

# Failed completion ratio
rate(cordum_jobs_completed_total{status="failed"}[5m])
/ clamp_min(rate(cordum_jobs_completed_total[5m]), 0.001)

# Safety dependency degradation
rate(cordum_safety_unavailable_total[5m])

# Recovery debt
cordum_scheduler_stale_jobs
rate(cordum_scheduler_orphan_replayed_total[5m])

Limitations and tradeoffs

  • Simple sizing formulas assume stationarity; real workloads can shift faster than planning windows.
  • Conservative utilization targets increase reliability but can reduce cost efficiency.
  • Retry multipliers are rough estimates until measured under incident-like conditions.
  • Autoscaling can still overshoot when its metric no longer tracks useful work.

Next step

Run this in one sprint:

  1. Baseline p95 ingress and p90 service time for your top three topics.
  2. Compute worker targets with utilization 0.65 and retry multiplier 1.2.
  3. Add guardrails for dispatch p99, failed ratio, and stale jobs.
  4. Validate the plan with one controlled load test and one dependency-degradation drill.

Continue with AI Agent Chaos Engineering Playbook and AI Agent Backpressure and Queue Drain Strategy.

Capacity debt becomes reliability debt

If dispatch latency and retries are rising, your architecture is already voting on your next incident.