The production problem
Most teams discover queue fairness issues only after a noisy tenant or urgent incident workload consumes all executor capacity. At that point, every queue looks unhealthy.
Pure priority scheduling fixes urgency but breaks fairness. Pure fairness protects everyone but delays critical work.
Production systems need both: priority tiers plus minimum fair-share guarantees.
What top results miss
| Source | Strong coverage | Missing piece |
|---|---|---|
| RabbitMQ Priority Queues | Strong queue-level priority behavior and caveats around resource usage and consumer prefetch impact. | No cross-queue fairness policy for autonomous workflow control planes. |
| Kubernetes API Priority and Fairness | Fairness discipline and priority-level request handling under contention. | No agent-specific strategy for side-effect risk tiers and replay-safe scheduling. |
| Google Cloud Managed Kafka quotas | Concrete fairness controls through quotas and hard limits (for example project/regional request budgets). | No workload-tier model for autonomous agent queue starvation prevention. |
Priority fairness model
Define queue classes with explicit shares, then enforce per-tenant caps inside each class. Do not rely on ad-hoc queue order.
| Scheduling tier | Workload examples | Capacity policy | Fairness rule |
|---|---|---|---|
| P0 Critical | Incident remediation, policy rollback, production kill-switch workflows | 40% reserved + burst to 70% | Can preempt other tiers for short windows only |
| P1 Interactive | User-facing copilots and approval-required actions | 40% reserved + burst to 60% | Cannot starve P2 for more than configured window |
| P2 Batch | Backfills, analytics summarization, low urgency maintenance | 20% minimum guaranteed | Receives floor capacity even during sustained P0/P1 pressure |
| Tenant fairness | Noisy-tenant isolation inside each tier | Per-tenant max concurrency and queue depth caps | Enforce `tenant_limit` reason path before global saturation |
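The tenant-fairness row above can be sketched as an admission check that applies per-tenant caps before the global saturation check, so a noisy tenant surfaces as `tenant_limit` instead of hiding behind `pool_overloaded`. The limit values and the `admit` function are illustrative assumptions, not a specific scheduler's API; only the reason-code strings come from the table.

```go
package main

import "fmt"

// Illustrative limits; real values would come from configuration.
const (
	tenantMaxConcurrent = 50
	tenantMaxQueueDepth = 500
	poolMaxInFlight     = 1000
)

// admit checks per-tenant caps before the global saturation check, so the
// tenant_limit reason path fires before pool_overloaded. This is a sketch,
// not an actual scheduler implementation.
func admit(tenantInFlight, tenantQueueDepth, poolInFlight int) (ok bool, reason string) {
	if tenantInFlight >= tenantMaxConcurrent || tenantQueueDepth >= tenantMaxQueueDepth {
		return false, "tenant_limit"
	}
	if poolInFlight >= poolMaxInFlight {
		return false, "pool_overloaded"
	}
	return true, ""
}

func main() {
	fmt.Println(admit(50, 10, 200))  // tenant over its concurrency cap
	fmt.Println(admit(10, 10, 1000)) // pool saturated
	fmt.Println(admit(10, 10, 200))  // admitted
}
```

Ordering the checks this way keeps the audit trail honest: a dispatch deferred for fairness reasons is never misreported as raw capacity loss.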
Cordum runtime implications
| Implication | Current behavior | Why it matters |
|---|---|---|
| Overload signals | Worker is considered overloaded at >=90% parallel-job utilization or >=90% CPU/GPU utilization | Scheduler can route or defer before queue classes collapse. |
| Fairness reason codes | Dispatch failures include `tenant_limit`, `no_workers`, and `pool_overloaded` | Operators can distinguish fairness pressure from raw infrastructure loss. |
| Retry boundaries | Max scheduling retries is 50 with 1s-30s exponential backoff | Prevents starvation loops caused by unconstrained retry churn. |
| No-capacity cooldown | `retryDelayNoWorkers` is 2s when no workers are available | Avoids hot retry loops that worsen queue contention. |
| Policy-before-dispatch | Scheduler evaluates policy before dispatch and supports approval-required branch | High-priority classes still follow governance constraints. |
Implementation examples
Tier-aware scheduler skeleton (Go)
```go
// Tier identifies a scheduling class.
type Tier string

const (
	TierP0 Tier = "p0"
	TierP1 Tier = "p1"
	TierP2 Tier = "p2"
)

// QueueState tracks per-tier queue depth and in-flight work.
type QueueState struct {
	Depth       int
	InFlight    int
	MaxInFlight int
}

// pickNextTier walks tiers in strict priority order; fair-share floors and
// preemption windows are layered on top of this skeleton.
func pickNextTier(state map[Tier]QueueState) Tier {
	if state[TierP0].Depth > 0 && state[TierP0].InFlight < state[TierP0].MaxInFlight {
		return TierP0
	}
	if state[TierP1].Depth > 0 && state[TierP1].InFlight < state[TierP1].MaxInFlight {
		return TierP1
	}
	return TierP2
}
```
Tier and tenant quotas (YAML)
```yaml
scheduling:
  tiers:
    p0:
      min_share: 0.40
      burst_cap: 0.70
      max_inflight: 200
    p1:
      min_share: 0.40
      burst_cap: 0.60
      max_inflight: 300
    p2:
      min_share: 0.20
      burst_cap: 0.30
      max_inflight: 150
  tenant_limits:
    max_concurrent_jobs: 50
    max_queue_depth: 500
```
Scheduling decision audit record (JSON)
```json
{
  "tenant": "acme-finance",
  "tier": "p1",
  "queue_depth": 742,
  "inflight": 298,
  "decision": "defer",
  "reason_code": "tenant_limit",
  "retry_delay_sec": 2,
  "attempt": 7
}
```
Limitations and tradeoffs
- Too much P0 reserved capacity can underutilize infrastructure during normal operation.
- Too little P2 floor creates silent starvation that looks like random latency spikes.
- Tight tenant caps protect fairness but can frustrate bursty legitimate workloads.
- Fair scheduling needs good queue telemetry; stale metrics degrade decisions.
Next step
Run this in one sprint:
1. Define 3 workload tiers and assign each current workflow to one tier.
2. Set minimum shares and tenant caps per tier in config.
3. Instrument reason-code frequency (`tenant_limit`, `pool_overloaded`, `no_workers`).
4. Run one controlled load test to verify P2 still makes forward progress.
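The reason-code instrumentation step can start as something as small as a concurrency-safe counter keyed by reason code. This is an in-memory sketch with assumed names (`ReasonCounter`, `Record`, `Count`); a production system would export these tallies as metrics instead.

```go
package main

import (
	"fmt"
	"sync"
)

// ReasonCounter tallies dispatch-failure reason codes so operators can
// distinguish fairness pressure (tenant_limit) from capacity loss
// (no_workers, pool_overloaded). Illustrative sketch, not a metrics library.
type ReasonCounter struct {
	mu     sync.Mutex
	counts map[string]int
}

func NewReasonCounter() *ReasonCounter {
	return &ReasonCounter{counts: make(map[string]int)}
}

// Record increments the tally for one reason code.
func (c *ReasonCounter) Record(reason string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[reason]++
}

// Count returns the current tally for one reason code.
func (c *ReasonCounter) Count(reason string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[reason]
}

func main() {
	rc := NewReasonCounter()
	for _, r := range []string{"tenant_limit", "tenant_limit", "pool_overloaded", "no_workers"} {
		rc.Record(r)
	}
	fmt.Println(rc.Count("tenant_limit"), rc.Count("pool_overloaded"), rc.Count("no_workers"))
}
```

A rising `tenant_limit` share with flat `no_workers` is the signature of fairness pressure rather than infrastructure loss, which is exactly what the load test in step 4 should confirm.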
Continue with *AI Agent Backpressure and Queue Drain Strategy* and *AI Agent Rate Limiting and Overload Control*.