The production problem
Most teams discover queue fairness issues only after a noisy tenant or urgent incident workload consumes all executor capacity. At that point, every queue looks unhealthy.
Pure priority scheduling fixes urgency but breaks fairness. Pure fairness protects everyone but delays critical work.
Production systems need both: priority tiers plus minimum fair-share guarantees.
What top results miss
| Source | Strong coverage | Missing piece |
|---|---|---|
| RabbitMQ Priority Queues | Strong queue-level priority behavior and caveats around resource usage and consumer prefetch impact. | No cross-queue fairness policy for autonomous workflow control planes. |
| Kubernetes API Priority and Fairness | Fairness discipline and priority-level request handling under contention. | No agent-specific strategy for side-effect risk tiers and replay-safe scheduling. |
| Google Cloud Managed Kafka quotas | Concrete fairness controls through quotas and hard limits (for example project/regional request budgets). | No workload-tier model for autonomous agent queue starvation prevention. |
Priority fairness model
Define queue classes with explicit shares, then enforce per-tenant caps inside each class. Do not rely on ad-hoc queue order.
| Scheduling tier | Workload examples | Capacity policy | Fairness rule |
|---|---|---|---|
| P0 Critical | Incident remediation, policy rollback, production kill-switch workflows | 40% reserved + burst to 70% | Can preempt other tiers for short windows only |
| P1 Interactive | User-facing copilots and approval-required actions | 40% reserved + burst to 60% | Cannot starve P2 for more than configured window |
| P2 Batch | Backfills, analytics summarization, low urgency maintenance | 20% minimum guaranteed | Receives floor capacity even during sustained P0/P1 pressure |
| Tenant fairness | Noisy-tenant isolation inside each tier | Per-tenant max concurrency and queue depth caps | Enforce `tenant_limit` reason path before global saturation |
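The tenant-fairness row above can be sketched as an admission check that applies per-tenant caps before the global saturation check, so a noisy tenant surfaces as `tenant_limit` instead of hiding behind `pool_overloaded`. The limit values and the `admit` function are illustrative assumptions, not a specific scheduler's API; only the reason-code strings come from the table.

```go
package main

import "fmt"

// Illustrative limits; real values would come from configuration.
const (
	tenantMaxConcurrent = 50
	tenantMaxQueueDepth = 500
	poolMaxInFlight     = 1000
)

// admit checks per-tenant caps before the global saturation check, so the
// tenant_limit reason path fires before pool_overloaded. This is a sketch,
// not an actual scheduler implementation.
func admit(tenantInFlight, tenantQueueDepth, poolInFlight int) (ok bool, reason string) {
	if tenantInFlight >= tenantMaxConcurrent || tenantQueueDepth >= tenantMaxQueueDepth {
		return false, "tenant_limit"
	}
	if poolInFlight >= poolMaxInFlight {
		return false, "pool_overloaded"
	}
	return true, ""
}

func main() {
	fmt.Println(admit(50, 10, 200))  // tenant over its concurrency cap
	fmt.Println(admit(10, 10, 1000)) // pool saturated
	fmt.Println(admit(10, 10, 200))  // admitted
}
```

Ordering the checks this way keeps the audit trail honest: a dispatch deferred for fairness reasons is never misreported as raw capacity loss.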
Cordum runtime implications
| Implication | Current behavior | Why it matters |
|---|---|---|
| Overload signals | Worker is considered overloaded at >=90% parallel-job utilization or >=90% CPU/GPU utilization | Scheduler can route or defer before queue classes collapse. |
| Fairness reason codes | Dispatch failures include `tenant_limit`, `no_workers`, and `pool_overloaded` | Operators can distinguish fairness pressure from raw infrastructure loss. |
| Retry boundaries | Max scheduling retries is 50 with 1s-30s exponential backoff | Prevents starvation loops caused by unconstrained retry churn. |
| No-capacity cooldown | `retryDelayNoWorkers` is 2s when no workers are available | Avoids hot retry loops that worsen queue contention. |
| Policy-before-dispatch | Scheduler evaluates policy before dispatch and supports approval-required branch | High-priority classes still follow governance constraints. |
Implementation examples
Tier-aware scheduler skeleton (Go)
```go
// Tier identifies a scheduling class.
type Tier string

const (
	TierP0 Tier = "p0"
	TierP1 Tier = "p1"
	TierP2 Tier = "p2"
)

// QueueState tracks per-tier queue depth and in-flight work.
type QueueState struct {
	Depth       int
	InFlight    int
	MaxInFlight int
}

// pickNextTier walks tiers in strict priority order; fair-share floors and
// preemption windows are layered on top of this skeleton.
func pickNextTier(state map[Tier]QueueState) Tier {
	if state[TierP0].Depth > 0 && state[TierP0].InFlight < state[TierP0].MaxInFlight {
		return TierP0
	}
	if state[TierP1].Depth > 0 && state[TierP1].InFlight < state[TierP1].MaxInFlight {
		return TierP1
	}
	return TierP2
}
```
Tier and tenant quotas (YAML)
```yaml
scheduling:
  tiers:
    p0:
      min_share: 0.40
      burst_cap: 0.70
      max_inflight: 200
    p1:
      min_share: 0.40
      burst_cap: 0.60
      max_inflight: 300
    p2:
      min_share: 0.20
      burst_cap: 0.30
      max_inflight: 150
  tenant_limits:
    max_concurrent_jobs: 50
    max_queue_depth: 500
```
Scheduling decision audit record (JSON)
```json
{
  "tenant": "acme-finance",
  "tier": "p1",
  "queue_depth": 742,
  "inflight": 298,
  "decision": "defer",
  "reason_code": "tenant_limit",
  "retry_delay_sec": 2,
  "attempt": 7
}
```
Limitations and tradeoffs
- Too much P0 reserved capacity can underutilize infrastructure during normal operation.
- Too little P2 floor creates silent starvation that looks like random latency spikes.
- Tight tenant caps protect fairness but can frustrate bursty legitimate workloads.
- Fair scheduling needs good queue telemetry; stale metrics degrade decisions.
Next step
Run this in one sprint:
1. Define 3 workload tiers and assign each current workflow to one tier.
2. Set minimum shares and tenant caps per tier in config.
3. Instrument reason-code frequency (`tenant_limit`, `pool_overloaded`, `no_workers`).
4. Run one controlled load test to verify P2 still makes forward progress.
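The reason-code instrumentation step can start as something as small as a concurrency-safe counter keyed by reason code. This is an in-memory sketch with assumed names (`ReasonCounter`, `Record`, `Count`); a production system would export these tallies as metrics instead.

```go
package main

import (
	"fmt"
	"sync"
)

// ReasonCounter tallies dispatch-failure reason codes so operators can
// distinguish fairness pressure (tenant_limit) from capacity loss
// (no_workers, pool_overloaded). Illustrative sketch, not a metrics library.
type ReasonCounter struct {
	mu     sync.Mutex
	counts map[string]int
}

func NewReasonCounter() *ReasonCounter {
	return &ReasonCounter{counts: make(map[string]int)}
}

// Record increments the tally for one reason code.
func (c *ReasonCounter) Record(reason string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[reason]++
}

// Count returns the current tally for one reason code.
func (c *ReasonCounter) Count(reason string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[reason]
}

func main() {
	rc := NewReasonCounter()
	for _, r := range []string{"tenant_limit", "tenant_limit", "pool_overloaded", "no_workers"} {
		rc.Record(r)
	}
	fmt.Println(rc.Count("tenant_limit"), rc.Count("pool_overloaded"), rc.Count("no_workers"))
}
```

A rising `tenant_limit` share with flat `no_workers` is the signature of fairness pressure rather than infrastructure loss, which is exactly what the load test in step 4 should confirm.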
Continue with *AI Agent Backpressure and Queue Drain Strategy* and *AI Agent Rate Limiting and Overload Control*.