Name: Cordum
Author: Cordum

The production problem

Teams ask for deterministic worker routing.

Operations teams ask for overload protection and graceful fallback.

If you make hints mandatory, you get hot spots and brittle routing.

If you ignore hints completely, you lose locality and warm-cache gains.

What top results cover and miss

Source	Strong coverage	Missing piece
Kubernetes Node Affinity	Hard vs soft scheduling preferences (`required` vs `preferred`) with explicit placement constraints.	No worker-level direct-subject routing hints inside an application scheduler.
gRPC Custom Load Balancing Policies	Policy-based balancing and client-side route selection behavior.	No policy for mixed hard pool constraints plus soft worker hints in queue dispatch workflows.
AWS ALB Sticky Sessions	Session affinity behavior and when stickiness improves UX continuity.	No guardrail model for rejecting sticky preference when target is overloaded or ineligible.

Cordum runtime mechanics

Boundary	Current behavior	Why it matters
Pool hint strictness	`preferred_pool` narrows topic pools, but fails if that pool is not mapped for the topic.	Prevents silently routing outside declared topic-to-pool contract.
Worker hint softness	`preferred_worker_id` is honored only if worker exists, belongs to eligible pool, matches placement labels, and is not overloaded.	Avoids mandatory pinning into unhealthy capacity.
Fallback route	If preferred worker is unsuitable, strategy falls back to least-loaded scoring across eligible workers.	Keeps dispatch progress without manual hint cleanup.
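Overload guard	Worker is overloaded when job utilization (active jobs / max parallel jobs) >= 0.9, or when CPU load or GPU utilization >= 90. The job-utilization check is skipped when the worker reports no capacity.	Hinted routing respects capacity safety limits.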
Placement label scope	Only prefixed placement labels constrain worker matching; business labels are ignored.	Prevents accidental routing lock-in from application metadata.

Strategy code paths

Strict pool hint, soft worker hint

core/controlplane/scheduler/strategy_least_loaded.go

// core/controlplane/scheduler/strategy_least_loaded.go (excerpt)
poolHint := labels["preferred_pool"]
if poolHint != "" {
  if !containsPool(topicPools, poolHint) {
    return "", fmt.Errorf("%w: preferred pool %q not mapped for topic %q", ErrNoPoolMapping, poolHint, req.Topic)
  }
  topicPools = []string{poolHint}
}

if preferredWorker := labels["preferred_worker_id"]; preferredWorker != "" {
  if hb, exists := workers[preferredWorker]; exists {
    if _, ok := poolSet[hb.GetPool()]; ok && matchesLabels(hb, requiredLabels) && !isOverloaded(hb) {
      return bus.DirectSubject(preferredWorker), nil
    }
  }
}
// otherwise fall back to least-loaded selection
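
The excerpt stops where the fallback begins. As a minimal sketch of what "least-loaded scoring across eligible workers" could look like, the block below assumes a simplified `Worker` struct in place of `pb.Heartbeat` and scores each worker by active jobs divided by capacity. The actual scoring function is not shown in this article, so `Worker`, `pickLeastLoaded`, `errNoWorkers`, and the score formula are all hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// Worker is a hypothetical stand-in for the heartbeat data the scheduler sees.
type Worker struct {
	ID              string
	Pool            string
	ActiveJobs      int
	MaxParallelJobs int
}

// errNoWorkers is a hypothetical error for an empty candidate set.
var errNoWorkers = errors.New("no eligible workers")

// utilization returns active/capacity. Unknown capacity is treated as fully
// loaded so such workers sort last (an assumption of this sketch).
func utilization(w Worker) float64 {
	if w.MaxParallelJobs <= 0 {
		return 1.0
	}
	return float64(w.ActiveJobs) / float64(w.MaxParallelJobs)
}

// pickLeastLoaded returns the ID of the worker with the lowest utilization
// among workers whose pool is eligible. Ties break on ID for determinism.
func pickLeastLoaded(workers map[string]Worker, eligiblePools map[string]struct{}) (string, error) {
	candidates := make([]Worker, 0, len(workers))
	for _, w := range workers {
		if _, ok := eligiblePools[w.Pool]; ok {
			candidates = append(candidates, w)
		}
	}
	if len(candidates) == 0 {
		return "", errNoWorkers
	}
	sort.Slice(candidates, func(i, j int) bool {
		ui, uj := utilization(candidates[i]), utilization(candidates[j])
		if ui != uj {
			return ui < uj
		}
		return candidates[i].ID < candidates[j].ID
	})
	return candidates[0].ID, nil
}

func main() {
	workers := map[string]Worker{
		"w1": {ID: "w1", Pool: "default", ActiveJobs: 3, MaxParallelJobs: 10},
		"w2": {ID: "w2", Pool: "default", ActiveJobs: 9, MaxParallelJobs: 10}, // hinted, but busy
		"w3": {ID: "w3", Pool: "gpu-batch", ActiveJobs: 0, MaxParallelJobs: 10},
	}
	pools := map[string]struct{}{"default": {}}
	id, err := pickLeastLoaded(workers, pools)
	fmt.Println(id, err)
}
```

In this example `w3` is idle but sits outside the eligible pool set, so it is never considered; the pool constraint stays hard even on the fallback path.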

Overload guardrails

core/controlplane/scheduler/strategy_least_loaded.go

// core/controlplane/scheduler/strategy_least_loaded.go (excerpt)
const overloadUtilizationThreshold = 0.9

func isOverloaded(hb *pb.Heartbeat) bool {
  if capacity := hb.GetMaxParallelJobs(); capacity > 0 {
    utilization := float32(hb.GetActiveJobs()) / float32(capacity)
    if utilization >= overloadUtilizationThreshold { return true }
  }
  if hb.GetCpuLoad() >= 90 { return true }
  if hb.GetGpuUtilization() >= 90 { return true }
  return false
}
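
To see how the three checks combine, here is a small worked example. It assumes a plain `heartbeat` struct in place of `*pb.Heartbeat` (the field names and types are guesses made for illustration) and reuses the threshold logic from the excerpt above.

```go
package main

import "fmt"

const overloadUtilizationThreshold = 0.9

// heartbeat is a hypothetical plain struct standing in for *pb.Heartbeat.
type heartbeat struct {
	ActiveJobs      int32
	MaxParallelJobs int32
	CPULoad         float32
	GPUUtilization  float32
}

// isOverloaded mirrors the excerpt: any one of the three signals is enough.
func isOverloaded(hb heartbeat) bool {
	if hb.MaxParallelJobs > 0 {
		utilization := float32(hb.ActiveJobs) / float32(hb.MaxParallelJobs)
		if utilization >= overloadUtilizationThreshold {
			return true
		}
	}
	return hb.CPULoad >= 90 || hb.GPUUtilization >= 90
}

func main() {
	cases := []struct {
		name string
		hb   heartbeat
	}{
		{"8 of 10 slots busy", heartbeat{ActiveJobs: 8, MaxParallelJobs: 10}},
		{"9 of 10 slots busy", heartbeat{ActiveJobs: 9, MaxParallelJobs: 10}},
		{"free slots, CPU at 95", heartbeat{ActiveJobs: 1, MaxParallelJobs: 10, CPULoad: 95}},
		{"capacity unreported, GPU at 50", heartbeat{ActiveJobs: 4, GPUUtilization: 50}},
	}
	for _, c := range cases {
		fmt.Printf("%-32s overloaded=%v\n", c.name, isOverloaded(c.hb))
	}
}
```

The last case is worth noting: when a worker reports no capacity, the job-utilization check is skipped entirely and only the CPU and GPU signals can mark it overloaded.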

Placement label scoping + tests

core/controlplane/scheduler/strategy_least_loaded.go + _test.go

// core/controlplane/scheduler/strategy_least_loaded.go (excerpt)
func filterPlacementLabels(labels map[string]string) map[string]string {
  // Keep only labels that express placement constraints.
  out := make(map[string]string, len(labels))
  for k, v := range labels {
    if strings.HasPrefix(k, "placement.") ||
       strings.HasPrefix(k, "constraint.") ||
       strings.HasPrefix(k, "node.") {
      out[k] = v
    }
  }
  return out
}

// core/controlplane/scheduler/strategy_least_loaded_test.go (excerpt)
func TestLeastLoadedStrategyHonorsPreferredWorker(t *testing.T) {
  // Setup of strategy and workers is elided in this excerpt.
  req := &pb.JobRequest{
    Topic: "job.default",
    Labels: map[string]string{"preferred_worker_id": "w2"},
  }
  subject, err := strategy.PickSubject(req, workers)
  if err != nil { t.Fatalf("PickSubject: %v", err) }
  if subject != "worker.w2.jobs" { t.Fatalf("expected preferred worker subject, got %q", subject) }
}

func TestFilterPlacementLabels(t *testing.T) {
  // placement.* / constraint.* / node.* kept; business labels ignored
}
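
The test body above is only a comment, so here is a hedged sketch of what it might assert. The filter follows the excerpt; `matchesLabels` is shown with an assumed map-based signature (the real one takes a heartbeat), and the label keys and values are invented for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// placementPrefixes lists the prefixes the excerpt treats as constraints.
var placementPrefixes = []string{"placement.", "constraint.", "node."}

// filterPlacementLabels keeps only labels with a placement prefix.
func filterPlacementLabels(labels map[string]string) map[string]string {
	out := make(map[string]string, len(labels))
	for k, v := range labels {
		for _, p := range placementPrefixes {
			if strings.HasPrefix(k, p) {
				out[k] = v
				break
			}
		}
	}
	return out
}

// matchesLabels is a hypothetical matcher: every required label must be
// present on the worker with the same value.
func matchesLabels(workerLabels, required map[string]string) bool {
	for k, v := range required {
		if got, ok := workerLabels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

func main() {
	jobLabels := map[string]string{
		"placement.zone":      "eu-1",    // constrains matching
		"team":                "billing", // business label, ignored
		"preferred_worker_id": "w2",      // routing hint, not a placement label
	}
	workerLabels := map[string]string{"placement.zone": "eu-1"}

	required := filterPlacementLabels(jobLabels)
	fmt.Println(len(required), matchesLabels(workerLabels, required))
}
```

Without the filter, the `team` label would have to be present on the worker too, which is the accidental lock-in that the prefix scoping avoids.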

Validation runbook

Validate hint behavior explicitly. Do not assume worker hints are strict pins.

preferred-worker-hint-runbook.sh

# 1) Validate strategy tests
go test ./core/controlplane/scheduler -run TestLeastLoadedStrategyHonorsPreferredWorker -count=1
go test ./core/controlplane/scheduler -run TestFilterPlacementLabels -count=1

# 2) Submit job with preferred worker hint
cordumctl job submit --topic job.default --prompt "hint probe" --labels '{"preferred_worker_id":"w2"}'

# 3) Submit job with strict preferred pool hint
cordumctl job submit --topic job.default --prompt "pool hint probe" --labels '{"preferred_pool":"gpu-batch"}'

# 4) Inspect scheduler logs for hint decisions
rg "strategy pick preferred worker|no pool mapping for topic" /var/log/cordum/scheduler.log

Limitations and tradeoffs

Approach	Upside	Downside
Soft worker hint + strict pool hint (current)	Good balance between determinism and safety.	Behavior can surprise teams expecting hard worker pinning.
Hard worker pinning	Maximum predictability for targeted workloads.	Higher risk of overload/staleness hotspots and manual operations burden.
Ignore all hints	Simplest scheduler behavior.	Loses useful locality and warm-cache optimization opportunities.

- Soft hints are safer by default, but teams need clear docs to avoid wrong assumptions.
- Strict pool hints can be useful for compliance boundaries, but misconfiguration risk is higher.
- Capacity and staleness checks must stay in front of hint shortcuts.

Next step

Implement this next:

1. Add explicit docs table: which hints are hard constraints vs soft preferences.
2. Add a test for preferred worker fallback when hinted worker exists but is overloaded.
3. Emit metrics for hint usage and hint rejection causes (`overloaded`, `label_mismatch`, `pool_ineligible`).
4. Add a dry-run endpoint that returns selected worker and rejection rationale for hints.
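
Items 2 through 4 all depend on knowing why a hint was rejected, not just that it was. One possible shape, sketched under assumptions: the function evaluates the hint in the same order as the strategy excerpt and returns a reason string. The `worker` and `hintResult` types and `evaluateWorkerHint` are hypothetical, and `unknown_worker` is an extra reason added here for completeness; only `overloaded`, `label_mismatch`, and `pool_ineligible` come from the list above.

```go
package main

import "fmt"

// worker is a hypothetical, simplified view of a worker heartbeat.
type worker struct {
	Pool       string
	Labels     map[string]string
	Overloaded bool
}

// hintResult reports whether the hint was honored and, if not, why.
type hintResult struct {
	Accepted bool
	Reason   string
}

const (
	reasonUnknownWorker  = "unknown_worker" // assumption: not in the article's list
	reasonPoolIneligible = "pool_ineligible"
	reasonLabelMismatch  = "label_mismatch"
	reasonOverloaded     = "overloaded"
)

// evaluateWorkerHint checks existence, pool, placement labels, then load.
func evaluateWorkerHint(id string, workers map[string]worker,
	pools map[string]struct{}, required map[string]string) hintResult {
	w, ok := workers[id]
	if !ok {
		return hintResult{Reason: reasonUnknownWorker}
	}
	if _, ok := pools[w.Pool]; !ok {
		return hintResult{Reason: reasonPoolIneligible}
	}
	for k, v := range required {
		if got, ok := w.Labels[k]; !ok || got != v {
			return hintResult{Reason: reasonLabelMismatch}
		}
	}
	if w.Overloaded {
		return hintResult{Reason: reasonOverloaded}
	}
	return hintResult{Accepted: true}
}

func main() {
	workers := map[string]worker{
		"w2": {Pool: "default", Overloaded: true},
	}
	pools := map[string]struct{}{"default": {}}
	res := evaluateWorkerHint("w2", workers, pools, nil)
	fmt.Printf("accepted=%v reason=%q\n", res.Accepted, res.Reason)
}
```

The same result value could feed a metrics counter keyed by reason and the body of a dry-run response, and the overloaded case in `main` is the scenario the fallback test in item 2 would cover.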

Continue with AI Agent Stale Worker Dispatch Retries and AI Agent Priority Fair Scheduling.

AI Agent Preferred Worker Routing: Hint, Not Mandate