Name: Cordum
Author: Cordum

The production problem

Approval queues age. Policy evolves. Requests mutate.

If your system accepts old approvals after those changes, an operator signs one thing and executes another.

That is not a UX bug. It is a governance failure with audit fallout.

What top results cover and miss

Source	Strong coverage	Missing piece
Google Secret Manager: ETags for optimistic concurrency	ETag checks prevent one writer from overwriting another writer's newer intent.	No approval-workflow pattern that validates both policy version and request payload before execution.
Google Cloud setIamPolicy docs (`etag` guidance)	Read-modify-write with etag to avoid racing policy updates.	No human-approval queue semantics where an approval can expire because the policy snapshot changed.
Twilio: Mutation and conflict resolution	Mutation preconditions with ETag and If-Match to detect stale writes.	No pre-dispatch governance flow that combines snapshot drift and job-hash drift checks.

Cordum runtime mechanics

Boundary	Current behavior	Why it matters
Policy snapshot guard	For non-workflow-gate approvals, Cordum compares current Safety Kernel snapshot base against stored `policy_snapshot`.	If policy changed, approval is rejected with `409 policy snapshot changed; re-evaluate before approving`.
Request hash guard	Cordum recomputes `scheduler.HashJobRequest(req)` and compares it to stored `safetyRecord.JobHash`.	If request mutated, approval is rejected with `409 job request changed; approval rejected`.
Workflow gate branch	Workflow-gate approvals skip Safety Kernel snapshot listing; if no snapshot was stored, `policySnapshot` falls back to `workflow-gate`.	Workflow gates prioritize workflow-state checks and context over strict snapshot-base equality.
Service dependency	If `s.safetyClient` is unavailable for non-workflow approvals, approve returns `503 safety kernel unavailable`; if listing snapshots fails, it returns `502 list safety snapshots failed`.	Safety Kernel downtime blocks non-workflow policy approvals until it recovers.
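
The boundaries in the table can be read as one ordered decision. The sketch below models that decision as a pure function; it is not Cordum's handler. The `approvalInput` type, its field names, the order of the checks, the choice to run the hash check for workflow gates, and the `200 approved` success value are all assumptions made for illustration. Only the error status codes and messages come from the table.

```go
package main

import (
	"net/http"
	"strings"
)

// approvalInput is a hypothetical, flattened view of what an approve call knows.
type approvalInput struct {
	IsWorkflowGate  bool   // workflow-gate approvals skip the snapshot comparison
	KernelAvailable bool   // false when the Safety Kernel client is unavailable
	StoredSnapshot  string // policy_snapshot recorded when the approval was queued
	CurrentSnapshot string // snapshot currently reported by the Safety Kernel
	StoredJobHash   string // job hash recorded when the approval was queued
	CurrentJobHash  string // hash recomputed from the request at approve time
}

// base drops the "|cfg:hash" overlay suffix so overlay-only changes compare equal.
func base(snap string) string {
	if i := strings.Index(snap, "|"); i >= 0 {
		return snap[:i]
	}
	return snap
}

// decideApproval returns the status and message an approve call would receive
// under the assumptions listed above.
func decideApproval(in approvalInput) (int, string) {
	if !in.IsWorkflowGate {
		if !in.KernelAvailable {
			return http.StatusServiceUnavailable, "safety kernel unavailable"
		}
		cur := strings.TrimSpace(in.CurrentSnapshot)
		if cur == "" || base(cur) != base(strings.TrimSpace(in.StoredSnapshot)) {
			return http.StatusConflict, "policy snapshot changed; re-evaluate before approving"
		}
	}
	if in.CurrentJobHash != in.StoredJobHash {
		return http.StatusConflict, "job request changed; approval rejected"
	}
	return http.StatusOK, "approved" // placeholder success value, not from the source
}
```

Modeling the guard as a function of plain values makes each branch easy to cover with table-driven tests, independent of the gateway wiring.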

Snapshot and hash checks in code

Policy snapshot drift guard

core/controlplane/gateway/handlers_approvals.go

// core/controlplane/gateway/handlers_approvals.go (excerpt)
policySnapshot := strings.TrimSpace(safetyRecord.PolicySnapshot)
if isWorkflowGate {
  if policySnapshot == "" {
    policySnapshot = "workflow-gate"
  }
} else {
  if policySnapshot == "" {
    result = handlerResult{http.StatusConflict, "approval policy snapshot unavailable"}
    return nil
  }
  snapResp, err := s.safetyClient.ListSnapshots(ctx, &pb.ListSnapshotsRequest{})
  if err != nil {
    result = handlerResult{http.StatusBadGateway, "list safety snapshots failed"}
    return nil
  }
  currentSnapshot := ""
  if snapResp != nil && len(snapResp.Snapshots) > 0 {
    currentSnapshot = strings.TrimSpace(snapResp.Snapshots[0])
  }
  if currentSnapshot == "" || snapshotBase(currentSnapshot) != snapshotBase(policySnapshot) {
    result = handlerResult{http.StatusConflict, "policy snapshot changed; re-evaluate before approving"}
    return nil
  }
}

Job request hash drift guard

core/controlplane/gateway/handlers_approvals.go

// core/controlplane/gateway/handlers_approvals.go (excerpt)
hash, err := scheduler.HashJobRequest(req)
if err != nil {
  result = handlerResult{http.StatusInternalServerError, "failed to hash job request"}
  return nil
}
if hash != safetyRecord.JobHash {
  result = handlerResult{http.StatusConflict, "job request changed; approval rejected"}
  return nil
}

Overlay-tolerant snapshot base comparison

core/controlplane/gateway/handlers_approvals.go

// core/controlplane/gateway/handlers_approvals.go (excerpt)
// Combined snapshots are "base|cfg:hash".
// snapshotBase strips config overlay hash so overlay-only changes do not invalidate approvals.
func snapshotBase(snap string) string {
  if i := strings.Index(snap, "|"); i >= 0 {
    return snap[:i]
  }
  return snap
}
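
To make the overlay tolerance concrete, here is a small sketch that reuses `snapshotBase` as shown in the excerpt. The `sameBase` helper and the snapshot identifiers (`v42`, `cfg:aaa`) are invented for illustration; real snapshot IDs may look different.

```go
package main

import "strings"

// snapshotBase is copied from the excerpt: it strips the "|cfg:hash" overlay suffix.
func snapshotBase(snap string) string {
	if i := strings.Index(snap, "|"); i >= 0 {
		return snap[:i]
	}
	return snap
}

// sameBase is a hypothetical helper that mirrors the guard's comparison:
// an empty current snapshot never matches, otherwise only the bases are compared.
func sameBase(stored, current string) bool {
	current = strings.TrimSpace(current)
	if current == "" {
		return false
	}
	return snapshotBase(current) == snapshotBase(strings.TrimSpace(stored))
}
```

Under these assumptions, `sameBase("v42|cfg:aaa", "v42|cfg:bbb")` is true because only the overlay hash differs, while `sameBase("v42|cfg:aaa", "v43|cfg:aaa")` is false because the base changed.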

Validation runbook

Run this in staging before changing approval semantics.

runbook.sh

# 1) Create an approval-required job and capture /api/v1/approvals item fields:
#    policy_snapshot, job_hash, job.id
# 2) Change active policy snapshot (publish new snapshot)
# 3) Call POST /api/v1/approvals/{job_id}/approve
# 4) Verify 409 "policy snapshot changed; re-evaluate before approving"
# 5) Re-run policy evaluation to create a fresh approval item
# 6) Modify job request labels/body for the fresh item from step 5 and retry approve
# 7) Verify 409 "job request changed; approval rejected"
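
The verification steps in the runbook can be scripted. The sketch below builds the approve URL from the path in step 3 and checks a response for the 409 and message expected in steps 4 and 7. The base URL is a placeholder, authentication is omitted, and matching the message as a substring of the body is an assumption, since the response body format is not specified here.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"strings"
)

// approveURL builds the step-3 endpoint. baseURL stands in for your staging gateway.
func approveURL(baseURL, jobID string) string {
	return strings.TrimRight(baseURL, "/") + "/api/v1/approvals/" + url.PathEscape(jobID) + "/approve"
}

// expectDriftConflict checks steps 4 and 7: the status must be 409 and the
// expected message must appear somewhere in the body (assumed body format).
func expectDriftConflict(status int, body, wantMsg string) error {
	if status != http.StatusConflict {
		return fmt.Errorf("want status %d, got %d", http.StatusConflict, status)
	}
	if !strings.Contains(body, wantMsg) {
		return fmt.Errorf("want message %q in body, got %q", wantMsg, body)
	}
	return nil
}
```

A staging script would POST to `approveURL(...)` with whatever credentials your gateway requires, then pass the status code and body to `expectDriftConflict` with the message from step 4 or step 7.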

Limitations and tradeoffs

Approach	Upside	Downside
Snapshot + hash checks (current)	Catches both policy drift and request mutation before dispatch.	More conflict paths to handle in clients and operator runbooks.
Snapshot-only check	Lower compute overhead and simpler reasoning.	Request payload mutations can slip through stale approvals.
Hash-only check	Protects against payload tampering after the approval is queued.	A stale approval can still authorize an action under an outdated policy.

- Workflow-gate approvals intentionally use a different snapshot behavior than non-workflow policy approvals.
- I did not find dedicated gateway tests that directly assert the exact 409 messages for snapshot/hash drift branches.
- Hash checks depend on stable request canonicalization; field-order surprises in custom tooling can produce false conflicts.
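
The canonicalization point is easiest to see side by side. The sketch below is not `scheduler.HashJobRequest`, whose internals are not shown here; it is a generic illustration with invented function names. It hashes a JSON payload once as raw bytes and once after a decode and re-encode pass, relying on the fact that Go's `encoding/json` writes map keys in sorted order.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
)

// hashRawBytes hashes the payload exactly as sent, so key order and
// whitespace differences change the digest.
func hashRawBytes(raw []byte) string {
	sum := sha256.Sum256(raw)
	return hex.EncodeToString(sum[:])
}

// hashCanonicalJSON decodes the payload and re-encodes it before hashing.
// Objects decode into maps, and json.Marshal emits map keys sorted, so two
// payloads that differ only in field order produce the same digest.
func hashCanonicalJSON(raw []byte) (string, error) {
	var v any
	if err := json.Unmarshal(raw, &v); err != nil {
		return "", err
	}
	canon, err := json.Marshal(v)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(canon)
	return hex.EncodeToString(sum[:]), nil
}
```

Custom tooling that hashes raw bytes will disagree with a server that canonicalizes first, which is one way a false conflict can arise. Note that this simple pass decodes numbers as `float64`, so very large integers can lose precision; a stricter scheme would need to handle that.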

Next step

Do this next:

1. Add explicit tests for `policy snapshot changed` and `job request changed` conflict branches.
2. Expose machine-readable conflict codes so SDKs can route retries vs re-evaluation flows.
3. Document workflow-gate vs non-workflow-gate approval semantics in the public API docs.
4. Track drift-conflict rate as an SLI to catch policy rollout regressions early.
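
As a starting point for item 2, here is a sketch of how a client could route on conflict codes. The code strings (`POLICY_SNAPSHOT_CHANGED` and the others), the `nextAction` values, and the substring matching on today's messages are all hypothetical; Cordum does not expose these codes according to this article, which is why the item is on the list.

```go
package main

import (
	"net/http"
	"strings"
)

// nextAction is a hypothetical label for what a client should do next.
type nextAction string

const (
	actionProceed    nextAction = "proceed"
	actionReevaluate nextAction = "re-evaluate policy and request a new approval"
	actionResubmit   nextAction = "resubmit the job for approval"
	actionRetryLater nextAction = "retry later"
	actionEscalate   nextAction = "escalate to an operator"
)

// routeApproveResult maps a status and message to an invented conflict code
// and a follow-up action. With real machine-readable codes, the substring
// matching on messages would no longer be needed.
func routeApproveResult(status int, msg string) (string, nextAction) {
	switch {
	case status >= 200 && status < 300:
		return "", actionProceed
	case status == http.StatusConflict && strings.Contains(msg, "policy snapshot changed"):
		return "POLICY_SNAPSHOT_CHANGED", actionReevaluate
	case status == http.StatusConflict && strings.Contains(msg, "job request changed"):
		return "JOB_REQUEST_CHANGED", actionResubmit
	case status == http.StatusServiceUnavailable:
		return "SAFETY_KERNEL_UNAVAILABLE", actionRetryLater
	default:
		return "UNKNOWN", actionEscalate
	}
}
```

Counting how often each code is returned would also give a simple input for the drift-conflict SLI in item 4.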

Continue with AI Agent Policy Decision Cache Invalidation and AI Agent Approval Lock Contention.

AI Agent Approval Policy Snapshot Drift