Deep Dive

AI Agent Approval Policy Snapshot Drift

Approving stale context is still a production incident, even when the button click succeeds.

Deep Dive | 10 min read | Mar 2026
TL;DR
  • Cordum blocks approval when the current policy snapshot base differs from the snapshot recorded at approval creation time.
  • Cordum also hashes the current job request and rejects approval if the hash diverges from the stored safety decision hash.
  • Both guards currently return HTTP 409, which keeps stale approvals from silently executing changed intent.
  • Workflow gate approvals take a different path and may use `workflow-gate` instead of strict snapshot matching.
Failure mode

A human approves based on old policy context while rules or request payload changed after the approval was queued.

Current behavior

The approve path validates the policy snapshot base and the job request hash before transitioning state to PENDING.

Operational payoff

Stale approvals fail fast with explicit conflict errors instead of causing hidden policy drift in production.

Scope

This guide covers stale-approval protection in `POST /api/v1/approvals/{job_id}/approve`, not generic optimistic concurrency for all APIs.

The production problem

Approval queues age. Policy evolves. Requests mutate.

If your system accepts old approvals after those changes, an operator signs one thing and executes another.

That is not a UX bug. It is a governance failure with audit fallout.

What top results cover and miss

| Source | Strong coverage | Missing piece |
| --- | --- | --- |
| Google Secret Manager: ETags for optimistic concurrency | ETag checks prevent one writer from overwriting another writer's newer intent. | No approval-workflow pattern that validates both policy version and request payload before execution. |
| Google Cloud setIamPolicy docs (`etag` guidance) | Read-modify-write with etag to avoid racing policy updates. | No human-approval queue semantics where an approval can expire because the policy snapshot changed. |
| Twilio: Mutation and conflict resolution | Mutation preconditions with ETag and If-Match to detect stale writes. | No pre-dispatch governance flow that combines snapshot drift and job-hash drift checks. |

Cordum runtime mechanics

| Boundary | Current behavior | Why it matters |
| --- | --- | --- |
| Policy snapshot guard | For non-workflow-gate approvals, Cordum compares current Safety Kernel snapshot base against stored `policy_snapshot`. | If policy changed, approval is rejected with `409 policy snapshot changed; re-evaluate before approving`. |
| Request hash guard | Cordum recomputes `scheduler.HashJobRequest(req)` and compares it to stored `safetyRecord.JobHash`. | If request mutated, approval is rejected with `409 job request changed; approval rejected`. |
| Workflow gate branch | Workflow-gate approvals can set `policySnapshot = workflow-gate` and skip Safety Kernel snapshot listing. | Workflow gates prioritize workflow-state checks and context over strict snapshot-base equality. |
| Service dependency | If `s.safetyClient` is unavailable for non-workflow approvals, approve returns `503 safety kernel unavailable`. | Availability of Safety Kernel affects approval throughput for policy approvals. |
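The outcomes in the table above suggest how a caller might react to each response. The following is a minimal client-side sketch, not Cordum code: the `Action` type and `classifyApproveResponse` are hypothetical names, and matching on message text is an assumption made only because the article describes no machine-readable conflict codes.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// Action is a hypothetical client-side decision; it is not part of Cordum's API.
type Action string

const (
	ActionProceed    Action = "proceed"
	ActionReevaluate Action = "re-evaluate policy and request a fresh approval"
	ActionNewRequest Action = "treat the changed request as a new decision"
	ActionRetryLater Action = "retry later with backoff"
	ActionEscalate   Action = "escalate to an operator"
)

// classifyApproveResponse maps an approve response to a follow-up action.
// The message fragments come from the article; everything else is illustrative.
func classifyApproveResponse(status int, message string) Action {
	switch {
	case status >= 200 && status < 300:
		return ActionProceed
	case status == http.StatusConflict && strings.Contains(message, "policy snapshot changed"):
		return ActionReevaluate
	case status == http.StatusConflict && strings.Contains(message, "job request changed"):
		return ActionNewRequest
	case status == http.StatusBadGateway || status == http.StatusServiceUnavailable:
		// Safety Kernel unreachable or snapshot listing failed: the approval itself is not stale.
		return ActionRetryLater
	default:
		return ActionEscalate
	}
}

func main() {
	fmt.Println(classifyApproveResponse(http.StatusConflict, "policy snapshot changed; re-evaluate before approving"))
	fmt.Println(classifyApproveResponse(http.StatusConflict, "job request changed; approval rejected"))
	fmt.Println(classifyApproveResponse(http.StatusServiceUnavailable, "safety kernel unavailable"))
}
```

The key design point is that neither 409 is retried blindly: a snapshot conflict calls for re-evaluation, and a hash conflict means the approver was looking at a different request.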

Snapshot and hash checks in code

Policy snapshot drift guard

core/controlplane/gateway/handlers_approvals.go

```go
// core/controlplane/gateway/handlers_approvals.go (excerpt)
policySnapshot := strings.TrimSpace(safetyRecord.PolicySnapshot)
if isWorkflowGate {
  if policySnapshot == "" {
    policySnapshot = "workflow-gate"
  }
} else {
  if policySnapshot == "" {
    result = handlerResult{http.StatusConflict, "approval policy snapshot unavailable"}
    return nil
  }
  snapResp, err := s.safetyClient.ListSnapshots(ctx, &pb.ListSnapshotsRequest{})
  if err != nil {
    result = handlerResult{http.StatusBadGateway, "list safety snapshots failed"}
    return nil
  }
  currentSnapshot := ""
  if snapResp != nil && len(snapResp.Snapshots) > 0 {
    currentSnapshot = strings.TrimSpace(snapResp.Snapshots[0])
  }
  if currentSnapshot == "" || snapshotBase(currentSnapshot) != snapshotBase(policySnapshot) {
    result = handlerResult{http.StatusConflict, "policy snapshot changed; re-evaluate before approving"}
    return nil
  }
}
```

Job request hash drift guard

core/controlplane/gateway/handlers_approvals.go

```go
// core/controlplane/gateway/handlers_approvals.go (excerpt)
hash, err := scheduler.HashJobRequest(req)
if err != nil {
  result = handlerResult{http.StatusInternalServerError, "failed to hash job request"}
  return nil
}
if hash != safetyRecord.JobHash {
  result = handlerResult{http.StatusConflict, "job request changed; approval rejected"}
  return nil
}
```

Overlay-tolerant snapshot base comparison

core/controlplane/gateway/handlers_approvals.go

```go
// core/controlplane/gateway/handlers_approvals.go (excerpt)
// Combined snapshots are "base|cfg:hash".
// snapshotBase strips config overlay hash so overlay-only changes do not invalidate approvals.
func snapshotBase(snap string) string {
  if i := strings.Index(snap, "|"); i >= 0 {
    return snap[:i]
  }
  return snap
}
```
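To make the overlay tolerance concrete, here is a small runnable sketch that reuses `snapshotBase` from the excerpt. The snapshot identifiers (`v12`, `cfg:aaa`, and so on) are made up for illustration, and `sameBase` is a hypothetical helper rather than a function from the Cordum codebase.

```go
package main

import (
	"fmt"
	"strings"
)

// snapshotBase is reproduced from the excerpt: it drops everything from the
// first "|" onward, which removes the config overlay hash.
func snapshotBase(snap string) string {
	if i := strings.Index(snap, "|"); i >= 0 {
		return snap[:i]
	}
	return snap
}

// sameBase is a hypothetical helper that mirrors only the base comparison.
// The real guard additionally rejects an empty current snapshot.
func sameBase(stored, current string) bool {
	return snapshotBase(stored) == snapshotBase(current)
}

func main() {
	// Overlay hash changed, base unchanged: the approval stays valid.
	fmt.Println(sameBase("v12|cfg:aaa", "v12|cfg:bbb")) // true

	// Base changed: the approval is stale even though the overlay matches.
	fmt.Println(sameBase("v12|cfg:aaa", "v13|cfg:aaa")) // false

	// A snapshot without an overlay compares equal to the same base with one.
	fmt.Println(sameBase("v12", "v12|cfg:aaa")) // true
}
```

The tradeoff is deliberate: config overlay changes do not force re-approval, so anything that should invalidate approvals has to be reflected in the base.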

Validation runbook

Run this in staging before changing approval semantics.

runbook.sh

```bash
# 1) Create an approval-required job and capture /api/v1/approvals item fields:
#    policy_snapshot, job_hash, job.id
# 2) Change active policy snapshot (publish new snapshot)
# 3) Call POST /api/v1/approvals/{job_id}/approve
# 4) Verify 409 "policy snapshot changed; re-evaluate before approving"
# 5) Re-run policy evaluation to create a fresh approval item
# 6) Modify job request labels/body for the old item and retry approve
# 7) Verify 409 "job request changed; approval rejected"
```
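Steps 3 and 4 of the runbook could be automated along these lines. This is a sketch under stated assumptions: authentication is omitted, the empty JSON request body is a guess, the response body format is unknown (so the check only looks for the message as a substring), and `newFakeGateway` is a stand-in server used so the example runs without a real deployment.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/url"
	"strings"
)

// approve calls the approve endpoint named in the article and returns the
// status code and raw body. Auth headers and payload are assumptions left out.
func approve(client *http.Client, baseURL, jobID string) (int, string, error) {
	endpoint := fmt.Sprintf("%s/api/v1/approvals/%s/approve", baseURL, url.PathEscape(jobID))
	resp, err := client.Post(endpoint, "application/json", strings.NewReader("{}"))
	if err != nil {
		return 0, "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return resp.StatusCode, "", err
	}
	return resp.StatusCode, string(body), nil
}

// expectConflict reports whether the response is a 409 that mentions wantMsg.
func expectConflict(status int, body, wantMsg string) bool {
	return status == http.StatusConflict && strings.Contains(body, wantMsg)
}

// newFakeGateway is a hypothetical stand-in that always answers with the
// snapshot-drift conflict, simulating the state after runbook step 2.
func newFakeGateway() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		http.Error(w, "policy snapshot changed; re-evaluate before approving", http.StatusConflict)
	}))
}

func main() {
	srv := newFakeGateway()
	defer srv.Close()

	status, body, err := approve(srv.Client(), srv.URL, "job-123")
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Println("snapshot drift rejected:", expectConflict(status, body, "policy snapshot changed"))
}
```

Against staging, `srv.URL` would be replaced with the gateway base URL, and the same `expectConflict` check would be repeated with `"job request changed"` for step 7.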

Limitations and tradeoffs

| Approach | Upside | Downside |
| --- | --- | --- |
| Snapshot + hash checks (current) | Catches both policy drift and request mutation before dispatch. | More conflict paths to handle in clients and operator runbooks. |
| Snapshot-only check | Lower compute overhead and simpler reasoning. | Request payload mutations can slip through stale approvals. |
| Hash-only check | Protects against payload tampering after approval queueing. | Policy drift can still approve actions under outdated policy assumptions. |
  • Workflow-gate approvals intentionally use a different snapshot behavior than non-workflow policy approvals.
  • I did not find dedicated gateway tests that directly assert the exact 409 messages for snapshot/hash drift branches.
  • Hash checks depend on stable request canonicalization; field-order surprises in custom tooling can produce false conflicts.
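The canonicalization point can be illustrated with a short sketch. This is not `scheduler.HashJobRequest`, whose implementation the article does not show: `jobRequest`, `hashRequest`, and `hashRaw` are hypothetical, and SHA-256 over `encoding/json` output is an assumption chosen because `json.Marshal` emits struct fields in declaration order and sorts map keys.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// jobRequest is a hypothetical request shape used only for this example.
type jobRequest struct {
	Topic  string            `json:"topic"`
	Labels map[string]string `json:"labels"`
}

// hashRequest hashes a typed request. Because json.Marshal sorts map keys,
// semantically equal requests produce identical bytes and identical hashes.
func hashRequest(req jobRequest) (string, error) {
	b, err := json.Marshal(req)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:]), nil
}

// hashRaw hashes bytes exactly as supplied, with no canonicalization.
func hashRaw(raw []byte) string {
	sum := sha256.Sum256(raw)
	return hex.EncodeToString(sum[:])
}

func main() {
	a := jobRequest{Topic: "deploy", Labels: map[string]string{"env": "prod", "team": "core"}}
	b := jobRequest{Topic: "deploy", Labels: map[string]string{"team": "core", "env": "prod"}}
	ha, _ := hashRequest(a)
	hb, _ := hashRequest(b)
	fmt.Println("canonical hashes match:", ha == hb) // true

	// The same logical payload with fields in a different order hashes differently
	// when the raw bytes are hashed directly: a false conflict.
	r1 := hashRaw([]byte(`{"topic":"deploy","labels":{"env":"prod"}}`))
	r2 := hashRaw([]byte(`{"labels":{"env":"prod"},"topic":"deploy"}`))
	fmt.Println("raw hashes match:", r1 == r2) // false
}
```

Tooling that computes or compares hashes outside the gateway would need to use the same canonical form as the gateway, otherwise it can report drift that never happened.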

Next step

Do this next:

  1. Add explicit tests for `policy snapshot changed` and `job request changed` conflict branches.
  2. Expose machine-readable conflict codes so SDKs can route retries vs re-evaluation flows.
  3. Document workflow-gate vs non-workflow-gate approval semantics in the public API docs.
  4. Track drift-conflict rate as an SLI to catch policy rollout regressions early.
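As one possible shape for the machine-readable conflict codes in the list above, the sketch below shows a JSON error body and how an SDK might route on it. Everything here is a proposal for illustration: the code values, the `conflictBody` fields, and `nextStep` do not exist in Cordum today as far as this article shows.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// ConflictCode is a hypothetical machine-readable reason for a 409.
type ConflictCode string

const (
	CodeSnapshotChanged ConflictCode = "POLICY_SNAPSHOT_CHANGED"
	CodeJobChanged      ConflictCode = "JOB_REQUEST_CHANGED"
)

// conflictBody is an assumed response shape; the human-readable message
// stays alongside the code so existing operators lose nothing.
type conflictBody struct {
	Code    ConflictCode `json:"code"`
	Message string       `json:"message"`
}

// nextStep routes on the code instead of parsing message text.
func nextStep(status int, raw []byte) string {
	if status != http.StatusConflict {
		return "not a conflict"
	}
	var body conflictBody
	if err := json.Unmarshal(raw, &body); err != nil {
		return "escalate"
	}
	switch body.Code {
	case CodeSnapshotChanged:
		return "re-evaluate"
	case CodeJobChanged:
		return "new decision"
	default:
		return "escalate"
	}
}

func main() {
	raw, _ := json.Marshal(conflictBody{
		Code:    CodeSnapshotChanged,
		Message: "policy snapshot changed; re-evaluate before approving",
	})
	fmt.Println(string(raw))
	fmt.Println(nextStep(http.StatusConflict, raw)) // re-evaluate
}
```

Stable codes also make the drift-conflict SLI easier to compute, since each conflict can be counted by code rather than by matching message strings.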

Continue with AI Agent Policy Decision Cache Invalidation and AI Agent Approval Lock Contention.

Approval must bind to intent

If policy or payload changes, approval context changed. Treat it as a new decision, not a retry.