Deep Dive

AI Agent Approval Idempotency

If retries are normal in your network, approval replay semantics are part of your API contract.

10 min read · Mar 2026
TL;DR
  • Cordum approval handlers return HTTP 200 with `status: already_approved` or `status: already_rejected` for safe replay paths.
  • The approve handler checks job state plus the `approval_granted=true` label before deciding replay vs conflict.
  • Cordum sets a stable bus message key (`approval:<job_id>`) so NATS dedup can collapse repeated publish attempts.
  • Idempotent replay is scoped: calls outside the expected post-approval states still return a conflict.
Failure mode

Operators double-click approve under latency, automation retries on timeout, and endpoints can produce ambiguous outcomes.

Current behavior

Cordum returns deterministic replay responses for already-resolved approvals instead of always returning 409.

Operational payoff

Retry clients get stable outcomes and fewer false incident alerts around manual approval workflows.

Scope

This guide covers approve/reject idempotency for `/api/v1/approvals/{job_id}` endpoints, not generic workflow run idempotency.

The production problem

Approval systems live at the boundary of humans and unreliable networks.

Humans double-submit. Browsers retry. Proxies retry. Your endpoint gets called again after the decision already happened.

If your handler only returns conflicts, clients cannot tell duplicate success from real failure.

What top results cover and miss

| Source | Strong coverage | Missing piece |
| --- | --- | --- |
| AWS Builders' Library: Making retries safe with idempotent APIs | Client request identity and server-side dedup contracts for safe retries. | No human approval endpoint pattern where retries must distinguish already-approved vs still-pending states. |
| Stripe API docs: Idempotent requests | Idempotency key behavior for API retries and deterministic response replay expectations. | No treatment of approval queues where actor state and workflow state can diverge during retries. |
| PayPal docs: Idempotency | Retry-safe request replay and duplicate request handling with idempotency headers. | No dual-endpoint approve/reject flow where conflict and replay status need separate semantics. |

Cordum runtime mechanics

| Boundary | Current behavior | Why it matters |
| --- | --- | --- |
| Approve replay path | If job state moved beyond APPROVAL and request labels contain `approval_granted=true`, handler returns `200 already_approved`. | Safe client retries do not create approval duplicates or noisy conflict errors. |
| Reject replay path | If state is DENIED, reject handler returns `200 already_rejected`. | Retrying a successful reject remains deterministic for operators and bots. |
| Message dedup key | Approve path sets `req.Labels[cordum.bus_msg_id] = approval:<job_id>` before republishing. | NATS dedup can collapse repeated publish attempts for the same approved job. |
| Conflict scope | If state and labels do not match replay conditions, handler still returns conflict (`job not awaiting approval`). | Idempotency does not mask true state mismatches. |
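
The rows above reduce to a small decision function. The sketch below is not Cordum's code: `JobState`, `Outcome`, `decideApprove`, and `decideReject` are hypothetical names, and the state string values are illustrative. It only shows how state and label checks combine into three outcomes.

```go
package main

import "fmt"

// JobState is a hypothetical stand-in for Cordum's model.JobState.
// The string values are illustrative, not taken from the source.
type JobState string

const (
	StateApproval   JobState = "APPROVAL"
	StatePending    JobState = "PENDING"
	StateScheduled  JobState = "SCHEDULED"
	StateDispatched JobState = "DISPATCHED"
	StateRunning    JobState = "RUNNING"
	StateSucceeded  JobState = "SUCCEEDED"
	StateDenied     JobState = "DENIED"
)

// Outcome names the three branches a decision call can take.
type Outcome string

const (
	Proceed  Outcome = "proceed"  // job is awaiting approval: run the normal path
	Replay   Outcome = "replay"   // decision already made: answer 200 already_*
	Conflict Outcome = "conflict" // anything else: answer 409
)

// postApproval holds the states the approve handler accepts for replay.
var postApproval = map[JobState]bool{
	StatePending: true, StateScheduled: true, StateDispatched: true,
	StateRunning: true, StateSucceeded: true,
}

// decideApprove mirrors the approve rows of the table: replay needs both a
// post-approval state and the approval_granted=true label.
func decideApprove(state JobState, labels map[string]string) Outcome {
	if state == StateApproval {
		return Proceed
	}
	if postApproval[state] && labels["approval_granted"] == "true" {
		return Replay
	}
	return Conflict
}

// decideReject mirrors the reject row: only DENIED replays.
func decideReject(state JobState) Outcome {
	switch state {
	case StateApproval:
		return Proceed
	case StateDenied:
		return Replay
	default:
		return Conflict
	}
}

func main() {
	granted := map[string]string{"approval_granted": "true"}
	fmt.Println(decideApprove(StateRunning, granted)) // replay
	fmt.Println(decideApprove(StateRunning, nil))     // conflict
	fmt.Println(decideApprove(StateDenied, granted))  // conflict
	fmt.Println(decideReject(StateDenied))            // replay
}
```

Keeping the decision in a pure function like this makes every conflict branch cheap to cover in a table-driven test.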

Idempotency paths in code

Approve replay branch

core/controlplane/gateway/handlers_approvals.go
```go
// core/controlplane/gateway/handlers_approvals.go (excerpt)
if state != model.JobStateApproval {
  if state == model.JobStatePending || state == model.JobStateSucceeded ||
    state == model.JobStateScheduled || state == model.JobStateDispatched ||
    state == model.JobStateRunning {
    req, _ := s.jobStore.GetJobRequest(ctx, jobID)
    if req != nil && req.Labels != nil && req.Labels["approval_granted"] == "true" {
      rec, _ := s.jobStore.GetApprovalRecord(ctx, jobID)
      result = handlerResult{http.StatusOK, map[string]any{
        "job_id":      jobID,
        "status":      "already_approved",
        "approved_by": rec.ApprovedBy,
        "approved_at": rec.ApprovedAt,
      }}
      return nil
    }
  }
  result = handlerResult{http.StatusConflict, "job not awaiting approval"}
  return nil
}
```
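
The excerpt discards the error from `GetApprovalRecord` and then reads fields from `rec`. The excerpt does not show whether that lookup can return a nil record. If it can, a replay response should degrade instead of panicking. A minimal sketch, assuming a pointer-typed record; `ApprovalRecord` and `replayBody` are hypothetical names, not Cordum's API:

```go
package main

import (
	"fmt"
	"time"
)

// ApprovalRecord is a hypothetical shape for the stored approval record.
type ApprovalRecord struct {
	ApprovedBy string
	ApprovedAt time.Time
}

// replayBody builds the already_approved payload without assuming the
// record lookup succeeded. Actor fields are added only when a record exists.
func replayBody(jobID string, rec *ApprovalRecord) map[string]any {
	body := map[string]any{
		"job_id": jobID,
		"status": "already_approved",
	}
	if rec != nil {
		body["approved_by"] = rec.ApprovedBy
		body["approved_at"] = rec.ApprovedAt
	}
	return body
}

func main() {
	rec := &ApprovalRecord{
		ApprovedBy: "alice",
		ApprovedAt: time.Date(2026, 3, 1, 12, 0, 0, 0, time.UTC),
	}
	fmt.Println(replayBody("job-1", rec)["approved_by"])
	fmt.Println(len(replayBody("job-1", nil))) // only job_id and status remain
}
```

The replay status stays deterministic either way; only the audit fields become optional.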

Reject replay branch

core/controlplane/gateway/handlers_approvals.go
```go
// core/controlplane/gateway/handlers_approvals.go (excerpt)
if state != model.JobStateApproval {
  if state == model.JobStateDenied {
    rec, _ := s.jobStore.GetApprovalRecord(ctx, jobID)
    result = handlerResult{http.StatusOK, map[string]any{
      "job_id":      jobID,
      "status":      "already_rejected",
      "rejected_by": rec.ApprovedBy,
      "rejected_at": rec.ApprovedAt,
    }}
    return nil
  }
  result = handlerResult{http.StatusConflict, "job not awaiting approval"}
  return nil
}
```
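
On the client side, both replay branches can be folded into one classification step. The sketch below assumes 200 responses carry a JSON object with a `status` field, as in the excerpts; the `Decision` type and `classify` function are hypothetical. It does not parse the 409 body, because the excerpts do not show how that string is serialized.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// Decision is a hypothetical client-side view of an approve/reject response.
type Decision string

const (
	Applied    Decision = "applied"    // this call made the decision
	Replayed   Decision = "replayed"   // decision was already made; safe duplicate
	Conflicted Decision = "conflicted" // job is not in a state that accepts this call
	Failed     Decision = "failed"     // anything else; retry policy is up to the caller
)

// classify maps an HTTP status code and response body to a Decision.
func classify(statusCode int, body []byte) Decision {
	switch statusCode {
	case http.StatusOK:
		var payload struct {
			Status string `json:"status"`
		}
		if err := json.Unmarshal(body, &payload); err != nil {
			// A 200 still means the decision holds, even if the body is unreadable.
			return Applied
		}
		if payload.Status == "already_approved" || payload.Status == "already_rejected" {
			return Replayed
		}
		return Applied
	case http.StatusConflict:
		return Conflicted
	default:
		return Failed
	}
}

func main() {
	fmt.Println(classify(200, []byte(`{"status":"already_approved"}`)))
	fmt.Println(classify(409, nil))
}
```

With this split, `Applied` and `Replayed` both count as success for the caller, and only `Conflicted` needs operator attention.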

Dedup key for publish retries

core/controlplane/gateway/handlers_approvals.go
```go
// core/controlplane/gateway/handlers_approvals.go (excerpt)
// Stable idempotency key per job so NATS dedup works on retries.
req.Labels[bus.LabelBusMsgID] = "approval:" + jobID
if err := s.jobStore.SetJobRequest(ctx, req); err != nil {
  if strings.Contains(err.Error(), "transaction failed") {
    result = handlerResult{http.StatusConflict, "concurrent approval conflict; retry"}
    return nil
  }
}
```
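
To see why a stable key matters, here is a toy in-memory model of message-ID deduplication. It is not the NATS client API and says nothing about how Cordum configures its bus; `dedupWindow` and `approvalMsgID` are hypothetical names, and time is passed in as plain seconds to keep the example deterministic.

```go
package main

import "fmt"

// dedupWindow is a toy, single-goroutine model of message-ID deduplication.
// It is not the NATS client API.
type dedupWindow struct {
	ttlSec int64
	seen   map[string]int64 // message ID -> time first accepted, in seconds
}

func newDedupWindow(ttlSec int64) *dedupWindow {
	return &dedupWindow{ttlSec: ttlSec, seen: make(map[string]int64)}
}

// accept reports whether a publish with this ID goes through at time nowSec.
// A repeat inside the window is dropped; after the window it is accepted again.
func (d *dedupWindow) accept(msgID string, nowSec int64) bool {
	if first, ok := d.seen[msgID]; ok && nowSec-first < d.ttlSec {
		return false
	}
	d.seen[msgID] = nowSec
	return true
}

// approvalMsgID builds the stable per-job key format from the excerpt.
func approvalMsgID(jobID string) string {
	return "approval:" + jobID
}

func main() {
	w := newDedupWindow(120)
	fmt.Println(w.accept(approvalMsgID("J"), 0))   // true: first publish
	fmt.Println(w.accept(approvalMsgID("J"), 5))   // false: retry inside the window
	fmt.Println(w.accept(approvalMsgID("K"), 5))   // true: different job
	fmt.Println(w.accept(approvalMsgID("J"), 300)) // true: window has expired
}
```

The last call is the reason the handler's state check still matters: if the bus dedup window is time-bounded, as it is in NATS JetStream, a late retry is only caught by the replay branch, not by the message key.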

Existing idempotency tests

core/controlplane/gateway/handlers_approvals_test.go
```go
// core/controlplane/gateway/handlers_approvals_test.go (excerpt)
func TestApproveJobIdempotent(t *testing.T) {
  // first approval returns 200
  // second approval returns 200 with status=already_approved
}

func TestRejectJobIdempotent(t *testing.T) {
  // first rejection returns 200
  // second rejection returns 200 with status=already_rejected
}
```

Validation runbook

Validate this on staging before changing approval-client retry behavior.

runbook.sh
```bash
# 1) Create approval-required job_id J
# 2) POST /api/v1/approvals/J/approve (expect 200)
# 3) Retry same approve call 5 times in parallel
# 4) Verify all retries return 200 and include status=already_approved
# 5) Repeat flow with reject path (expect already_rejected)
# 6) Inspect bus dedup label in stored request: cordum.bus_msg_id=approval:J
```
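
Steps 2 to 4 can be scripted. The sketch below runs them against a local stub so it is runnable without a Cordum deployment. `stubApprovals`, `approveN`, and `runRunbook` are hypothetical, and the stub's first-call status value `approved` is an assumption, since the source only documents the replay values. Point `approveN` at a staging URL to exercise a real gateway.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
)

// stubApprovals is a hypothetical in-memory stand-in for the gateway.
// It approves on the first call per path and replays afterwards.
type stubApprovals struct {
	mu       sync.Mutex
	approved map[string]bool
}

func (s *stubApprovals) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	s.mu.Lock()
	status := "approved" // assumed first-call value; not taken from the source
	if s.approved[r.URL.Path] {
		status = "already_approved"
	}
	s.approved[r.URL.Path] = true
	s.mu.Unlock()
	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(map[string]string{"status": status})
}

// approveN sends n concurrent POSTs and returns one result string per call:
// the JSON status field on HTTP 200, otherwise a short error description.
func approveN(url string, n int) []string {
	results := make([]string, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			resp, err := http.Post(url, "application/json", nil)
			if err != nil {
				results[i] = "error: " + err.Error()
				return
			}
			defer resp.Body.Close()
			if resp.StatusCode != http.StatusOK {
				results[i] = fmt.Sprintf("http %d", resp.StatusCode)
				return
			}
			var body struct {
				Status string `json:"status"`
			}
			if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
				results[i] = "error: " + err.Error()
				return
			}
			results[i] = body.Status
		}(i)
	}
	wg.Wait()
	return results
}

// runRunbook performs one approve call, then the given number of parallel retries.
func runRunbook(retries int) (first string, replays []string) {
	srv := httptest.NewServer(&stubApprovals{approved: map[string]bool{}})
	defer srv.Close()
	url := srv.URL + "/api/v1/approvals/J/approve"
	first = approveN(url, 1)[0]
	replays = approveN(url, retries)
	return first, replays
}

func main() {
	first, replays := runRunbook(5)
	fmt.Println("first:", first)
	fmt.Println("retries:", replays)
}
```

Against a real gateway, any retry result other than `already_approved` is the signal to investigate before changing client retry behavior.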

Limitations and tradeoffs

| Approach | Upside | Downside |
| --- | --- | --- |
| Always return 409 after first approval | Simple state machine exposure to clients. | Retries become noisy and require extra client-side interpretation. |
| Idempotent replay responses (current) | Deterministic outcomes for safe retries and better operator UX. | Requires stricter replay condition checks to avoid false positives. |
| Replay everything without state checks | Lowest client complexity. | Can hide real conflicts and weaken audit confidence. |

  • Replay logic depends on specific state and label conditions, so custom integrations must preserve label integrity.
  • Replay semantics do not replace conflict handling for genuine concurrent state transitions.
  • I found idempotency tests for success replay paths, but not exhaustive tests for every conflict branch under high concurrency.

Next step

Implement this next:

  1. Document replay contracts explicitly in API docs (`already_approved`, `already_rejected`).
  2. Add machine-readable error/replay codes for SDK-level retry routing.
  3. Add concurrency tests that mix duplicate retries with true state conflicts.
  4. Track replay-rate vs conflict-rate per endpoint to catch client retry regressions.
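
For the last item, a per-endpoint counter is enough to start. This is a minimal sketch, not an existing Cordum component: `approvalStats`, `observe`, and `rates` are hypothetical, and a production setup would likely export the counts to whatever metrics system is already in place.

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
)

// outcomeCounts tallies the three outcomes this guide distinguishes.
type outcomeCounts struct {
	applied, replayed, conflicted int
}

// approvalStats is a hypothetical in-process tracker keyed by endpoint name.
type approvalStats struct {
	mu     sync.Mutex
	counts map[string]*outcomeCounts
}

func newApprovalStats() *approvalStats {
	return &approvalStats{counts: make(map[string]*outcomeCounts)}
}

// observe records one response. Statuses other than 200 and 409 are ignored
// here; they belong to general error monitoring, not replay tracking.
func (s *approvalStats) observe(endpoint string, httpStatus int, status string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	c, ok := s.counts[endpoint]
	if !ok {
		c = &outcomeCounts{}
		s.counts[endpoint] = c
	}
	switch {
	case httpStatus == http.StatusOK && (status == "already_approved" || status == "already_rejected"):
		c.replayed++
	case httpStatus == http.StatusOK:
		c.applied++
	case httpStatus == http.StatusConflict:
		c.conflicted++
	}
}

// rates returns the replayed and conflicted shares of all counted responses
// for an endpoint, or zeros if nothing has been observed.
func (s *approvalStats) rates(endpoint string) (replayRate, conflictRate float64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	c, ok := s.counts[endpoint]
	if !ok {
		return 0, 0
	}
	total := c.applied + c.replayed + c.conflicted
	if total == 0 {
		return 0, 0
	}
	return float64(c.replayed) / float64(total), float64(c.conflicted) / float64(total)
}

func main() {
	stats := newApprovalStats()
	stats.observe("approve", 200, "approved")
	stats.observe("approve", 200, "approved")
	stats.observe("approve", 200, "already_approved")
	stats.observe("approve", 409, "")
	replay, conflict := stats.rates("approve")
	fmt.Printf("replay=%.2f conflict=%.2f\n", replay, conflict)
}
```

A rising replay rate with a flat conflict rate points at client retry behavior; a rising conflict rate points at real state contention.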

Continue with AI Agent NATS Msg-Id Strategy and AI Agent Approval Lock Contention.

Retries are product behavior

If a second click returns a different truth than the first click, your approval API is unstable under normal network conditions.