Name: Cordum
Author: Cordum

The production problem

Transport retries are supposed to replay the same intent.

Real clients drift. A retried request may include changed payload fields after local state updates or partial UI edits.

If the server's idempotency check tests only whether the key exists, one key can hide two intents, and the second request gets back a stale run ID.
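The failure mode can be reproduced in a few lines. The sketch below is a hypothetical in-memory stand-in, not Cordum code: `keyOnlyStore` and `Start` are invented names, and the only thing the store remembers is key to run ID.

```go
package main

import (
	"fmt"
	"sync"
)

// keyOnlyStore is a hypothetical in-memory idempotency map that records
// only key -> run ID and keeps no record of the payload.
type keyOnlyStore struct {
	mu   sync.Mutex
	runs map[string]string
	next int
}

func newKeyOnlyStore() *keyOnlyStore {
	return &keyOnlyStore{runs: make(map[string]string)}
}

// Start returns the existing run ID whenever the key was seen before,
// regardless of the payload sent this time. The payload is ignored on
// purpose to mirror a key-existence-only check.
func (s *keyOnlyStore) Start(key, payload string) (runID string, replayed bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if id, ok := s.runs[key]; ok {
		return id, true
	}
	s.next++
	id := fmt.Sprintf("run-%d", s.next)
	s.runs[key] = id
	return id, false
}

func main() {
	s := newKeyOnlyStore()
	first, _ := s.Start("key-123", `{"a":1}`)
	second, replayed := s.Start("key-123", `{"a":2}`) // changed intent, same key
	fmt.Println(first, second, replayed)              // prints "run-1 run-1 true"
}
```

The second caller asked for `{"a":2}` and received the run that was created for `{"a":1}`, with nothing in the response to signal the difference.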

What top results cover and miss

Source	Strong coverage	Missing piece
AWS Builders’ Library: Making retries safe with idempotent APIs	Client request IDs as intent identity and explicit handling of same key with changed intent.	No workflow-run admission example where duplicate keys immediately return existing run IDs from a Redis map.
Increase Docs: Idempotency keys	Clear contract: same args + same key returns replayed object, different args + same key returns `409` conflict.	No guidance for migrating an existing endpoint that already allows key reuse without argument checks.
AWS S3 Docs: IdempotencyParameterMismatch	Explicit typed mismatch error when idempotent request parameters diverge from prior request(s).	No control-plane rollout strategy for introducing mismatch errors without breaking legacy retry clients.

Cordum runtime mechanics

Boundary	Current behavior	Why it matters
Duplicate-key branch	If `TrySetRunIdempotencyKey` reports key already used, handler fetches existing run ID and returns it directly.	Pure retries dedupe cleanly. Changed-intent retries are not detected on this path.
First-run persistence	Initial request stores run input payload and idempotency key in the run document.	Payload context exists and can support mismatch validation if endpoint logic is extended.
Key storage	`wf:run:idempotency:<key>` maps key to `run_id` via Redis `SetNX`.	Lookup is fast but answers only whether the key exists; the stored value carries no payload fingerprint today.
Test coverage today	Current tests validate same-key concurrency collapse and reservation cleanup, not payload mismatch rejection.	Behavior changes need new tests before rollout to avoid contract drift.

Duplicate-key path in code

Duplicate key returns existing run ID

// core/controlplane/gateway/handlers_workflows.go (excerpt)
if idempotencyKey != "" {
  ok, err := s.workflowStore.TrySetRunIdempotencyKey(r.Context(), idempotencyKey, runID)
  if err != nil {
    writeErrorJSON(w, http.StatusInternalServerError, "idempotency reservation failed")
    return
  }
  if !ok {
    if existingID, err := s.workflowStore.GetRunByIdempotencyKey(r.Context(), idempotencyKey); err == nil && existingID != "" {
      writeJSON(w, map[string]string{"run_id": existingID})
      return
    }
    writeErrorJSON(w, http.StatusConflict, "idempotency key already used")
    return
  }
}

First request stores payload and key

// core/controlplane/gateway/handlers_workflows.go (excerpt)
run := &wf.WorkflowRun{
  ID:             runID,
  WorkflowID:     wfID,
  Input:          payload,
  IdempotencyKey: idempotencyKey,
  Status:         wf.RunStatusPending,
  CreatedAt:      time.Now().UTC(),
  UpdatedAt:      time.Now().UTC(),
}

if err := s.workflowStore.CreateRun(r.Context(), run); err != nil {
  writeErrorJSON(w, http.StatusInternalServerError, "failed to create run")
  return
}

Current idempotency tests cover same-key replay, not mismatch

// core/controlplane/gateway/workflow_runs_test.go (excerpt)
func TestHandleStartRunIdempotencyConcurrentRequestsCreateSingleRun(t *testing.T) {
  // workers=10, same Idempotency-Key, request body {}
  // expected: all requests return same run_id
  // expected: exactly one persisted run
}

// Note: no test currently asserts behavior for same key + different payload.
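A mismatch check could take roughly the following shape. This is a sketch under stated assumptions: `fakeGateway`, `storedRun`, and `post` are hypothetical stand-ins for the real handler and test helpers, the fake compares raw request bodies byte for byte, and it assumes the strict `409` contract. The request/assert pattern is the part that would carry over to a real test in `workflow_runs_test.go`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync"
)

// storedRun is what the hypothetical fake remembers per idempotency key.
type storedRun struct {
	id    string
	input string
}

// fakeGateway is a hypothetical stand-in for the start-run handler.
// It rejects a reused key when the raw body differs from the first request.
type fakeGateway struct {
	mu     sync.Mutex
	byKey  map[string]storedRun
	nextID int
}

func (g *fakeGateway) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	body, err := io.ReadAll(r.Body)
	if err != nil {
		http.Error(w, "read failed", http.StatusBadRequest)
		return
	}
	key := r.Header.Get("Idempotency-Key")
	g.mu.Lock()
	defer g.mu.Unlock()
	if prev, ok := g.byKey[key]; ok {
		if prev.input != string(body) {
			w.WriteHeader(http.StatusConflict) // same key, different payload
			return
		}
		_ = json.NewEncoder(w).Encode(map[string]string{"run_id": prev.id})
		return
	}
	g.nextID++
	run := storedRun{id: fmt.Sprintf("run-%d", g.nextID), input: string(body)}
	g.byKey[key] = run
	_ = json.NewEncoder(w).Encode(map[string]string{"run_id": run.id})
}

// post sends one start-run request and returns the status code and the
// run_id from the response body (empty when the body has none).
func post(h http.Handler, key, body string) (int, string) {
	req := httptest.NewRequest(http.MethodPost, "/workflows/wf-1/runs", strings.NewReader(body))
	req.Header.Set("Idempotency-Key", key)
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	var out map[string]string
	_ = json.NewDecoder(rec.Body).Decode(&out)
	return rec.Code, out["run_id"]
}

func main() {
	g := &fakeGateway{byKey: make(map[string]storedRun)}
	code1, r1 := post(g, "key-123", `{"a":1}`)
	code2, r2 := post(g, "key-123", `{"a":1}`) // pure retry
	code3, _ := post(g, "key-123", `{"a":2}`)  // changed intent
	fmt.Println(code1, code2, r1 == r2, code3) // prints "200 200 true 409"
}
```

If the chosen contract is blind replay instead, the third assertion flips to expecting `200` and the original run ID, which is why the contract decision has to come before the test is merged.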

Validation runbook

Run this in staging before tightening idempotency contract rules.

runbook.sh

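# Assumes BASE_URL and WORKFLOW_ID are set for your staging environment; add auth headers as your deployment requires.
# 1) POST /workflows/:id/runs with Idempotency-Key: key-123 and payload {"a":1}
curl -sS -X POST "$BASE_URL/workflows/$WORKFLOW_ID/runs" \
  -H "Idempotency-Key: key-123" -H "Content-Type: application/json" -d '{"a":1}'
# 2) Capture returned run_id = R1
# 3) POST again with same key key-123 and payload {"a":2}; -w prints the HTTP status after the body
curl -sS -w '\n%{http_code}\n' -X POST "$BASE_URL/workflows/$WORKFLOW_ID/runs" \
  -H "Idempotency-Key: key-123" -H "Content-Type: application/json" -d '{"a":2}'
# 4) Observe whether API returns R1 or rejects with mismatch
# 5) Fetch run R1 and inspect stored input payload
# 6) Decide if current behavior matches your contract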

Limitations and tradeoffs

Approach	Upside	Downside
Blind replay by key (current workflow-start behavior)	Simple and fast dedupe for transport-level retries.	Can hide accidental cross-intent key reuse.
Strict payload fingerprint mismatch rejection	Prevents one key from representing two intents.	Requires canonicalization rules and client update planning.
Scoped key schema (`workflow_id:step:intent_id`)	Reduces mismatch chance before server-side checks.	Depends on disciplined caller behavior and key lifecycle governance.
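As an illustration of the scoped key row, a caller-side helper might build the key from its three parts. This is a hypothetical sketch: `scopedKey` is an invented name, and the rule that parts must be non-empty and free of `:` is an assumption made here to keep the key unambiguous, not a documented Cordum requirement.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// scopedKey joins workflow ID, step, and intent ID with ":".
// Assumption: parts must be non-empty and must not contain ":" so that
// two different triples can never produce the same key.
func scopedKey(workflowID, step, intentID string) (string, error) {
	for _, part := range []string{workflowID, step, intentID} {
		if part == "" || strings.Contains(part, ":") {
			return "", errors.New("key parts must be non-empty and must not contain ':'")
		}
	}
	return fmt.Sprintf("%s:%s:%s", workflowID, step, intentID), nil
}

func main() {
	key, err := scopedKey("wf-42", "approve", "intent-7")
	fmt.Println(key, err) // prints "wf-42:approve:intent-7 <nil>"
}
```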

- Strict mismatch checks need deterministic canonicalization of JSON payloads to avoid false positives.
- Existing clients that reuse keys loosely can break once mismatch rejection is enabled.
- Returning old run IDs is convenient for retries and dangerous for changed intent.
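One way to get deterministic canonicalization in Go is to decode the payload and re-encode it, since `encoding/json` writes map keys in sorted order. The sketch below is one possible approach, not the project's implementation: `fingerprint` is a hypothetical helper, and it assumes numeric literals should be compared as written, so `1` and `1.0` count as different.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// fingerprint returns a hex SHA-256 over a canonical re-encoding of a
// JSON payload. Key order and insignificant whitespace do not affect it.
func fingerprint(payload []byte) (string, error) {
	dec := json.NewDecoder(bytes.NewReader(payload))
	dec.UseNumber() // keep numeric literals as written instead of converting to float64
	var v interface{}
	if err := dec.Decode(&v); err != nil {
		return "", fmt.Errorf("decode payload: %w", err)
	}
	canonical, err := json.Marshal(v) // object keys are emitted in sorted order
	if err != nil {
		return "", fmt.Errorf("encode payload: %w", err)
	}
	sum := sha256.Sum256(canonical)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	a, _ := fingerprint([]byte(`{"a":1,"b":2}`))
	b, _ := fingerprint([]byte(`{ "b": 2, "a": 1 }`)) // same content, different layout
	c, _ := fingerprint([]byte(`{"a":2,"b":2}`))      // different content
	fmt.Println(a == b, a == c)                       // prints "true false"
}
```

Comparing fingerprints rather than raw bytes avoids rejecting a retry only because the client serialized the same object with a different key order.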

Next step

Do this in one sprint:

1. Add a mismatch test: same key, different payload, expected contract result.
2. Decide on the contract (`409` on mismatch vs. replaying the existing run) and document it in the API docs.
3. If rejecting mismatches, include a reference to the existing run in the error body for faster operator debugging.
4. Roll out behind a feature flag and monitor mismatch-rate metrics before enforcing the check globally.

Continue with AI Agent Idempotency Keys and AI Agent Workflow Idempotency Reservation.
