The production problem
Transport retries are supposed to replay the same intent.
Real clients drift. A retried request may include changed payload fields after local state updates or partial UI edits.
If the server's idempotency check only tests whether the key exists, one key can hide two intents, and the second request gets back the run ID created for the first.
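Here is a minimal Go sketch of the failure mode. It uses an in-memory map in place of the real key store. `keyOnlyStore` and `admit` are hypothetical names for illustration, not Cordum APIs. The second call carries a different payload under the same key and still receives the first run's ID.

```go
package main

import "fmt"

// keyOnlyStore is a hypothetical in-memory stand-in for a key -> run_id
// reservation that, like a SetNX-style mapping, records only the key.
type keyOnlyStore struct {
	runs map[string]string
}

// admit creates a run under newID if the key is unseen; otherwise it returns
// the previously stored run ID. The payload argument is never inspected.
func (s *keyOnlyStore) admit(key string, payload []byte, newID string) string {
	if existing, ok := s.runs[key]; ok {
		return existing
	}
	s.runs[key] = newID
	return newID
}

func main() {
	store := &keyOnlyStore{runs: map[string]string{}}
	first := store.admit("key-123", []byte(`{"a":1}`), "R1")
	// A "retry" whose payload changed after a local edit.
	second := store.admit("key-123", []byte(`{"a":2}`), "R2")
	fmt.Println(first, second) // prints "R1 R1"
}
```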
What top results cover and miss
| Source | Strong coverage | Missing piece |
|---|---|---|
| AWS Builders’ Library: Making retries safe with idempotent APIs | Client request IDs as intent identity and explicit handling of same key with changed intent. | No workflow-run admission example where duplicate keys immediately return existing run IDs from a Redis map. |
| Increase Docs: Idempotency keys | Clear contract: same args + same key returns replayed object, different args + same key returns `409` conflict. | No guidance for migrating an existing endpoint that already allows key reuse without argument checks. |
| AWS S3 Docs: IdempotencyParameterMismatch | Explicit typed mismatch error when idempotent request parameters diverge from prior request(s). | No control-plane rollout strategy for introducing mismatch errors without breaking legacy retry clients. |
Cordum runtime mechanics
| Boundary | Current behavior | Why it matters |
|---|---|---|
| Duplicate-key branch | If `TrySetRunIdempotencyKey` reports the key as already used, the handler fetches the existing run ID and returns it directly. | Pure retries dedupe cleanly. Changed-intent retries are not detected on this path. |
| First-run persistence | Initial request stores run input payload and idempotency key in the run document. | Payload context exists and can support mismatch validation if endpoint logic is extended. |
| Key storage | `wf:run:idempotency:<key>` maps key to `run_id` via Redis `SetNX`. | Lookup is fast and binary; it does not encode payload fingerprint semantics today. |
| Test coverage today | Current tests validate same-key concurrency collapse and reservation cleanup, not payload mismatch rejection. | Behavior changes need new tests before rollout to avoid contract drift. |
Duplicate-key path in code
Duplicate key returns existing run ID
```go
// core/controlplane/gateway/handlers_workflows.go (excerpt)
if idempotencyKey != "" {
	ok, err := s.workflowStore.TrySetRunIdempotencyKey(r.Context(), idempotencyKey, runID)
	if err != nil {
		writeErrorJSON(w, http.StatusInternalServerError, "idempotency reservation failed")
		return
	}
	if !ok {
		// Key already reserved: return the stored run ID. The payload is not compared.
		if existingID, err := s.workflowStore.GetRunByIdempotencyKey(r.Context(), idempotencyKey); err == nil && existingID != "" {
			writeJSON(w, map[string]string{"run_id": existingID})
			return
		}
		writeErrorJSON(w, http.StatusConflict, "idempotency key already used")
		return
	}
}
```

First request stores payload and key
```go
// core/controlplane/gateway/handlers_workflows.go (excerpt)
run := &wf.WorkflowRun{
	ID:             runID,
	WorkflowID:     wfID,
	Input:          payload,
	IdempotencyKey: idempotencyKey,
	Status:         wf.RunStatusPending,
	CreatedAt:      time.Now().UTC(),
	UpdatedAt:      time.Now().UTC(),
}
if err := s.workflowStore.CreateRun(r.Context(), run); err != nil {
	writeErrorJSON(w, http.StatusInternalServerError, "failed to create run")
	return
}
```

Current idempotency tests cover same-key replay, not mismatch
```go
// core/controlplane/gateway/workflow_runs_test.go (excerpt)
func TestHandleStartRunIdempotencyConcurrentRequestsCreateSingleRun(t *testing.T) {
	// workers=10, same Idempotency-Key, request body {}
	// expected: all requests return same run_id
	// expected: exactly one persisted run
}
// Note: no test currently asserts behavior for same key + different payload.
```

Validation runbook
Run this in staging before tightening idempotency contract rules.
```text
# 1) POST /workflows/:id/runs with Idempotency-Key: key-123 and payload {"a":1}
# 2) Capture returned run_id = R1
# 3) POST again with same key key-123 and payload {"a":2}
# 4) Observe whether API returns R1 or rejects with mismatch
# 5) Fetch run R1 and inspect stored input payload
# 6) Decide if current behavior matches your contract
```

Limitations and tradeoffs
| Approach | Upside | Downside |
|---|---|---|
| Blind replay by key (current workflow-start behavior) | Simple and fast dedupe for transport-level retries. | Can hide accidental cross-intent key reuse. |
| Strict payload fingerprint mismatch rejection | Prevents one key from representing two intents. | Requires canonicalization rules and client update planning. |
| Scoped key schema (`workflow_id:step:intent_id`) | Reduces mismatch chance before server-side checks. | Depends on disciplined caller behavior and key lifecycle governance. |
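Here is a sketch of how a caller might build scoped keys, assuming the `workflow_id:step:intent_id` layout from the table. `newIntentID` and `scopedKey` are hypothetical helpers, and `wf-orders` and `start` are made-up values. The idea is to mint the intent ID once per user intent and reuse it on transport retries. An edited payload gets a fresh intent ID and therefore a fresh key.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"strings"
)

// newIntentID (hypothetical) returns a random ID a client would mint once per
// user intent, e.g. per submit click, and reuse on every transport retry.
func newIntentID() (string, error) {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return hex.EncodeToString(b), nil
}

// scopedKey (hypothetical) builds a workflow_id:step:intent_id key. Components
// must be non-empty and free of ':' so distinct triples cannot collide.
func scopedKey(workflowID, step, intentID string) (string, error) {
	for _, part := range []string{workflowID, step, intentID} {
		if part == "" || strings.Contains(part, ":") {
			return "", fmt.Errorf("invalid key component %q", part)
		}
	}
	return strings.Join([]string{workflowID, step, intentID}, ":"), nil
}

func main() {
	intent, err := newIntentID()
	if err != nil {
		panic(err)
	}
	// Transport retries reuse the same intent ID, so they produce the same key.
	retryKey, _ := scopedKey("wf-orders", "start", intent)

	// An edited payload is a new intent: mint a new ID and get a new key.
	edited, err := newIntentID()
	if err != nil {
		panic(err)
	}
	editedKey, _ := scopedKey("wf-orders", "start", edited)
	fmt.Println(retryKey, editedKey)
}
```

Rejecting `:` inside components keeps the mapping from triples to keys one-to-one. Without that check, `("a:b", "c", "x")` and `("a", "b:c", "x")` would both yield `a:b:c:x`.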
- Strict mismatch checks need deterministic canonicalization of JSON payloads to avoid false positives.
- Existing clients that reuse keys loosely can break once mismatch rejection is enabled.
- Returning old run IDs is convenient for retries and dangerous for changed intent.
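Canonicalization can be sketched with the Go standard library, assuming payloads are JSON and that decoding numbers as `float64` is acceptable. `canonicalFingerprint` is a hypothetical helper with three steps:

- Decode the payload.
- Re-encode it with `encoding/json`, which writes map keys in sorted order and drops insignificant whitespace.
- Hash the result with SHA-256.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// canonicalFingerprint (hypothetical helper) hashes a canonical re-encoding
// of a JSON payload so that field order and whitespace do not matter.
// Assumption: numbers decode as float64, so 1 and 1.0 are treated as equal.
func canonicalFingerprint(payload []byte) (string, error) {
	var v any
	if err := json.Unmarshal(payload, &v); err != nil {
		return "", fmt.Errorf("payload is not valid JSON: %w", err)
	}
	// json.Marshal writes map[string]any keys in sorted order at every level.
	canon, err := json.Marshal(v)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(canon)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	a, _ := canonicalFingerprint([]byte(`{"a":1,"b":[1,2]}`))
	b, _ := canonicalFingerprint([]byte(`{ "b": [1, 2], "a": 1.0 }`))
	c, _ := canonicalFingerprint([]byte(`{"a":2,"b":[1,2]}`))
	fmt.Println(a == b, a == c) // prints "true false"
}
```

The `float64` choice means `1` and `1.0` fingerprint identically, but distinct integers above 2^53 can collide after rounding. Decoding with `json.Decoder.UseNumber` avoids the rounding, at the cost of treating `1` and `1.0` as different payloads. Whichever rule you pick becomes part of the API contract.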
Next step
Do this in one sprint:
1. Add a mismatch test: same key, different payload, expected contract result.
2. Decide the contract (`409` mismatch vs. replaying the existing run ID) and document it in the API docs.
3. If rejecting mismatches, include the existing run reference in the error body for faster operator debugging.
4. Roll out behind a feature flag and monitor mismatch-rate metrics before enforcing globally.
Continue with AI Agent Idempotency Keys and AI Agent Workflow Idempotency Reservation.