The production problem
Distributed failures are ambiguous. A request can time out at the caller and still commit on the server.
In autonomous systems, retries are often automatic. Without idempotency, one ambiguous timeout can create duplicate incident tickets, duplicate PRs, or duplicate rollback jobs.
This is not edge-case theater. If your control plane runs at scale, retries happen continuously. The only question is whether retries are replay-safe.
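A minimal, self-contained sketch of the failure mode above: the server commits the write, the caller sees only a timeout and retries, and one intent produces two side effects. All names here (`submitWithRetry`, `createTicket`) are illustrative, not Cordum APIs.

```go
package main

import (
	"errors"
	"fmt"
)

// submitWithRetry simulates an ambiguous timeout: the server commits the
// write, but the caller sees a timeout and retries once. It returns how
// many tickets were actually created for a single business intent.
func submitWithRetry() int {
	ticketsCreated := 0
	createTicket := func() error {
		ticketsCreated++             // the write commits on the server...
		return errors.New("timeout") // ...but the caller only sees a timeout
	}
	for attempt := 0; attempt < 2; attempt++ {
		if err := createTicket(); err == nil {
			break
		}
	}
	return ticketsCreated
}

func main() {
	fmt.Println(submitWithRetry()) // one intent, two tickets
}
```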
What top sources cover and miss
| Source | Strong coverage | Missing piece |
|---|---|---|
| Stripe docs: idempotent requests | Clear endpoint-level behavior: same key returns first status and body, including failures; mismatched parameters are rejected; pruning after 24h+ is documented. | No detailed guidance for multi-step agent workflows where run admission, queue delivery, and compensation all need aligned idempotency semantics. |
| AWS Builders Library: making retries safe with idempotent APIs | Strong API contract model: caller-provided request ID, semantic equivalence, and atomic token + mutation handling. | Does not provide a concrete control-plane blueprint for autonomous agents that combine policy gates, distributed queues, and workflow rollback logic. |
| AWS Lambda durable execution idempotency | Practical matrix for key reuse vs payload mismatch and clear distinction between run-level and step-level idempotency. | Focused on Lambda durable executions. It does not address cross-service idempotency alignment when one agent workflow dispatches into external systems. |
Idempotency contract design
| Contract field | Required behavior | Failure mode if missing |
|---|---|---|
| Intent key | Stable per business intent. Example: `workflow_id:step_id:operation_id`. | Random-per-retry keys convert retries into duplicate writes. |
| Payload guard | Persist request fingerprint with key and reject key reuse with different payload. | Cross-intent replay bug: same key can point at different work. |
| Replay response | Return semantically equivalent result for duplicate request. | Caller flow diverges between first request and retry path. |
| Reservation lifecycle | Reserve key before mutating; clean reservation on admission failures. | Poisoned key blocks valid retries after transient rejection. |
| Retention policy | Define expiration based on retry horizon and replay lag. | Too short: duplicate effects. Too long: unbounded key cardinality. |
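The contract rows above can be sketched as one in-memory handler that reserves the key, rejects payload mismatch, and replays the stored result. `store`, `record`, and `Execute` are hypothetical names, and a map stands in for Redis; this is a sketch of the contract, not the Cordum implementation.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
)

// record holds the payload guard and replay response for one intent key.
type record struct {
	payloadHash string
	result      string
}

// store is a minimal in-memory stand-in for a real key store.
type store struct{ records map[string]record }

var errPayloadMismatch = errors.New("key reused with different payload")

// Execute applies the contract: reject key reuse with a different payload,
// replay the stored result for duplicates, otherwise run the operation once.
func (s *store) Execute(key, payload string, op func() string) (string, error) {
	sum := sha256.Sum256([]byte(payload))
	hash := hex.EncodeToString(sum[:])
	if rec, ok := s.records[key]; ok {
		if rec.payloadHash != hash {
			return "", errPayloadMismatch
		}
		return rec.result, nil // replay: semantically equivalent result
	}
	result := op()
	s.records[key] = record{payloadHash: hash, result: result}
	return result, nil
}

func main() {
	s := &store{records: map[string]record{}}
	op := func() string { return "run-1" }

	first, _ := s.Execute("wf:step:op", `{"target":"db"}`, op)
	replay, _ := s.Execute("wf:step:op", `{"target":"db"}`, op)
	_, err := s.Execute("wf:step:op", `{"target":"cache"}`, op)

	fmt.Println(first, replay, err != nil)
}
```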
Cordum runtime behavior
| Control | Current behavior | Why it matters |
|---|---|---|
| Run create key inputs | Run creation accepts `Idempotency-Key`, `X-Idempotency-Key`, and query alternatives. | Client and proxy retries can preserve the same intent key across transport variants. |
| Run mapping storage | Workflow run dedupe is stored under `wf:run:idempotency:<key>` using Redis `SetNX`. | Concurrent submissions with one key race to one winner. |
| Concurrent replay behavior | Gateway test uses 10 concurrent requests with one key and persists exactly one run. | Dedupe is not theoretical; it is covered by concurrency tests. |
| Admission rejection cleanup | If concurrency gate rejects (`429`), reservation cleanup allows later retry with same key. | Prevents poisoned keys after temporary admission pressure. |
| Bus idempotency guard | JetStream handlers use Redis processed keys (`10m` TTL) and NAK-with-delay (`2s`) on guard failures. | Keeps at-least-once delivery from turning into duplicate processing under crash windows. |
| Compensation idempotency | Saga rollback auto-generates key from `workflow_id|job_id|topic|capability|step_index` hash. | Rollback retries avoid duplicate compensation side effects. |
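The compensation row describes a key derived from a hash over `workflow_id|job_id|topic|capability|step_index`. A hedged sketch of that derivation follows; the hash function, truncation, and `comp:` prefix are assumptions for illustration, not the actual Cordum code.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// compensationKey derives a deterministic idempotency key from rollback
// metadata, mirroring the workflow_id|job_id|topic|capability|step_index
// shape described above. Same inputs always yield the same key, so a
// retried rollback dedupes against the first attempt.
func compensationKey(workflowID, jobID, topic, capability string, stepIndex int) string {
	input := strings.Join([]string{
		workflowID, jobID, topic, capability, fmt.Sprintf("%d", stepIndex),
	}, "|")
	sum := sha256.Sum256([]byte(input))
	return "comp:" + hex.EncodeToString(sum[:8]) // short prefix for readability
}

func main() {
	k1 := compensationKey("wf-1", "job-9", "deploy", "rollback", 3)
	k2 := compensationKey("wf-1", "job-9", "deploy", "rollback", 3)
	fmt.Println(k1 == k2) // deterministic: retries derive the same key
}
```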
Implementation examples
1) Stable intent key generation (Go)
```go
func BuildRunIntentKey(workflowID, stepID, operationID string) string {
	// Stable for the same business intent across retries.
	return fmt.Sprintf("%s:%s:%s", workflowID, stepID, operationID)
}

func SubmitWithIntentKey(ctx context.Context, req StartRunRequest) error {
	key := BuildRunIntentKey(req.WorkflowID, req.StepID, req.OperationID)
	req.Headers["Idempotency-Key"] = key
	return gateway.StartRun(ctx, req)
}
```
2) Concurrency regression test pattern (Go)
```go
const workers = 10
// Every worker submits with the same key.
req.Header.Set("Idempotency-Key", "same-key")

// Expect all responses to carry the same run_id.
for _, runID := range runIDs[1:] {
	if runID != runIDs[0] {
		t.Fatalf("expected one run id, got %v", runIDs)
	}
}

// Expect one persisted run.
if len(runs) != 1 {
	t.Fatalf("expected exactly 1 persisted run, got %d", len(runs))
}
```
This pattern catches a common regression: key reservation works in serial tests but fails under parallel submit pressure.
3) Payload mismatch guard (Go)
```go
type IdemRecord struct {
	Key         string
	RequestHash string
	RunID       string
	CreatedAt   time.Time
}

func ValidateReplay(existing IdemRecord, incomingHash string) error {
	if existing.RequestHash != incomingHash {
		return errors.New("idempotency key reused with different payload")
	}
	return nil
}
```
If you skip this check, key reuse across different payloads can return a stale run reference and hide caller mistakes.
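One way to produce the `RequestHash` that the payload guard compares is hashing a canonical encoding of the request. A sketch assuming JSON canonicalization via Go's `encoding/json`, which sorts map keys during marshaling; `requestHash` is an illustrative helper, not part of the codebase.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// requestHash produces a canonical fingerprint for the payload guard.
// encoding/json sorts map keys, so field order in the input map does not
// change the hash.
func requestHash(payload map[string]any) (string, error) {
	canonical, err := json.Marshal(payload)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(canonical)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	a, _ := requestHash(map[string]any{"step": "deploy", "target": "prod"})
	b, _ := requestHash(map[string]any{"target": "prod", "step": "deploy"})
	fmt.Println(a == b) // true: key order does not affect the fingerprint
}
```

Canonicalization matters: hashing raw request bytes would treat `{"a":1,"b":2}` and `{"b":2,"a":1}` as different intents.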
4) Retention caveat to watch
```go
// Current run idempotency set path in Cordum workflow store
// (SetNX with no expiration)
ok, err := redis.SetNX(ctx, "wf:run:idempotency:"+key, runID, 0).Result()
```
Zero TTL means no automatic expiration. That can be exactly right for some audit workloads and exactly wrong for high-cardinality traffic.
Limitations and tradeoffs
- Verified from the current Cordum run-start path: replay behavior is keyed by idempotency-key lookup, not payload-hash comparison. If your client might reuse a key with changed input, add a payload guard.
- Bus processed-key TTL is bounded (10 minutes). That aligns with redelivery windows, not long-horizon replay dedupe.
- Long retention windows improve safety against late retries but increase key cardinality and storage costs.
- Compensation keys are hash-derived and deterministic, but they still rely on the quality of their input metadata.
- Idempotency prevents duplicate execution of an intent. It does not correct wrong intent or bad authorization.
Next step
Run this in one sprint:
1. Define intent-key templates for your top five side-effecting operations.
2. Add payload fingerprint checks to duplicate-request handling paths.
3. Decide retention policy explicitly per surface (run submit, queue dedupe, compensation).
4. Add a concurrency test with at least 10 parallel submits on the same key.
5. Drill one timeout + retry scenario in staging and verify exactly one side effect.
Continue with AI Agent Idempotency Payload Mismatch and AI Agent Timeouts, Retries, and Backoff.