The production problem
Teams hear “exactly once” and assume duplicates disappear. Then a retry path or region boundary appears, and the same action runs twice.
In agent systems, duplicate side effects are expensive. One replay can open duplicate incidents, create duplicate PRs, or push duplicate infrastructure changes.
A small duplicate rate still hurts at scale. At 20,000 side-effecting operations per day, 0.5% duplicate execution means roughly 100 duplicate actions every day.
What top results miss
| Source | Strong coverage | Missing piece |
|---|---|---|
| Google Pub/Sub exactly-once delivery | Clear exactly-once semantics and constraints (pull subscriptions, regional scope). | No guidance for cross-system side effects in autonomous workflows. |
| Amazon SQS at-least-once delivery | Direct explanation that duplicates can occur and consumers must be idempotent. | No framework for policy-gated replay in agent control planes. |
| Kafka delivery semantics (Confluent docs) | Strong description of at-most-once, at-least-once, and exactly-once tradeoffs. | Does not cover external side-effect systems outside topic-to-topic transactions. |
Delivery semantics model
| Layer | Required rule | Failure if missing |
|---|---|---|
| Transport delivery | Assume at-least-once unless your exact platform mode says otherwise. | Duplicate message handling not implemented. |
| Processing semantics | Store idempotency keys and processed markers at operation boundaries. | Repeated execution of state-changing actions. |
| External side effects | Use per-action dedupe in destination system when possible. | Duplicate tickets, PRs, payments, or infrastructure changes. |
| Governance layer | Run policy checks on replay path too, not just first execution. | Unsafe retries bypass runtime controls. |
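The processing-semantics rule above depends on keys that are stable across retries. A minimal sketch of deterministic key construction, assuming keys are derived from run, step, and action identifiers (the function and field names here are illustrative, not a Cordum API):

```go
package main

import "fmt"

// IdempotencyKey builds a deterministic key for one side-effecting
// operation. The same logical operation always yields the same key,
// so a retry or redelivery maps onto the same ledger entry.
func IdempotencyKey(runID string, step int, action string) string {
	return fmt.Sprintf("%s:step_%d:%s", runID, step, action)
}

func main() {
	// A redelivery of the same step produces an identical key.
	first := IdempotencyKey("run_18", 5, "create_pr")
	retry := IdempotencyKey("run_18", 5, "create_pr")
	fmt.Println(first, first == retry)
}
```

Deriving the key from stable identifiers (rather than a random UUID per attempt) is what makes duplicate detection possible at all: a fresh key per retry defeats the ledger.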
Cordum runtime implications
| Implication | Current behavior | Why it matters |
|---|---|---|
| Message bus behavior | JetStream durable subjects with explicit ack/nak and a default 10m AckWait | Redelivery is expected when a handler fails or misses the ack deadline. |
| Scheduler retry budget | Up to 50 scheduling attempts with 1s-30s backoff before terminal DLQ | Retries are finite, so replay and dedupe need explicit operator workflows. |
| Run idempotency | Run creation supports `Idempotency-Key` header | Duplicate submission requests map to one logical run. |
| DLQ and replay | Terminal failures emit DLQ entries for controlled retry/replay | Acknowledges that failures and duplicates are operationally normal. |
| Policy checks | Submit-time and dispatch-time safety evaluation | Replay path still passes governance controls before side effects. |
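The run-idempotency row can be exercised like any HTTP header convention. A sketch of building a run-creation request with an `Idempotency-Key` header; the URL path is a placeholder, not a documented Cordum route:

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

// buildRunRequest prepares a run-creation request carrying an
// Idempotency-Key header so duplicate submissions collapse into one
// logical run server-side. The endpoint URL is a placeholder.
func buildRunRequest(url, key string, body []byte) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Idempotency-Key", key)
	return req, nil
}

func main() {
	req, err := buildRunRequest("https://example.invalid/v1/runs", "run_18:submit", []byte(`{}`))
	if err != nil {
		panic(err)
	}
	// Retried submissions reuse the same key, so the server can dedupe.
	fmt.Println(req.Header.Get("Idempotency-Key"))
}
```

The client's only obligation is to reuse the same key when it retries a submission; dedupe itself happens server-side.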
Implementation examples
Idempotent consumer skeleton (Go)

```go
func HandleMessage(msg Message) error {
	// Skip the side effect if this key already completed.
	if alreadyProcessed(msg.IdempotencyKey) {
		return nil
	}
	if err := applySideEffect(msg.Payload); err != nil {
		return err // nak; the bus will redeliver
	}
	// Note: a crash between the side effect and this marker re-runs the
	// action on redelivery, so the destination still needs its own dedupe.
	markProcessed(msg.IdempotencyKey)
	return nil
}
```

Delivery policy defaults (YAML)
```yaml
delivery_semantics:
  default: at_least_once
  replay:
    require_policy_check: true
    require_idempotency_key: true
    deny_if_missing_key: true
```

Duplicate handling audit event (JSON)
```json
{
  "message_id": "msg_9041",
  "idempotency_key": "run_18:step_5:create_pr",
  "delivery_attempt": 3,
  "duplicate_detected": true,
  "side_effect_executed": false,
  "decision": "dedupe_skip"
}
```

Limitations and tradeoffs
- Idempotency ledgers add state and storage overhead to consumers.
- Strict dedupe windows can reject legitimate replays after long outages.
- Exactly-once modes reduce duplicate work but add latency and cost on some paths.
- Cross-system side effects still need compensation logic for non-idempotent destinations.
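The ledger overhead in the first tradeoff can be made concrete. A minimal in-memory sketch of a processed-marker store (a production version would use a durable store with a TTL-based dedupe window, which is where the storage cost and the long-outage rejection risk come from):

```go
package main

import (
	"fmt"
	"sync"
)

// Ledger records which idempotency keys have completed. An in-memory
// map illustrates the shape; durability and expiry are the real cost.
type Ledger struct {
	mu   sync.Mutex
	seen map[string]bool
}

func NewLedger() *Ledger {
	return &Ledger{seen: make(map[string]bool)}
}

// MarkProcessed returns false if the key was already recorded, so the
// caller can skip the side effect on redelivery.
func (l *Ledger) MarkProcessed(key string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.seen[key] {
		return false
	}
	l.seen[key] = true
	return true
}

func main() {
	led := NewLedger()
	fmt.Println(led.MarkProcessed("run_18:step_5:create_pr")) // first delivery
	fmt.Println(led.MarkProcessed("run_18:step_5:create_pr")) // redelivery
}
```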
Next step
Run this in one sprint:
1. Classify each workflow edge as at-most-once, at-least-once, or scoped exactly-once.
2. Add idempotency keys to every side-effecting operation path.
3. Track duplicate-detected rate and replay success in dashboards.
4. Test one forced redelivery scenario per critical workflow.
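Step 4 can start as a unit-style check: deliver the same message twice through a deduping handler and assert the side effect ran exactly once. A sketch with illustrative types (not a Cordum API):

```go
package main

import "fmt"

// Message mirrors the consumer skeleton's input shape.
type Message struct {
	IdempotencyKey string
	Payload        string
}

// countingConsumer dedupes on the idempotency key and counts how many
// times the side effect actually runs, so a test can assert on it.
type countingConsumer struct {
	processed map[string]bool
	effects   int
}

func (c *countingConsumer) Handle(msg Message) error {
	if c.processed[msg.IdempotencyKey] {
		return nil // duplicate delivery: skip the side effect
	}
	c.effects++ // stands in for the real side effect
	c.processed[msg.IdempotencyKey] = true
	return nil
}

func main() {
	c := &countingConsumer{processed: make(map[string]bool)}
	msg := Message{IdempotencyKey: "run_18:step_5:create_pr", Payload: "{}"}
	_ = c.Handle(msg)
	_ = c.Handle(msg) // forced redelivery
	fmt.Println("side effects:", c.effects)
}
```

The same pattern extends to integration tests that nak a real message and let the bus redeliver it.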
Continue with AI Agent Idempotency Keys and AI Agent Transactional Outbox Pattern.