Deep Dive

AI Agent NATS Msg-Id Strategy

JetStream dedup is real, useful, and time-bounded. Your control plane still needs longer idempotency logic.

11 min read · Mar 2026
TL;DR
  • JetStream duplicate suppression is bounded by a duplicate window; it is not infinite by default.
  • Cordum sets the stream `Duplicates` window to 2 minutes and computes `Nats-Msg-Id` from packet payload fields.
  • Cordum approval retries set a stable override label (`cordum.bus_msg_id`) so republish attempts within the window dedup correctly.
  • Long-horizon API idempotency is a separate Redis layer with a 90-day default TTL.
Hidden boundary

A publish retry that arrives after the 2-minute window can be stored as a new message unless your app has its own idempotency layer.

Code-level rule

Cordum computes message IDs from typed payloads and allows an explicit override for controlled resubmits.

Operator impact

You can separate short broker dedup from long business idempotency without pretending exactly-once exists.

Scope

This guide focuses on publish-time dedup keys and idempotency layers for control-plane job traffic. It does not try to cover every possible stream retention topology.

The production problem

Your publisher sends a job. The connection drops before it sees an ack, so it retries.

If the retry lands inside the broker dedup window with the same `Nats-Msg-Id`, you are safe.

If the retry lands after the window, the broker may accept it as a new message. If the operation is expensive or has side effects, it now runs twice.

What top results cover and miss

| Source | Strong coverage | Missing piece |
|---|---|---|
| NATS docs: JetStream Streams | Duplicate window configuration (`Duplicates`) and stream-level dedup behavior. | No application-level key design guidance for approval retries, resubmits, and tenant-scoped operations. |
| NATS docs: JetStream Model Deep Dive | `Nats-Msg-Id` semantics and the duplicate suppression model. | No control-plane pattern for combining broker dedup with longer idempotency horizons. |
| NATS Blog: Infinite message deduplication in JetStream | How to push dedup beyond window constraints with `DiscardNewPerSubject` patterns. | No concrete mapping for multi-step AI workflows where retries and approvals mutate request labels. |

Cordum runtime mechanics

Cordum treats broker dedup and business idempotency as separate layers. The split is deliberate.

| Boundary | Current behavior | Operational impact |
|---|---|---|
| Broker dedup window | Cordum configures JetStream streams with `Duplicates: 2 * time.Minute`. | Republish retries within 2 minutes are deduplicated; after that, they may be accepted as new messages. |
| Publish-time key | `computeMsgID` derives keys from typed packet fields and honors a `cordum.bus_msg_id` override label. | Callers that need deterministic dedup for retries can supply a stable key. |
| Approval requeue | The gateway approval path sets `req.Labels["cordum.bus_msg_id"] = "approval:<job_id>"`. | Repeated approval publish attempts do not produce duplicate jobs within the dedup window. |
| Business idempotency | Redis job-store idempotency keys default to a 90-day TTL (`90 * 24 * time.Hour`), overridable via `CORDUM_IDEMPOTENCY_TTL`. | Late retries and client reconnect loops are still guarded after broker dedup has expired. |
JetStream duplicate window in Cordum

```go
// core/infra/bus/nats.go (excerpt)
_, err := js.AddStream(&nats.StreamConfig{
  Name:       name,
  Subjects:   subjects,
  Retention:  nats.LimitsPolicy,
  Storage:    nats.FileStorage,
  MaxAge:     maxAge,
  Replicas:   replicas,
  Duplicates: 2 * time.Minute,
})
```
Msg-Id calculation with override label

```go
// core/infra/bus/nats.go (excerpt)
const LabelBusMsgID = "cordum.bus_msg_id"

func computeMsgID(subject string, packet *pb.BusPacket) string {
  switch payload := packet.Payload.(type) {
  case *pb.BusPacket_JobRequest:
    if payload.JobRequest != nil {
      if override := strings.TrimSpace(payload.JobRequest.Labels[LabelBusMsgID]); override != "" {
        return "jobreq:" + subject + ":" + override
      }
      return "jobreq:" + strings.TrimSpace(payload.JobRequest.JobId)
    }
  }
  return ""
}
```
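Because the excerpt depends on Cordum's generated protobuf types, here is a self-contained sketch of the same keying rule with a hypothetical stand-in struct (`jobRequest`) in place of `pb.BusPacket_JobRequest`, and a hypothetical subject name. It shows the two key shapes: the default `jobreq:<job_id>` and the subject-scoped override.

```go
package main

import (
	"fmt"
	"strings"
)

const LabelBusMsgID = "cordum.bus_msg_id"

// jobRequest is a hypothetical stand-in for the generated protobuf type.
type jobRequest struct {
	JobId  string
	Labels map[string]string
}

// msgIDFor mirrors the keying rule in the excerpt: an explicit override label
// wins and is scoped by subject; otherwise the trimmed job ID is the key.
func msgIDFor(subject string, req *jobRequest) string {
	if req == nil {
		return "" // empty key: assumed to mean no Msg-Id, so no broker dedup
	}
	// Reading from a nil Labels map is safe in Go and yields "".
	if override := strings.TrimSpace(req.Labels[LabelBusMsgID]); override != "" {
		return "jobreq:" + subject + ":" + override
	}
	return "jobreq:" + strings.TrimSpace(req.JobId)
}

func main() {
	plain := &jobRequest{JobId: " job-123 "}
	pinned := &jobRequest{JobId: "job-456", Labels: map[string]string{LabelBusMsgID: "approval:job-456"}}

	fmt.Println(msgIDFor("sys.job.submit", plain))  // jobreq:job-123
	fmt.Println(msgIDFor("sys.job.submit", pinned)) // jobreq:sys.job.submit:approval:job-456
}
```

Including the subject in the override form means the same override value published on two different subjects yields two distinct keys, so an override cannot accidentally suppress unrelated traffic.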
Approval retry key stabilization

```go
// core/controlplane/gateway/handlers_approvals.go (excerpt)
// Stable idempotency key per job so NATS dedup works on retries.
req.Labels[bus.LabelBusMsgID] = "approval:" + jobID
```
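The one-line excerpt presumably relies on `req.Labels` being initialized earlier in the handler (not shown), since assigning into a nil map panics in Go. A nil-safe sketch of the same stamping step, using a hypothetical helper name `stampApprovalKey`, shows why retries converge: every attempt for the same job writes the same value, whatever other labels the request carries.

```go
package main

import "fmt"

const LabelBusMsgID = "cordum.bus_msg_id"

// stampApprovalKey is a hypothetical helper: it pins the dedup override label
// to a value derived only from the job ID, so every approval retry for the
// same job carries the same key. It allocates the map if needed, because
// writing to a nil map panics.
func stampApprovalKey(labels map[string]string, jobID string) map[string]string {
	if labels == nil {
		labels = make(map[string]string)
	}
	labels[LabelBusMsgID] = "approval:" + jobID
	return labels
}

func main() {
	first := stampApprovalKey(nil, "job-789")
	retry := stampApprovalKey(map[string]string{"tenant": "acme"}, "job-789")
	fmt.Println(first[LabelBusMsgID] == retry[LabelBusMsgID]) // true
}
```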
Long-horizon idempotency TTL

```go
// core/infra/store/job_store.go (excerpt)
// Idempotency keys must outlive the job lifecycle to prevent duplicate jobs.
idempotencyTTL := 90 * 24 * time.Hour
if v := os.Getenv("CORDUM_IDEMPOTENCY_TTL"); v != "" {
  if parsed, err := time.ParseDuration(v); err == nil && parsed > 0 {
    idempotencyTTL = parsed
  }
}
```

Msg-Id design rules

Rule 1: Base Msg-Id on the semantic operation, not transport metadata.

Rule 2: For controlled replays, use an explicit override key so operators can force "same operation" versus "new operation" behavior.

Rule 3: Keep broker dedup short and predictable. Put long retry protection in your state store where you control TTL and key scope.

Rule 4: Test the boundary. Most duplicate incidents happen at time-window edges, not in the happy path.
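A minimal sketch of Rule 1, assuming a job is identified by tenant, operation name, and a flat map of parameters (the `operation` type, its fields, and the `op:` prefix are hypothetical). The key is a SHA-256 over a canonical JSON encoding; `encoding/json` writes map keys in sorted order, so map iteration order cannot change the result. Connection IDs, attempt counters, and timestamps are deliberately absent, so a retry of the same operation hashes to the same key.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// operation is a hypothetical description of the semantic work a job does.
// Transport details (connection ID, attempt number, publish time) are
// intentionally absent so they cannot leak into the key.
type operation struct {
	Tenant string            `json:"tenant"`
	Op     string            `json:"op"`
	Params map[string]string `json:"params"`
}

// semanticMsgID hashes a canonical JSON encoding of the operation.
// encoding/json sorts map keys, so equal operations encode identically.
func semanticMsgID(op operation) (string, error) {
	b, err := json.Marshal(op)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	return "op:" + hex.EncodeToString(sum[:16]), nil // 128-bit prefix is plenty for a dedup key
}

func main() {
	params := map[string]string{"index": "docs", "mode": "full"}
	a, _ := semanticMsgID(operation{Tenant: "acme", Op: "reindex", Params: params})
	b, _ := semanticMsgID(operation{Tenant: "acme", Op: "reindex", Params: map[string]string{"mode": "full", "index": "docs"}})
	c, _ := semanticMsgID(operation{Tenant: "beta", Op: "reindex", Params: params})
	fmt.Println(a == b, a == c) // true false
}
```

Including the tenant in the encoded value keeps keys tenant-scoped: two tenants issuing the same operation never suppress each other.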

Validation runbook

Run this in staging before rollout. If you never test the +121s retry case, you will eventually learn about it in production.

Dedup boundary validation

```bash
# 1) Publish a job packet with Nats-Msg-Id = jobreq:job-123
# 2) Republish the same packet at +30s (expect dedup)
# 3) Republish the same packet at +150s (outside the 2m window; expect it may be accepted as new)
# 4) Submit the same API request with the same Idempotency-Key at +1h (expect the existing job id)
# 5) Force an approval endpoint retry and verify that the cordum.bus_msg_id key stays stable
```

Limitations and tradeoffs

| Approach | Upside | Downside |
|---|---|---|
| Rely only on the JetStream duplicate window | Simple setup and low app complexity. | Late retries outside the window can create duplicate business operations. |
| Broker dedup + app idempotency (Cordum pattern) | Good protection for both short reconnect storms and long retry tails. | Two layers to reason about and monitor. |
| Infinite per-subject dedup at the broker | Stronger duplicate suppression held in stream state. | More stream design constraints and operational complexity as workflows evolve. |

Next step

Add an automated retry-boundary test suite that replays publish attempts at +30s, +119s, +121s, and +1h with fixed Msg-Id and fixed API idempotency key, then assert exactly which layer blocks each duplicate.
