AI Agent NATS Auth Precedence

Most auth incidents are not crypto breaks. They are configuration collisions.

Deep Dive · 10 min read · Apr 2026
TL;DR
  • Cordum picks NATS auth in strict order: user/password first, token second, NKey third.
  • Setting multiple credential modes at once does not combine them. One wins and the rest are ignored.
  • A username without a password does not configure auth in the current implementation.
  • A simple CI guard can fail builds when more than one auth mode is set.
  • Production should enforce a single intended auth mode and test for accidental precedence drift.
Deterministic order

Auth mode selection is deterministic, not heuristic. This is good for predictability.

Misconfig trap

Multiple auth env vars can make operators think one mode is active while another is actually used.

Test-backed

Repo tests explicitly validate priority order and edge cases like missing password.

Scope

This guide covers scheduler-side NATS client authentication selection, not broker-side ACL design.

The production problem

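A team rotates auth mode from token to NKey. The old token env var stays in the deployment. Because token outranks NKey in the selection order, the client keeps authenticating with the token, and the NKey is present but never used.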

Nothing crashes. Connection still works. Security assumptions are now wrong and hard to detect from outside.

This is exactly why precedence rules must be documented, tested, and enforced by config policy.

What top results miss

| Source | Strong coverage | Missing piece |
| --- | --- | --- |
| NATS docs: Tokens | Server-side token configuration and client token connection pattern. | No guidance for mixed envs where token is set but shadowed by user/password in app code. |
| NATS docs: NKeys | Challenge-response auth flow and key-handling model for NKey clients. | No guidance for mixed env scenarios where NKey is configured but silently shadowed. |
| nats.go package docs | Client options (`UserInfo`, `Token`, `NkeyOptionFromSeed`) and mutual-exclusion errors in callbacks. | No production playbook for env-collision detection and CI enforcement in wrapper libraries. |

Docs explain mechanisms well. They rarely explain what happens when multiple mechanisms are configured at once in a single process.

Cordum runtime behavior

| Boundary | Current behavior | Operational impact |
| --- | --- | --- |
| Selection order | `natsApplyAuth` checks user/pass, then token, then NKey seed. | Auth mode is deterministic and easy to reason about in code review. |
| User/pass requirement | Both username and password must be set, otherwise this mode is skipped. | Partial credentials never produce a half-configured user/pass connection. Selection falls through to token, NKey, or no auth. |
| Token fallback | Token mode activates only when user/pass mode is not selected. | If both are set, token is ignored by design. |
| NKey fallback | NKey mode activates only when both higher-priority modes are absent. | NKey can be unintentionally shadowed by leftover token or user/pass values. |
| Invalid NKey seed | Bad seed logs an error and returns not-configured state for the NKey path. | Connection can proceed without intended auth unless other safeguards exist. |
| Production warning | If no auth mode is configured in production, Cordum logs a warning. | Visibility exists, but the warning alone does not block startup. |
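The last two rows describe warning-only behavior. As a minimal sketch of a fail-closed alternative, a caller could turn the boolean result of the selection step into a startup error. `enforceAuth`, its `nkeySet` and `production` parameters, and the error values are hypothetical names for illustration and are not part of Cordum. The sketch relies on one inference from the selection order: if `NATS_NKEY` was set and selection still reported "not configured", the seed must have been rejected.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors, not defined by Cordum.
var (
	errNKeyNotApplied     = errors.New("NATS_NKEY is set but no auth was applied (seed likely invalid)")
	errNoAuthInProduction = errors.New("no NATS auth mode configured in production")
)

// enforceAuth converts a "not configured" result into an error where that
// result is unsafe. configured is what the selection step reported,
// nkeySet says whether NATS_NKEY was non-empty, and production is an
// assumed flag for the deployment environment.
func enforceAuth(configured, nkeySet, production bool) error {
	switch {
	case configured:
		return nil
	case nkeySet:
		// NKey is the lowest-priority mode, so "set but not configured"
		// means the seed was rejected and nothing else was selected.
		return errNKeyNotApplied
	case production:
		return errNoAuthInProduction
	default:
		return nil // unauthenticated is tolerated outside production
	}
}

func main() {
	// Simulates a bad seed: NATS_NKEY was set, selection reported false.
	if err := enforceAuth(false, true, true); err != nil {
		fmt.Println("refusing to start:", err)
	}
}
```

Whether to fail closed is a policy choice: it trades startup availability during misconfiguration for a guarantee that the process never connects without the intended credentials.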

Env collision matrix

The easiest way to remove auth ambiguity is to make collisions impossible. This matrix shows the effective mode for common env combinations in current Cordum code.

| Configured env set | Effective mode | Risk |
| --- | --- | --- |
| NATS_USERNAME + NATS_PASSWORD only | User/Pass | Expected path when user/pass is your selected mode. |
| NATS_TOKEN only | Token | Simple rollout, but token can be accidentally left behind during migrations. |
| NATS_NKEY only | NKey | Strong mode, but invalid seed falls back to unauthenticated in current code path. |
| User/Pass + Token | User/Pass | Token is ignored. Operators may believe token rotation is active when it is not. |
| Token + NKey | Token | NKey is shadowed by token. |
| User/Pass + Token + NKey | User/Pass | Highest collision risk. Migration intent is often unclear in incident review. |
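The matrix can be reproduced with a small pure function, which is handy for logging at startup which mode won and which credentials were shadowed. This is a sketch under assumptions: `effectiveMode` and its return shape are illustrative and do not exist in Cordum, and the function only checks whether values are present. It does not validate an NKey seed.

```go
package main

import (
	"fmt"
	"strings"
)

// effectiveMode applies the precedence from the matrix: user/pass, then
// token, then NKey. It returns the winning mode and every configured mode
// that the winner shadows. getenv is injected so the function can be
// exercised without touching the real environment.
func effectiveMode(getenv func(string) string) (winner string, shadowed []string) {
	get := func(key string) string { return strings.TrimSpace(getenv(key)) }

	// Collect configured modes in priority order.
	var configured []string
	if get("NATS_USERNAME") != "" && get("NATS_PASSWORD") != "" {
		configured = append(configured, "user/pass")
	}
	if get("NATS_TOKEN") != "" {
		configured = append(configured, "token")
	}
	if get("NATS_NKEY") != "" {
		configured = append(configured, "nkey")
	}

	if len(configured) == 0 {
		return "none", nil
	}
	// The first entry wins; everything after it is silently ignored.
	return configured[0], configured[1:]
}

func main() {
	// Example values only: a leftover token next to a newly added NKey.
	env := map[string]string{"NATS_TOKEN": "old-token", "NATS_NKEY": "new-seed"}
	winner, shadowed := effectiveMode(func(k string) string { return env[k] })
	fmt.Printf("effective=%s shadowed=%v\n", winner, shadowed)
}
```

In a real process the lookup would be `os.Getenv`. A non-empty `shadowed` slice is the signal worth alerting on.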

CI guard: fail if not exactly one auth mode

ci-nats-auth-lint.sh
Bash
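#!/usr/bin/env bash
# Fail the build unless exactly one NATS auth mode is configured.
set -u

modes=0

# User/pass counts only when both values are set, matching natsApplyAuth.
if [ -n "${NATS_USERNAME:-}" ] && [ -n "${NATS_PASSWORD:-}" ]; then
  modes=$((modes+1))
fi
if [ -n "${NATS_TOKEN:-}" ]; then
  modes=$((modes+1))
fi
if [ -n "${NATS_NKEY:-}" ]; then
  modes=$((modes+1))
fi

# Zero modes (no auth) and two or more modes (collision) both fail.
if [ "$modes" -ne 1 ]; then
  echo "NATS auth policy violation: expected exactly one mode, got $modes" >&2
  exit 1
fi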
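The same policy can also be checked inside the process at startup, which covers deployments that bypass the CI pipeline. The sketch below is an assumption-labeled illustration, not Cordum code: `checkSingleAuthMode` is a hypothetical helper. It differs from the shell guard in two ways. It trims whitespace the way `natsApplyAuth` does, so a value of only spaces does not count as set, and it reports a username without a password (or the reverse) as its own error instead of counting it as zero modes.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// checkSingleAuthMode returns nil only when exactly one NATS auth mode is
// configured. getenv is injected so the check is testable without
// modifying the real environment.
func checkSingleAuthMode(getenv func(string) string) error {
	get := func(key string) string { return strings.TrimSpace(getenv(key)) }
	user, pass := get("NATS_USERNAME"), get("NATS_PASSWORD")

	// Half-set user/pass is skipped by the selection logic, so surface it
	// explicitly rather than letting another mode take over unnoticed.
	if (user == "") != (pass == "") {
		return errors.New("NATS_USERNAME and NATS_PASSWORD must be set together")
	}

	var modes []string
	if user != "" {
		modes = append(modes, "user/pass")
	}
	if get("NATS_TOKEN") != "" {
		modes = append(modes, "token")
	}
	if get("NATS_NKEY") != "" {
		modes = append(modes, "nkey")
	}

	if len(modes) != 1 {
		return fmt.Errorf("expected exactly one NATS auth mode, got %d %v", len(modes), modes)
	}
	return nil
}

func main() {
	// Example values only: a migration that left the old token in place.
	env := map[string]string{"NATS_TOKEN": "old-token", "NATS_NKEY": "new-seed"}
	if err := checkSingleAuthMode(func(k string) string { return env[k] }); err != nil {
		fmt.Println("auth policy violation:", err)
	}
}
```

Passing `os.Getenv` as the lookup makes this usable in a real binary. Returning an error rather than exiting leaves the decision to warn or abort with the caller.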

Code-level mechanics

Auth selection function (Go)

core/infra/bus/nats.go
Go
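// Imports this excerpt relies on (log/slog is assumed for the slog calls).
import (
  "log/slog"
  "os"
  "strings"

  "github.com/nats-io/nats.go"
)

// natsApplyAuth appends at most one auth option to opts and reports
// whether an auth mode was configured. Order: user/pass, token, NKey.
func natsApplyAuth(opts *[]nats.Option) bool {
  // 1) User/pass: both values must be non-empty after trimming.
  username := strings.TrimSpace(os.Getenv("NATS_USERNAME"))
  password := strings.TrimSpace(os.Getenv("NATS_PASSWORD"))
  if username != "" && password != "" {
    *opts = append(*opts, nats.UserInfo(username, password))
    return true
  }

  // 2) Token: reached only when user/pass was not selected.
  token := strings.TrimSpace(os.Getenv("NATS_TOKEN"))
  if token != "" {
    *opts = append(*opts, nats.Token(token))
    return true
  }

  // 3) NKey: reached only when both higher-priority modes are absent.
  nkey := strings.TrimSpace(os.Getenv("NATS_NKEY"))
  if nkey != "" {
    opt, err := nats.NkeyOptionFromSeed(nkey)
    if err != nil {
      // An invalid seed is logged and treated as "not configured".
      slog.Error("bus: invalid NATS_NKEY seed", "err", err)
      return false
    }
    *opts = append(*opts, opt)
    return true
  }
  return false
}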

Priority and edge-case tests (Go)

core/infra/bus/nats_test.go
Go
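// Abridged from the repo tests; assertions are sketched. Needs "testing" and nats.go imports.
func TestNatsApplyAuth_PriorityOrder(t *testing.T) {
  // Username/password takes priority over token.
  t.Setenv("NATS_USERNAME", "alice")
  t.Setenv("NATS_PASSWORD", "secret")
  t.Setenv("NATS_TOKEN", "also-set")
  t.Setenv("NATS_NKEY", "")
  var opts []nats.Option
  configured := natsApplyAuth(&opts)
  // Expect exactly one option: user/password.
  if !configured || len(opts) != 1 {
    t.Fatalf("configured=%v, options=%d; want true, 1", configured, len(opts))
  }
}

func TestNatsApplyAuth_UsernameWithoutPassword(t *testing.T) {
  // A username without a password should not configure auth.
  t.Setenv("NATS_USERNAME", "alice")
  t.Setenv("NATS_PASSWORD", "")
  t.Setenv("NATS_TOKEN", "")
  t.Setenv("NATS_NKEY", "")
  var opts []nats.Option
  if natsApplyAuth(&opts) || len(opts) != 0 {
    t.Fatal("expected no auth option for username without password")
  }
}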

Operator runbook

nats_auth_precedence_runbook.sh
Bash
# 1) Print active env values (output contains secrets; keep it out of shared terminals and logs)
kubectl -n cordum exec deploy/cordum-scheduler -- printenv NATS_USERNAME NATS_PASSWORD NATS_TOKEN NATS_NKEY

# 2) Enforce single-mode policy
#    choose exactly one:
#    - user/pass
#    - token
#    - nkey

# 3) Remove stale vars from old modes
#    (example keeps user/pass; unset only the vars for modes you are retiring)
kubectl -n cordum set env deploy/cordum-scheduler NATS_TOKEN-
kubectl -n cordum set env deploy/cordum-scheduler NATS_NKEY-

# 4) Add CI collision guard (fail if modes != 1)
./ci-nats-auth-lint.sh

# 5) Validate precedence behavior in CI (run from the repository root)
go test ./core/infra/bus -run "TestNatsApplyAuth_(PriorityOrder|Token|UserInfo|UsernameWithoutPassword|NKeyInvalidSeed)"

# 6) Post-deploy check
#    confirm broker auth failures are zero and connection handshake succeeds

Limitations and tradeoffs

| Decision | Benefit | Cost |
| --- | --- | --- |
| Keep deterministic precedence | Predictable behavior in code and incident timelines. | Stale lower-priority secrets can remain unnoticed. |
| Warning-only when no auth in production | Startup stays available during temporary misconfiguration. | Security posture depends on operators noticing warnings quickly. |
| Single-mode CI lint enforcement | Prevents mixed-mode drift before deploy. | Requires pipeline wiring and env template discipline. |

Treat auth mode as a single source of truth. Never leave fallback credentials in the environment after a migration.

Next step

Do one auth posture cleanup pass:

  1. Pick one intended auth mode per environment.
  2. Remove all env vars for unused modes.
  3. Add CI assertions for precedence-sensitive env combinations.
  4. Re-run connection smoke tests after every credential rotation.

Continue with NATS TLS Enforcement and JetStream Broadcast Semantics.

Auth clarity wins

Reliability and security improve when your connection path is boring and explicit.