Deep Dive

AI Agent NATS Auth Precedence

Most auth incidents are not crypto breaks. They are configuration collisions.

Deep Dive · 10 min read · Mar 2026
TL;DR
  • Cordum picks NATS auth in strict order: user/password first, token second, NKey third.
  • Setting multiple credential modes at once does not combine them. One wins and the rest are ignored.
  • A username without a password does not configure auth in the current implementation.
  • Production should enforce a single intended auth mode and test for accidental precedence drift.
Deterministic order

Auth mode selection is deterministic, not heuristic. This is good for predictability.

Misconfig trap

Multiple auth env vars can make operators think one mode is active while another is actually used.

Test-backed

Repo tests explicitly validate priority order and edge cases like missing password.

Scope

This guide covers scheduler-side NATS client authentication selection, not broker-side ACL design.

The production problem

A team rotates its auth mode from token to NKey. The old token env var stays in the deployment. Because token outranks NKey in the selection order, the NKey is present but never used.

Nothing crashes. The connection still works. The team's security assumptions are now wrong, and the mismatch is hard to detect from outside.

This is exactly why precedence rules must be documented, tested, and enforced by config policy.
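
One way to make this failure loud is a startup guard that counts how many modes are configured and rejects ambiguous combinations. The following is a minimal sketch, not Cordum code: `configuredModes` and `checkSingleMode` are hypothetical helpers, and the environment lookup is injected so the check can run without touching the process environment. Only the four variable names are taken from the selection function shown later in this guide.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// configuredModes lists every auth mode whose env vars are set, in the
// same order the selection function checks them (hypothetical helper).
func configuredModes(getenv func(string) string) []string {
	get := func(k string) string { return strings.TrimSpace(getenv(k)) }
	var modes []string
	if get("NATS_USERNAME") != "" && get("NATS_PASSWORD") != "" {
		modes = append(modes, "user/password")
	}
	if get("NATS_TOKEN") != "" {
		modes = append(modes, "token")
	}
	if get("NATS_NKEY") != "" {
		modes = append(modes, "nkey")
	}
	return modes
}

// checkSingleMode returns an error when more than one mode is configured,
// naming the mode that would win and the modes it would shadow.
func checkSingleMode(getenv func(string) string) error {
	modes := configuredModes(getenv)
	if len(modes) > 1 {
		return fmt.Errorf("ambiguous NATS auth: %s wins, shadows %s",
			modes[0], strings.Join(modes[1:], ", "))
	}
	return nil
}

func main() {
	if err := checkSingleMode(os.Getenv); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("NATS auth config is unambiguous")
}
```

With a stale `NATS_TOKEN` left next to a new `NATS_NKEY`, this guard would report that token wins and shadows nkey instead of letting the process connect quietly. A lone `NATS_USERNAME` is not flagged by this sketch, since it never forms a complete mode.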

What top results miss

| Source | Strong coverage | Missing piece |
| --- | --- | --- |
| NATS auth introduction | Token, user/password, and NKey auth concepts in NATS. | No application-level precedence policy when multiple credential modes are simultaneously configured. |
| NATS NKey auth docs | Challenge-response model and key-handling benefits of NKey mode. | No guidance for mixed env scenarios where NKey is configured but overshadowed by a higher-priority mode. |
| Kafka security overview | Authentication and encryption modes can be mixed at deployment level. | No direct equivalent of single-process env precedence in lightweight broker client wrappers. |

Docs explain mechanisms well. They rarely explain what happens when multiple mechanisms are configured at once in a single process.

Cordum runtime behavior

| Boundary | Current behavior | Operational impact |
| --- | --- | --- |
| Selection order | `natsApplyAuth` checks user/pass, then token, then NKey seed. | Auth mode is deterministic and easy to reason about in code review. |
| User/pass requirement | Both username and password must be set, otherwise this mode is skipped. | Partial credentials do not silently degrade into insecure auth. |
| Token fallback | Token mode activates only when user/pass mode is not selected. | If both are set, token is ignored by design. |
| NKey fallback | NKey mode activates only when both higher-priority modes are absent. | NKey can be unintentionally shadowed by leftover token or user/pass values. |
| Invalid NKey seed | Bad seed logs an error and returns not-configured state for the NKey path. | Connection can proceed without the intended auth unless other safeguards exist. |
| Production warning | If no auth mode is configured in production, Cordum logs a warning. | Visibility exists, but a warning alone does not block startup. |

Code-level mechanics

Auth selection function (Go)

core/infra/bus/nats.go
Go
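// natsApplyAuth appends at most one auth option to opts and reports whether
// an auth mode was configured. Precedence: user/password, then token, then NKey.
func natsApplyAuth(opts *[]nats.Option) bool {
  // 1) User/password: both values must be non-empty, otherwise the mode is skipped.
  username := strings.TrimSpace(os.Getenv("NATS_USERNAME"))
  password := strings.TrimSpace(os.Getenv("NATS_PASSWORD"))
  if username != "" && password != "" {
    *opts = append(*opts, nats.UserInfo(username, password))
    return true
  }

  // 2) Token: considered only when user/password was not selected.
  token := strings.TrimSpace(os.Getenv("NATS_TOKEN"))
  if token != "" {
    *opts = append(*opts, nats.Token(token))
    return true
  }

  // 3) NKey: considered only when both higher-priority modes are absent.
  // Note: in nats.go, NkeyOptionFromSeed loads the seed from a file, so the
  // value is used as a path to the seed file.
  nkey := strings.TrimSpace(os.Getenv("NATS_NKEY"))
  if nkey != "" {
    opt, err := nats.NkeyOptionFromSeed(nkey)
    if err != nil {
      // An unusable seed is logged and reported as "not configured".
      slog.Error("bus: invalid NATS_NKEY seed", "err", err)
      return false
    }
    *opts = append(*opts, opt)
    return true
  }
  return false
}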

Priority and edge-case tests (Go)

core/infra/bus/nats_test.go
Go
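func TestNatsApplyAuth_PriorityOrder(t *testing.T) {
  // Username/password takes priority over token.
  t.Setenv("NATS_USERNAME", "alice")
  t.Setenv("NATS_PASSWORD", "secret")
  t.Setenv("NATS_TOKEN", "also-set")
  t.Setenv("NATS_NKEY", "")
  var opts []nats.Option
  configured := natsApplyAuth(&opts)
  // Expects one option: user/password. (Assertions sketched here; the repo test may differ.)
  if !configured || len(opts) != 1 {
    t.Fatalf("configured=%v, options=%d; want true, 1", configured, len(opts))
  }
}

func TestNatsApplyAuth_UsernameWithoutPassword(t *testing.T) {
  // A username without a password should not configure auth.
  t.Setenv("NATS_USERNAME", "alice")
  t.Setenv("NATS_PASSWORD", "")
  t.Setenv("NATS_TOKEN", "")
  t.Setenv("NATS_NKEY", "")
  var opts []nats.Option
  if natsApplyAuth(&opts) || len(opts) != 0 {
    t.Fatal("expected no auth option for username without password")
  }
}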

Operator runbook

nats_auth_precedence_runbook.sh
Bash
# 1) List which auth vars are set and non-empty
#    (names only, so secret values stay out of the terminal; assumes the image ships a POSIX shell)
kubectl -n cordum exec deploy/cordum-scheduler -- sh -c 'env | grep -E "^NATS_(USERNAME|PASSWORD|TOKEN|NKEY)=." | cut -d= -f1'

# 2) Enforce single-mode policy
#    choose exactly one:
#    - user/pass
#    - token
#    - nkey

# 3) Remove stale vars from old modes
#    (this example keeps user/pass; adjust the list to the mode you chose)
kubectl -n cordum set env deploy/cordum-scheduler NATS_TOKEN-
kubectl -n cordum set env deploy/cordum-scheduler NATS_NKEY-

# 4) Validate precedence behavior in CI (run from the repository root)
go test ./core/infra/bus -run "TestNatsApplyAuth_PriorityOrder|TestNatsApplyAuth_UsernameWithoutPassword"

# 5) Post-deploy check
#    confirm broker auth failures are zero and connection handshake succeeds

Limitations and tradeoffs

  • Deterministic precedence improves predictability but can hide stale lower-priority secrets.
  • Warning-only behavior for missing auth in production favors availability over strict enforcement.
  • NKey security benefits are lost if user/pass or token accidentally remains configured.
  • Changing auth mode without cleanup can produce partial rollouts with inconsistent identities.

Treat the auth mode as a single source of truth. Never leave fallback credentials in the environment after a migration.

Next step

Do one auth posture cleanup pass:

  1. Pick one intended auth mode per environment.
  2. Remove all env vars for unused modes.
  3. Add CI assertions for precedence-sensitive env combinations.
  4. Re-run connection smoke tests after every credential rotation.

Continue with NATS TLS Enforcement and JetStream Broadcast Semantics.

Auth clarity wins

Reliability and security improve when your connection path is boring and explicit.