
AI Agent Policy Simulation Before Production Dispatch

Policy-as-code needs tests, not optimism.

Guide · 10 min read · Apr 2026
TL;DR
- Policy changes need test suites. Reviewing YAML by eye is not enough.
- Cordum has two simulation endpoints with different fidelity; use each intentionally.
- Live simulation runs velocity checks in dry-run mode, so counters are read but not advanced.
- Test failures should block a policy publish the same way failing unit tests block an app deploy.
- No-side-effect checks: simulate decisions without dispatching real actions.
- CI enforcement: treat policy changes like production code.
- Decision evidence: capture matched rule IDs and constraints per test.

Scope

This guide covers simulation of governance policies for autonomous agent actions before those actions are persisted or dispatched.

The production problem

Teams often ship policy changes with the same rigor as a Slack message. Then a deny flips to an allow in production and nobody can explain why.

Autonomous agents raise the stakes because policies gate real actions. A bad rule is not just a bad dashboard number; it can trigger the wrong side effects.

Another trap is assumed simulation parity. Teams assume one simulate endpoint represents all runtime behavior. In practice, the choice of endpoint changes which code paths are exercised.

What top results miss

| Source | Strong coverage | Missing piece |
| --- | --- | --- |
| AWS IAM policy simulator | Solid simulation workflow and clear caveats versus live environment behavior. | Not designed for multi-step autonomous agent action policies. |
| OPA policy testing | Strong unit testing model for policy rules with `opa test` and CI friendliness. | Does not cover runtime parity when policy checks include stateful controls like velocity windows. |
| OpenFGA testing models | Practical authorization model test format with check/list assertions. | Focuses on authorization semantics, not pre-dispatch risk controls for agents. |

Simulation model

| Stage | Required design | Failure if missing |
| --- | --- | --- |
| Scenario catalog | Encode high-risk action scenarios with expected decisions. | Policy regressions hide until incident-driven discovery. |
| Draft bundle simulation | Run tests against unpublished policy bundle changes. | Breaking changes ship before any realistic evaluation. |
| Snapshot pinning | Store policy snapshot hash with every simulation result. | Cannot prove which policy version produced a decision. |
| Publish gate | Block publish if required scenarios fail. | Human review passes unsafe diffs under time pressure. |
| Fidelity split checks | Run both live simulate and draft-bundle simulate for critical scenarios. | Endpoint-specific behavior drift is discovered only after publish. |
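The scenario catalog stage above is easiest to review when each scenario is a small self-describing file. A minimal sketch, assuming the JSON scenario shape used later in this guide (the directory layout and file name are illustrative):

```shell
# Hypothetical catalog layout: one JSON file per high-risk scenario,
# each carrying its own name and expected decision.
mkdir -p scenarios
cat > scenarios/prod_delete_requires_approval.json <<'EOF'
{"name": "prod_delete_requires_approval",
 "request": {"topic": "job.infra.delete"},
 "expect": {"decision": "DECISION_TYPE_REQUIRE_HUMAN"}}
EOF

# Print a coverage summary so reviewers can scan expected decisions.
for f in scenarios/*.json; do
  jq -r '"\(.name) -> \(.expect.decision)"' "$f"
done
```

One file per scenario keeps diffs reviewable: a policy change that flips an expected decision shows up as a one-line edit in review.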

Cordum runtime capabilities

| Capability | Current behavior | Why this matters |
| --- | --- | --- |
| Policy simulate endpoint | `POST /api/v1/policy/simulate` returns a decision with no side effects. | Lets CI evaluate policy safely before dispatching actions. |
| Live simulate auth scope | Gateway requires `admin` or `operator` role, then calls Safety Kernel `Simulate`. | Prevents untrusted tenants from probing policy behavior. |
| Velocity dry-run behavior | For `simulate`/`explain`, the velocity path uses `CheckOnly` instead of recording request members. | Simulation does not advance rate-limit buckets but still reflects current window pressure. |
| Draft bundle simulation | `POST /api/v1/policy/bundles/{id}/simulate` requires `admin` and evaluates the merged draft bundle in the gateway. | Tests unpublished bundle edits without mutating config state. |
| Draft simulation fidelity caveat | Bundle simulate uses `policybundles.EvaluatePolicyCheck`, which does not execute Safety-Kernel-only velocity/input-rule scanner paths. | Critical scenarios with stateful or scanner-backed logic need live simulate parity checks. |
| Snapshot inspection | `GET /api/v1/policy/snapshots` exposes the active version/hash. | Enables deterministic traceability between decision and policy state. |
| Submit-time enforcement | Deny/throttle/require_human are applied before job persistence/dispatch. | Simulation mirrors the real control points that gate autonomous actions. |
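As a concrete starting point, the simulate request body can be composed with jq. The field names (`topic`, `tenant`, `meta`) follow the scenario examples in this guide; treat them as assumptions to verify against your Cordum API reference:

```shell
# Sketch: build a simulate request payload with jq rather than string
# interpolation, so values are always correctly JSON-escaped.
payload=$(jq -n \
  --arg topic "job.infra.delete" \
  --arg tenant "default" \
  '{topic: $topic, tenant: $tenant,
    meta: {capability: "infra.delete", risk_tags: ["production"]}}')

echo "$payload" | jq .
```

Using `jq -n --arg` instead of shell string concatenation avoids broken payloads when topics or tenant names contain quotes or special characters.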

Implementation examples

CI simulation gate (Bash)

policy-ci.sh
Bash
#!/usr/bin/env bash
set -euo pipefail

for scenario in scenarios/*.json; do
  expected_decision=$(jq -r '.expect.decision' "$scenario")
  expected_rule=$(jq -r '.expect.rule_id // ""' "$scenario")

  response=$(
    jq -c '.request' "$scenario" | curl -sS -X POST "$CORDUM_URL/api/v1/policy/simulate" \
      -H "Content-Type: application/json" \
      -d @-
  )

  got_decision=$(echo "$response" | jq -r '.decision')
  got_rule=$(echo "$response" | jq -r '.ruleId // ""')

  if [ "$got_decision" != "$expected_decision" ]; then
    echo "FAIL $scenario: decision $got_decision, expected $expected_decision" >&2
    exit 1
  fi
  if [ -n "$expected_rule" ] && [ "$got_rule" != "$expected_rule" ]; then
    echo "FAIL $scenario: rule $got_rule, expected $expected_rule" >&2
    exit 1
  fi
done

Scenario file (JSON)

scenario.json
JSON
{
  "name": "prod_delete_requires_approval",
  "request": {
    "topic": "job.infra.delete",
    "tenant": "default",
    "meta": {
      "capability": "infra.delete",
      "risk_tags": ["production"]
    }
  },
  "expect": {
    "decision": "DECISION_TYPE_REQUIRE_HUMAN",
    "rule_id": "require-approval-prod-delete"
  }
}
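Scenario files are easy to get subtly wrong, so it helps to validate their shape before CI consumes them. A hypothetical `validate_scenario` helper using jq; the required fields mirror the example above, so adapt the check to your schema:

```shell
# Hypothetical helper: reject scenario files missing the fields the
# CI gate depends on (name, request, expect.decision).
validate_scenario() {
  local file="$1"
  jq -e '.name and .request and .expect.decision' "$file" > /dev/null \
    || { echo "invalid scenario: $file" >&2; return 1; }
}

# Usage sketch against a temporary scenario file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
{"name": "demo", "request": {"topic": "job.demo"},
 "expect": {"decision": "DECISION_TYPE_ALLOW"}}
EOF
validate_scenario "$tmp" && echo "ok"
```

Running this as a pre-step in the CI gate turns a malformed scenario into a loud failure instead of a silently skipped assertion.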

Simulation result record (JSON)

simulate-result.json
JSON
{
  "scenario_id": "prod_delete_requires_approval",
  "decision": "DECISION_TYPE_REQUIRE_HUMAN",
  "ruleId": "require-approval-prod-delete",
  "constraints": {
    "approver_group": "platform-oncall"
  },
  "policySnapshot": "sha256:8d0b..."
}
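To make snapshot pinning enforceable, compare the hash recorded in each result against the currently active snapshot. A sketch with a hypothetical `assert_snapshot_match` helper; the commented jq paths for the snapshots endpoint response are assumptions to check against your deployment:

```shell
# Hypothetical helper: fail when a stored simulation result was
# produced under a different policy snapshot than the one active now.
assert_snapshot_match() {
  local recorded="$1" active="$2"
  if [ "$recorded" != "$active" ]; then
    echo "snapshot drift: result=$recorded active=$active" >&2
    return 1
  fi
}

# Usage sketch (response field paths are assumptions, verify them):
# recorded=$(jq -r '.policySnapshot' simulate-result.json)
# active=$(curl -sS "$CORDUM_URL/api/v1/policy/snapshots" | jq -r '.active.hash')
# assert_snapshot_match "$recorded" "$active"
```

A mismatch here means the policy changed between simulation and review, so the cached result no longer proves anything about the bundle being published.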

Live vs draft parity check

simulate-parity.sh
Bash
#!/usr/bin/env bash
set -euo pipefail

payload='{"topic":"job.infra.delete","tenant":"default","meta":{"capability":"infra.delete"}}'

live=$(curl -sS -X POST "$CORDUM_URL/api/v1/policy/simulate" \
  -H "Content-Type: application/json" \
  -d "$payload")

draft=$(curl -sS -X POST "$CORDUM_URL/api/v1/policy/bundles/secops/test/simulate" \
  -H "Content-Type: application/json" \
  -d "{\"request\":$payload}")

echo "$live"  | jq '{decision, ruleId, policySnapshot}'
echo "$draft" | jq '{decision, ruleId, policySnapshot}'
# Diff expected for velocity or scanner-backed rules
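Eyeballing the two outputs does not scale in CI. A hypothetical `parity_diff` helper that compares only the decision-relevant fields and returns non-zero on drift; scenarios where drift is expected (velocity or scanner-backed rules) would be allowlisted around it:

```shell
# Hypothetical helper: hard parity comparison of live vs draft
# simulate responses, keyed on the fields that gate dispatch.
parity_diff() {
  local live="$1" draft="$2"
  local a b
  # jq -S normalizes key order so semantically equal responses match.
  a=$(printf '%s' "$live"  | jq -S '{decision, ruleId}')
  b=$(printf '%s' "$draft" | jq -S '{decision, ruleId}')
  if [ "$a" != "$b" ]; then
    printf 'PARITY DRIFT\nlive:  %s\ndraft: %s\n' "$a" "$b"
    return 1
  fi
  echo "parity ok"
}
```

Wiring `parity_diff "$live" "$draft"` after the two curl calls turns silent endpoint divergence into a pipeline failure.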

Limitations and tradeoffs

- Simulation cannot perfectly reproduce all production context values.
- Large scenario matrices increase confidence but also CI runtime.
- Strict publish gates improve safety but can slow urgent policy hotfixes.
- Draft-bundle simulation is faster but lower fidelity for velocity and scanner-backed behavior.
- Scenario quality matters more than scenario count; shallow tests give false confidence.

Next step

Run this rollout in one sprint:

1. Define 25-50 high-risk policy scenarios from recent incidents and near misses.
2. Wire `policy/simulate` into CI and fail the pipeline on mismatched decisions.
3. Run parity checks between live simulate and bundle simulate for velocity/scanner-critical scenarios.
4. Require draft-bundle simulation before publishing to production.
5. Store the snapshot hash and result for every simulation run.

Continue with Pre-Dispatch Governance for AI Agents and AI Agent Idempotency Keys.

Test policy like code

If policy controls production actions, its test coverage should be treated as release-critical.