The production problem
Teams often ship policy changes with the same rigor as a Slack message. Then a deny turns into an allow in production, and nobody can explain why.
Autonomous agents raise the stakes because policies gate real actions. A bad rule is not just a bad dashboard number; it can trigger the wrong side effects.
Another trap is assuming simulation parity: that one simulate endpoint represents all runtime behavior. In practice, the choice of endpoint changes which code paths are exercised.
What top results miss
| Source | Strong coverage | Missing piece |
|---|---|---|
| AWS IAM policy simulator | Solid simulation workflow and clear caveats versus live environment behavior. | Not designed for multi-step autonomous agent action policies. |
| OPA policy testing | Strong unit testing model for policy rules with `opa test` and CI friendliness. | Does not cover runtime parity when policy checks include stateful controls like velocity windows. |
| OpenFGA testing models | Practical authorization model test format with check/list assertions. | Focuses on authorization semantics, not pre-dispatch risk controls for agents. |
Simulation model
| Stage | Required design | Failure if missing |
|---|---|---|
| Scenario catalog | Encode high-risk action scenarios with expected decisions. | Policy regressions hide until incident-driven discovery. |
| Draft bundle simulation | Run tests against unpublished policy bundle changes. | Breaking changes ship before any realistic evaluation. |
| Snapshot pinning | Store policy snapshot hash with every simulation result. | Cannot prove which policy version produced a decision. |
| Publish gate | Block publish if required scenarios fail. | Human review passes unsafe diffs under time pressure. |
| Fidelity split checks | Run both live simulate and draft-bundle simulate for critical scenarios. | Endpoint-specific behavior drift is discovered only after publish. |
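The catalog stage is cheapest to enforce if malformed scenarios fail before any endpoint is called. A minimal lint sketch — the directory layout, sample file, and required-field list here are assumptions for illustration, not a fixed Cordum format:

```bash
#!/usr/bin/env bash
# Hypothetical catalog lint: reject scenario files missing a name, a request
# topic, or an expected decision before CI ever calls the simulate endpoint.
set -euo pipefail

dir=$(mktemp -d)
cat > "$dir/example.json" <<'EOF'
{"name": "prod_delete_requires_approval",
 "request": {"topic": "job.infra.delete"},
 "expect": {"decision": "DECISION_TYPE_REQUIRE_HUMAN"}}
EOF

for scenario in "$dir"/*.json; do
  jq -e '.name and .request.topic and .expect.decision' "$scenario" > /dev/null ||
    { echo "invalid scenario: $scenario" >&2; exit 1; }
done
echo "catalog ok"
```

Running the lint as its own CI step keeps "scenario is broken" failures distinct from "policy decision changed" failures.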
Cordum runtime capabilities
| Capability | Current behavior | Why this matters |
|---|---|---|
| Policy simulate endpoint | `POST /api/v1/policy/simulate` returns decision with no side effects | Lets CI evaluate policy safely before dispatching actions. |
| Live simulate auth scope | Gateway requires `admin` or `operator` role, then calls Safety Kernel `Simulate`. | Prevents untrusted tenants from probing policy behavior. |
| Velocity dry-run behavior | For `simulate`/`explain`, velocity path uses `CheckOnly` instead of recording request members. | Simulation does not advance rate-limit buckets but still reflects current window pressure. |
| Draft bundle simulation | `POST /api/v1/policy/bundles/{id}/simulate` requires `admin` and evaluates merged draft bundle in gateway. | Tests unpublished bundle edits without mutating config state. |
| Draft simulation fidelity caveat | Bundle simulate uses `policybundles.EvaluatePolicyCheck`, which does not execute Safety-Kernel-only velocity/input-rule scanner paths. | Critical scenarios with stateful or scanner-backed logic need live simulate parity checks. |
| Snapshot inspection | `GET /api/v1/policy/snapshots` exposes active version/hash | Enables deterministic traceability between decision and policy state. |
| Submit-time enforcement | Deny/throttle/require_human applied before job persistence/dispatch | Simulation mirrors real control points that gate autonomous actions. |
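The velocity dry-run row is worth internalizing. The toy model below is illustrative only — it is not Cordum code — but it shows why `CheckOnly` semantics matter: a check reads current window pressure without recording a member, so repeated simulations never consume real budget.

```bash
#!/usr/bin/env bash
# Toy velocity window: `record` consumes budget, `check_only` merely reads it.
limit=2
count=0

record()     { count=$((count + 1)); [ "$count" -le "$limit" ]; }
check_only() { [ "$((count + 1))" -le "$limit" ]; }  # no write to the window

check_only && echo "simulate 1: allow"
check_only && echo "simulate 2: allow"  # still allowed: simulations never advance the bucket
record && echo "submit 1: allow"
record && echo "submit 2: allow"
record || echo "submit 3: deny"         # real submissions exhausted the budget
```

If simulation used the recording path instead, a busy CI run would burn through the very rate limits it was trying to verify.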
Implementation examples
CI simulation gate (Bash)
```bash
#!/usr/bin/env bash
# Fail the pipeline when a scenario's simulated decision (or matched rule)
# differs from the expectation recorded in the scenario file.
set -euo pipefail

for scenario in scenarios/*.json; do
  expected_decision=$(jq -r '.expect.decision' "$scenario")
  expected_rule=$(jq -r '.expect.rule_id // ""' "$scenario")
  response=$(
    jq -c '.request' "$scenario" |
      curl -sS -X POST "$CORDUM_URL/api/v1/policy/simulate" \
        -H "Content-Type: application/json" -d @-
  )
  got_decision=$(echo "$response" | jq -r '.decision')
  got_rule=$(echo "$response" | jq -r '.ruleId // ""')
  test "$got_decision" = "$expected_decision"
  if [ -n "$expected_rule" ]; then
    test "$got_rule" = "$expected_rule"
  fi
done
```

Scenario file (JSON)
```json
{
  "name": "prod_delete_requires_approval",
  "request": {
    "topic": "job.infra.delete",
    "tenant": "default",
    "meta": {
      "capability": "infra.delete",
      "risk_tags": ["production"]
    }
  },
  "expect": {
    "decision": "DECISION_TYPE_REQUIRE_HUMAN",
    "rule_id": "require-approval-prod-delete"
  }
}
```

Simulation result record (JSON)
```json
{
  "scenario_id": "prod_delete_requires_approval",
  "decision": "DECISION_TYPE_REQUIRE_HUMAN",
  "ruleId": "require-approval-prod-delete",
  "constraints": {
    "approver_group": "platform-oncall"
  },
  "policySnapshot": "sha256:8d0b..."
}
```

Live vs draft parity check
```bash
payload='{"topic":"job.infra.delete","tenant":"default","meta":{"capability":"infra.delete"}}'
live=$(curl -sS -X POST "$CORDUM_URL/api/v1/policy/simulate" \
  -H "Content-Type: application/json" -d "$payload")
draft=$(curl -sS -X POST "$CORDUM_URL/api/v1/policy/bundles/secops/test/simulate" \
  -H "Content-Type: application/json" -d "{\"request\":$payload}")
echo "$live" | jq '{decision, ruleId, policySnapshot}'
echo "$draft" | jq '{decision, ruleId, policySnapshot}'
# A diff here is expected for velocity or scanner-backed rules.
```

Limitations and tradeoffs
- Simulation cannot perfectly reproduce all production context values.
- Large scenario matrices increase confidence, but also CI runtime.
- Strict publish gates improve safety but can slow urgent policy hotfixes.
- Draft-bundle simulation is faster but lower fidelity for velocity and scanner-backed behavior.
- Scenario quality matters more than scenario count; shallow tests give false confidence.
Next step
Run this rollout in one sprint:
1. Define 25-50 high-risk policy scenarios from recent incidents and near misses.
2. Wire `policy/simulate` into CI and fail the pipeline on mismatched decisions.
3. Run parity checks between live simulate and bundle simulate for velocity/scanner-critical scenarios.
4. Require draft-bundle simulation before publish to production.
5. Store snapshot hash + result for every simulation run.
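Step 5 can be as small as a jq merge. A sketch with the response and hash inlined as sample values — in a real pipeline the hash would come from `GET /api/v1/policy/snapshots`:

```bash
#!/usr/bin/env bash
# Attach the active policy snapshot hash to a simulation result before storing
# it, so every recorded decision is traceable to an exact policy version.
set -euo pipefail

response='{"decision":"DECISION_TYPE_REQUIRE_HUMAN","ruleId":"require-approval-prod-delete"}'
snapshot="sha256:8d0b"   # sample value; in CI, read from the snapshots endpoint

record=$(echo "$response" |
  jq --arg s "$snapshot" --arg id "prod_delete_requires_approval" \
     '. + {scenario_id: $id, policySnapshot: $s}')
echo "$record"
```

Storing the merged record (rather than the raw response) means a later audit never has to guess which policy version was active during the run.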
Continue with Pre-Dispatch Governance for AI Agents and AI Agent Idempotency Keys.