## The production problem
Teams often ship policy changes with the same rigor as a Slack message. Then a deny turns into an allow in production and nobody can explain why.

Autonomous agents raise that risk because policies gate real actions. A bad rule is not just a bad dashboard number: it can trigger the wrong side effects.
## What top results miss
| Source | Strong coverage | Missing piece |
|---|---|---|
| AWS IAM policy simulator | Solid simulation workflow, with clear caveats about where simulation diverges from live environment behavior. | Not designed for multi-step autonomous agent action policies. |
| OPA policy testing | Strong unit testing model for policy rules with `opa test` and CI friendliness. | Does not prescribe runtime action governance patterns for agent execution systems. |
| OpenFGA testing models | Practical authorization model test format with check/list assertions. | Focuses on authorization semantics, not pre-dispatch risk controls for agents. |
## Simulation model
| Stage | Required design | Failure if missing |
|---|---|---|
| Scenario catalog | Encode high-risk action scenarios with expected decisions. | Policy regressions hide until incident-driven discovery. |
| Draft bundle simulation | Run tests against unpublished policy bundle changes. | Breaking changes ship before any realistic evaluation. |
| Snapshot pinning | Store policy snapshot hash with every simulation result. | Cannot prove which policy version produced a decision. |
| Publish gate | Block publish if required scenarios fail. | Human review passes unsafe diffs under time pressure. |
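The four stages compose into a single gate check. A minimal sketch of the publish-gate logic (stages 3 and 4), assuming simulation results carry the field names used in the result record shown later in this article; everything else here is illustrative:

```python
# Publish gate sketch: every required scenario must produce its expected
# decision, and all results must be pinned to the same policy snapshot hash.

def publish_gate(results: list[dict]) -> tuple[bool, str]:
    """Return (ok, reason) for a batch of simulation results."""
    if not results:
        return False, "no scenarios were simulated"

    fails = [r["scenario_id"] for r in results
             if r["decision"] != r["expected_decision"]]
    if fails:
        return False, f"decision mismatch in: {', '.join(fails)}"

    snapshots = {r["policy_snapshot"] for r in results}
    if len(snapshots) != 1:
        return False, f"results span {len(snapshots)} policy snapshots"

    return True, f"all scenarios passed against {snapshots.pop()}"
```

Blocking on mixed snapshots matters as much as blocking on mismatches: a batch simulated against two different policy versions proves nothing about either.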
## Cordum runtime capabilities
| Capability | Current behavior | Why this matters |
|---|---|---|
| Policy simulate endpoint | `POST /api/v1/policy/simulate` returns decision with no side effects | Lets CI evaluate policy safely before dispatching actions. |
| Draft bundle simulation | `POST /api/v1/policy/bundles/{id}/simulate` tests unpublished policy edits | Prevents policy regressions from reaching production publish path. |
| Snapshot inspection | `GET /api/v1/policy/snapshots` exposes active version/hash | Enables deterministic traceability between decision and policy state. |
| Submit-time enforcement | Deny/throttle/require_human applied before job persistence/dispatch | Simulation mirrors real control points that gate autonomous actions. |
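A sketch of how CI might exercise the first two endpoints above for one scenario. The HTTP layer is abstracted behind callables so the control flow stays visible; real code would POST to `/api/v1/policy/simulate` and GET `/api/v1/policy/snapshots`, and the snapshot response shape used here is an assumption:

```python
from typing import Callable

def run_scenario(post: Callable[[str, dict], dict],
                 get: Callable[[str], dict],
                 scenario: dict) -> dict:
    """Simulate one scenario and pin the active policy snapshot hash.

    `post`/`get` stand in for HTTP calls to the Cordum API. The
    `{"active": {"hash": ...}}` shape below is hypothetical.
    """
    verdict = post("/api/v1/policy/simulate", scenario["input"])
    snapshot = get("/api/v1/policy/snapshots")
    return {
        "scenario_id": scenario["id"],
        "decision": verdict["decision"],
        "expected_decision": scenario["expected_decision"],
        "policy_snapshot": snapshot["active"]["hash"],  # assumed shape
    }

# Stubbed transport for demonstration only.
fake_post = lambda path, body: {"decision": "require_human"}
fake_get = lambda path: {"active": {"hash": "sha256:8d0b"}}

record = run_scenario(fake_post, fake_get, {
    "id": "prod_delete_requires_approval",
    "input": {"topic": "infra.delete", "env": "production", "actor": "ops-agent"},
    "expected_decision": "require_human",
})
```

Capturing the snapshot hash alongside every decision is what makes the result record auditable later.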
## Implementation examples
### CI simulation gate (Bash)
```bash
#!/usr/bin/env bash
set -euo pipefail

# Fail the pipeline if any scenario's simulated decision differs from
# the expectation recorded in the scenario file.
for scenario in scenarios/*.json; do
  expected=$(jq -r '.expected_decision' "$scenario")
  actual=$(curl -sS -X POST "$CORDUM_URL/api/v1/policy/simulate" \
    -H "Content-Type: application/json" \
    -d @"$scenario" | jq -r '.decision')
  if [ "$actual" != "$expected" ]; then
    echo "FAIL: $scenario expected=$expected got=$actual" >&2
    exit 1
  fi
done
```
### Scenario file (YAML)
```yaml
id: prod_delete_requires_approval
input:
  topic: infra.delete
  env: production
  actor: ops-agent
expected_decision: require_human
expected_rule_id: require-approval-prod-delete
```
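Because the CI gate posts every file in `scenarios/*.json` verbatim (YAML-authored scenarios would be converted to JSON first), it pays to lint the catalog before running any simulations. A minimal sketch, assuming only the field names shown in the scenario file:

```python
import json
from pathlib import Path

# Fields every scenario must carry before it is worth simulating.
REQUIRED = ("id", "input", "expected_decision")

def lint_scenarios(directory: str) -> list[str]:
    """Return a list of problems; an empty list means the catalog is well-formed."""
    problems = []
    for path in sorted(Path(directory).glob("*.json")):
        try:
            scenario = json.loads(path.read_text())
        except json.JSONDecodeError as exc:
            problems.append(f"{path.name}: invalid JSON ({exc})")
            continue
        for field in REQUIRED:
            if field not in scenario:
                problems.append(f"{path.name}: missing '{field}'")
    return problems
```

Running this as a separate fast CI step keeps malformed scenarios from being mistaken for policy failures.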
### Simulation result record (JSON)
```json
{
  "scenario_id": "prod_delete_requires_approval",
  "decision": "require_human",
  "rule_id": "require-approval-prod-delete",
  "constraints": {
    "approver_group": "platform-oncall"
  },
  "policy_snapshot": "sha256:8d0b..."
}
```

## Limitations and tradeoffs
- Simulation cannot perfectly reproduce all production context values.
- Large scenario matrices increase confidence but also CI runtime.
- Strict publish gates improve safety but can slow urgent policy hotfixes.
- Scenario quality matters more than scenario count; shallow tests give false confidence.
## Next step
Run this rollout in one sprint:
1. Define 25-50 high-risk policy scenarios from recent incidents and near misses.
2. Wire `policy/simulate` into CI and fail the pipeline on mismatched decisions.
3. Require draft-bundle simulation before publishing to production.
4. Store the snapshot hash and result for every simulation run.
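Step 4 can be as simple as appending each result record (the JSON shape shown earlier) to an append-only log. A minimal sketch; the JSON Lines layout and timestamp field are assumptions for illustration:

```python
import json
import time

def record_result(log_path: str, result: dict) -> None:
    """Append one simulation result as a JSON Lines entry.

    `result` is expected to already carry `policy_snapshot`, so every
    stored decision traces back to the exact policy version that produced it.
    A wall-clock timestamp (hypothetical `recorded_at` field) is added here.
    """
    entry = dict(result, recorded_at=time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()))
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")
```

An append-only file (or table) is enough for audit queries like "which snapshot was active when this scenario last passed?"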
Continue with Pre-Dispatch Governance for AI Agents and AI Agent Idempotency Keys.