The production problem
Most organizations have AI policies. Fewer have AI policies that can block a risky action before a worker executes it.
That distinction matters. Under delivery pressure, manual review steps get skipped, scripts get reused, and “temporary exceptions” become default behavior.
What top-ranking sources cover and what they miss
| Source | Strong coverage | Missing piece |
|---|---|---|
| Kyndryl policy-as-code article | Clear business risk framing and strong argument for machine-readable guardrails in regulated environments. | No concrete rule schema or deterministic dispatch integration details. |
| iMerit policy-as-code workflow article | Practical workflow insertion points and good examples of rule-to-workflow mapping. | Limited treatment of pre-dispatch runtime decision contracts for autonomous agents. |
| Upsun scalable AI governance article | Strong deployment-pipeline perspective with enforceable templates and platform consistency. | No policy outcome taxonomy tied to approval queues and execution state transitions. |
Policy model that scales
A workable model has four layers: match, decision, constraints, and evidence. Each layer must be explicit.
| Layer | Purpose | Common anti-pattern |
|---|---|---|
| Rule match | Select policy branch using topic, risk tags, actor context, labels | Broad wildcard rules with hidden overrides |
| Decision | Return ALLOW, DENY, REQUIRE_APPROVAL, THROTTLE, or ALLOW_WITH_CONSTRAINTS | Soft recommendations that do not affect dispatch |
| Constraints | Bound runtime and blast radius even for allowed jobs | Allow decisions with no bounding controls |
| Evidence | Persist matched rule, reason, policy snapshot, and actor | No traceability from decision to run outcome |
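To make the four layers concrete, here is a minimal Python sketch of them as plain data types with an evaluator. It is illustrative only: `Rule`, `Evidence`, and `evaluate` are hypothetical names, and the first-match ordering, glob matching on topics, subset matching on risk tags, and deny-by-default fallback are all assumptions rather than documented behavior of any particular engine.

```python
from dataclasses import dataclass, field
from enum import Enum
from fnmatch import fnmatchcase
from typing import Optional


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    REQUIRE_APPROVAL = "require_approval"
    THROTTLE = "throttle"
    ALLOW_WITH_CONSTRAINTS = "allow_with_constraints"


@dataclass
class Rule:
    """Match layer plus the decision and constraints it yields."""
    id: str
    decision: Decision
    topics: tuple = ()       # glob patterns; empty means any topic (assumption)
    risk_tags: tuple = ()    # tags the job must carry; empty means no requirement (assumption)
    constraints: dict = field(default_factory=dict)
    reason: str = ""


@dataclass
class Evidence:
    """Evidence layer: what gets persisted alongside the run."""
    matched_rule: Optional[str]
    decision: Decision
    reason: str
    constraints: dict
    policy_snapshot: str
    actor: str


def evaluate(rules, job, snapshot, actor):
    """Return the evidence record for the first rule that matches the job."""
    job_tags = set(job.get("risk_tags", []))
    for rule in rules:
        topic_ok = not rule.topics or any(
            fnmatchcase(job["topic"], pattern) for pattern in rule.topics
        )
        tags_ok = set(rule.risk_tags) <= job_tags
        if topic_ok and tags_ok:
            return Evidence(rule.id, rule.decision, rule.reason,
                            dict(rule.constraints), snapshot, actor)
    # Assumption: nothing matched, so deny rather than silently allow.
    return Evidence(None, Decision.DENY, "no rule matched", {}, snapshot, actor)
```

Because the evaluator returns the evidence record rather than a bare decision, the dispatcher cannot act on a decision without also holding the matched rule, reason, and snapshot that justify it.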
```yaml
version: v1
rules:
  # Read-only bridge calls are allowed outright.
  - id: allow-read
    match:
      topics: ["job.mcp-bridge.read.*"]
      risk_tags: []
    decision: allow

  # Production writes wait in the approval queue.
  - id: require-approval-prod-write
    match:
      topics: ["job.mcp-bridge.write.*"]
      risk_tags: ["prod", "write"]
    decision: require_approval
    reason: "Prod writes require human approval"

  # Medium-risk agent execution runs, but inside hard bounds.
  - id: constrain-medium-risk
    match:
      topics: ["job.agent.exec.*"]
      risk_tags: ["medium"]
    decision: allow_with_constraints
    constraints:
      max_runtime_sec: 60
      max_retries: 1
      network_allowlist: ["api.github.com", "api.slack.com"]

  # Anything tagged destructive is denied, regardless of topic.
  - id: deny-destructive
    match:
      risk_tags: ["destructive"]
    decision: deny
```
Simulation and rollout
Run simulation against representative, production-like jobs before publishing a policy. If a rule change flips more decisions than you expected, stop and adjust the rule before rollout.
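One way to make "too many flipped decisions" concrete is to replay a job sample through the current and candidate policies and compare the results. The sketch below is an assumption-laden illustration: `flip_report` is a hypothetical helper, each policy is modeled as a plain callable that returns a decision string (in practice it might wrap a call to a simulate endpoint), and the 5% threshold is a placeholder to tune, not a recommended value.

```python
def flip_report(jobs, current_policy, candidate_policy, max_flip_rate=0.05):
    """Compare decisions from two policies over the same job sample.

    `current_policy` and `candidate_policy` are callables: job -> decision string.
    `max_flip_rate` is a hypothetical gate, expressed as a fraction of the sample.
    """
    if not jobs:
        # An empty sample proves nothing, so refuse to call it safe.
        raise ValueError("simulation needs at least one sample job")

    flips = []
    for job in jobs:
        before = current_policy(job)
        after = candidate_policy(job)
        if before != after:
            flips.append((job["topic"], before, after))

    rate = len(flips) / len(jobs)
    return {
        "flip_rate": rate,
        "flips": flips,  # (topic, old decision, new decision) for review
        "safe_to_publish": rate <= max_flip_rate,
    }
```

Keeping the individual flips in the report, not just the rate, lets a reviewer see whether the changes are the intended tightening or an accidental side effect of a broad match.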
```http
POST /api/v1/policy/simulate

{
  "tenant_id": "default",
  "job": {
    "topic": "job.mcp-bridge.write.update_ticket",
    "risk_tags": ["prod", "write"],
    "labels": {
      "mcp.server": "jira",
      "mcp.action": "write"
    }
  }
}

200 OK

{
  "decision": "REQUIRE_APPROVAL",
  "reason": "Prod writes require human approval",
  "constraints": {
    "max_runtime_sec": 60,
    "max_retries": 1
  },
  "matched_rule": "require-approval-prod-write"
}
```

```bash
# 1) Publish policy snapshot
curl -sS -X POST http://localhost:8081/api/v1/policy/publish \
  -H "Content-Type: application/json" \
  -d '{"snapshot":"v42","note":"tighten prod write controls"}'

# 2) Verify queue impact and decision mix
curl -sS "http://localhost:8081/api/v1/approvals?include_resolved=false"

# 3) Roll back if regression detected
curl -sS -X POST http://localhost:8081/api/v1/policy/rollback \
  -H "Content-Type: application/json" \
  -d '{"target_snapshot":"v41","note":"rollback due to false positive spike"}'
```
Operational defaults and guardrails
Stable policy operations depend on sane defaults. These values come from current Cordum references and should be validated against your environment:
| Guardrail | Default | Why it exists |
|---|---|---|
| Safety unavailable fail mode | closed | Prevent unchecked execution when policy dependency is down |
| Safety request timeout | 2s | Bound hot-path latency and avoid scheduler lockups |
| Policy reload interval | 30s | Balance update speed against config churn |
| Decision cache max size | 10000 | Keep repeated checks fast while controlling memory usage |
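To show how the fail mode, timeout, and cache size interact on the hot path, here is a small Python sketch. It is not Cordum's implementation: `GuardedPolicyClient` and its parameters are hypothetical, `check_fn` stands in for whatever call reaches the policy service, and the defaults simply mirror the table above.

```python
from collections import OrderedDict
from concurrent.futures import ThreadPoolExecutor


class GuardedPolicyClient:
    """Wraps a policy check with a timeout, a fail mode, and a bounded LRU cache.

    Hypothetical sketch; the cache itself is not thread-safe, so a
    single calling thread is assumed.
    """

    def __init__(self, check_fn, timeout_sec=2.0, cache_max=10000, fail_closed=True):
        self._check = check_fn          # callable: job -> decision string
        self._timeout = timeout_sec
        self._cache_max = cache_max
        self._fail_closed = fail_closed
        self._cache = OrderedDict()
        self._pool = ThreadPoolExecutor(max_workers=4)

    def decide(self, key, job):
        # Serve repeated checks from the cache and mark the entry as recently used.
        if key in self._cache:
            self._cache.move_to_end(key)
            return self._cache[key]

        future = self._pool.submit(self._check, job)
        try:
            decision = future.result(timeout=self._timeout)
        except Exception:
            # Timeout or policy-service error: apply the fail mode, cache nothing.
            return "DENY" if self._fail_closed else "ALLOW"

        self._cache[key] = decision
        if len(self._cache) > self._cache_max:
            self._cache.popitem(last=False)  # evict the least recently used entry
        return decision
```

If you adopt something like this, consider including the policy snapshot ID in `key`, so that a reload naturally stops serving decisions cached under the previous snapshot.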
Limitations and tradeoffs
Policy catalogs require ownership and regular pruning as workflows and regulations evolve.
Overly strict rules can slow delivery if simulation and tuning loops are weak.
Policy enforcement quality depends on reliable control-plane and telemetry infrastructure.