Deep Dive

Building Custom Safety Policies for AI Agents

Policy language is easy. Deterministic enforcement under failure is the hard part.

Deep Dive · 12 min read · Apr 2026
TL;DR
  • Custom policy quality is determined by determinism, not by policy syntax alone.
  • Cordum policy flow is explicit: evaluate -> snapshot -> enforce -> audit, with fail-closed defaults.
  • Production policy sources must be signed and transport-constrained (`https`, allowlist, private-host checks).
  • Policy simulation endpoints are mandatory before rollout; policy edits without simulation are outage fuel.
Failure mode

Most teams write policy rules but skip enforcement semantics, signatures, and rollback plans.

Current behavior

Cordum maps policy decisions to strict runtime enums and stores policy snapshots on each decision path.

Operational payoff

Deterministic policy enforcement cuts incident ambiguity during approvals, replays, and postmortems.

Scope

This guide focuses on custom input safety policy controls in Cordum. It does not attempt to replace model safety tuning.

The production problem

Most policy projects fail after the first rules file lands in Git.

The syntax works. The runtime path does not.

Teams discover late that policy updates are unsigned, approval decisions use stale snapshots, and fail-mode defaults are unclear during outages.

That is how a policy platform becomes a logging platform.

What top results cover and miss

| Source | Strong coverage | Missing piece |
| --- | --- | --- |
| Open Policy Agent docs | Policy-as-code foundations and decoupling authorization from app logic. | No agent-control-plane walkthrough for policy snapshots, approval drift checks, and sink-specific AI constraints. |
| Cedar policy language reference | Authorization model, schema validation, and policy readability at scale. | No execution-path guidance for queue-driven AI jobs where decisions must survive retries and delayed approvals. |
| AWS AgentCore policy blog | Policy authoring options and real-time interception ideas for agent tooling. | No explicit fail-mode contract and no open code path for policy-signature enforcement and approval snapshot conflicts. |

Policy model that survives production

A practical custom policy system needs five properties:

Deterministic decisions. Snapshot lineage. Source integrity. Rollout simulation. Drift-safe approvals.

| Boundary | Current behavior | Why it matters |
| --- | --- | --- |
| Decision contract | `allow`, `deny`, `require_approval`, `throttle`, `allow_with_constraints` map to strict protobuf enums. | No ambiguous runtime behavior when policy authors add new rules. |
| Snapshot lineage | Each evaluation returns `PolicySnapshot`; approval path checks snapshot consistency before final approval. | Prevents approval on stale policy after security changes. |
| Source integrity | Policy can load from file or URL, but signature verification is required in production. | Blocks silent policy tampering and accidental unsigned rollout. |
| Transport hardening | Production rejects `http://` policy URLs and blocks private hosts unless explicitly enabled. | Reduces SSRF and policy-supply-chain attack surface. |
| Pre-rollout simulation | Gateway exposes `/api/v1/policy/evaluate`, `/api/v1/policy/simulate`, and `/api/v1/policy/explain` endpoints for dry-run analysis. | Lets teams catch blast-radius errors before policy publish. |
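
The transport-hardening row can be made concrete with a small validator. This is a minimal sketch, not Cordum's implementation: `SourceRules` and `validatePolicyURL` are hypothetical names, and the check covers only literal IP hosts and `localhost`. A hostname that resolves to a private address would need a resolution-time check, which this sketch leaves out.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"net/url"
)

// SourceRules is a hypothetical config struct for this sketch, not a Cordum type.
type SourceRules struct {
	Production        bool
	AllowPrivateHosts bool
	AllowedHosts      map[string]bool // an empty map means "no allowlist configured"
}

// validatePolicyURL applies scheme, allowlist, and private-host checks in that order.
func validatePolicyURL(raw string, r SourceRules) error {
	u, err := url.Parse(raw)
	if err != nil {
		return fmt.Errorf("parse policy url: %w", err)
	}
	host := u.Hostname()
	if host == "" {
		return errors.New("policy url has no host")
	}
	if r.Production && u.Scheme != "https" {
		return fmt.Errorf("scheme %q rejected in production", u.Scheme)
	}
	if len(r.AllowedHosts) > 0 && !r.AllowedHosts[host] {
		return fmt.Errorf("host %q not in allowlist", host)
	}
	if !r.AllowPrivateHosts {
		if host == "localhost" {
			return errors.New("private host rejected: localhost")
		}
		// Only literal IPs are inspected here; DNS names are not resolved.
		if ip := net.ParseIP(host); ip != nil &&
			(ip.IsLoopback() || ip.IsPrivate() || ip.IsLinkLocalUnicast() || ip.IsUnspecified()) {
			return fmt.Errorf("private host rejected: %s", host)
		}
	}
	return nil
}

func main() {
	rules := SourceRules{Production: true}
	for _, raw := range []string{
		"https://policies.example.com/safety.yaml",
		"http://policies.example.com/safety.yaml",
		"https://10.0.0.5/safety.yaml",
	} {
		fmt.Printf("%s -> %v\n", raw, validatePolicyURL(raw, rules))
	}
}
```

Running the scheme check before any fetch keeps a rejected URL from ever producing network traffic, which is the point of treating transport as part of the policy supply chain.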

Concrete code paths

Custom policy skeleton

config/safety.yaml
yaml
# config/safety.yaml (excerpt)
default_decision: deny

rules:
  - id: workflow-approval-gate
    match:
      topics:
        - job.cordum.approval-gate
    decision: require_approval
    reason: "Workflow approval gates require explicit human authorization."

input_policy:
  fail_mode: closed

output_policy:
  enabled: true
  fail_mode: closed
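
To show how a skeleton like this behaves at evaluation time, here is a minimal sketch of first-match evaluation with a default decision. The `Rule` and `Policy` structs are hypothetical stand-ins that mirror the YAML fields, and the sketch assumes exact topic matching; Cordum's real matcher may support more match fields or patterns.

```go
package main

import "fmt"

// Rule mirrors the YAML rule fields in the excerpt; the struct is hypothetical.
type Rule struct {
	ID       string
	Topics   []string
	Decision string
	Reason   string
}

// Policy mirrors default_decision and rules from the excerpt.
type Policy struct {
	DefaultDecision string
	Rules           []Rule
}

// Evaluate returns the decision and rule ID of the first rule that lists the
// topic. When no rule matches, it falls back to DefaultDecision with an empty ID.
func (p Policy) Evaluate(topic string) (decision, ruleID string) {
	for _, r := range p.Rules {
		for _, t := range r.Topics {
			if t == topic { // assumption: exact match only
				return r.Decision, r.ID
			}
		}
	}
	return p.DefaultDecision, ""
}

func main() {
	p := Policy{
		DefaultDecision: "deny",
		Rules: []Rule{{
			ID:       "workflow-approval-gate",
			Topics:   []string{"job.cordum.approval-gate"},
			Decision: "require_approval",
			Reason:   "Workflow approval gates require explicit human authorization.",
		}},
	}
	for _, topic := range []string{"job.cordum.approval-gate", "job.default"} {
		d, id := p.Evaluate(topic)
		fmt.Printf("topic=%s decision=%s rule=%q\n", topic, d, id)
	}
}
```

With `default_decision: deny`, any topic that no rule claims is denied, so forgetting a rule fails safe rather than open.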

Decision mapping and response shape

core/controlplane/safetykernel/kernel.go
go
// core/controlplane/safetykernel/kernel.go (excerpt)
switch policyDecision.Decision {
case "deny":
  decision = pb.DecisionType_DECISION_TYPE_DENY
case "require_approval":
  decision = pb.DecisionType_DECISION_TYPE_REQUIRE_HUMAN
case "throttle":
  decision = pb.DecisionType_DECISION_TYPE_THROTTLE
case "allow_with_constraints":
  decision = pb.DecisionType_DECISION_TYPE_ALLOW_WITH_CONSTRAINTS
case "allow":
  decision = pb.DecisionType_DECISION_TYPE_ALLOW
}

resp := &pb.PolicyCheckResponse{
  Decision: decision,
  PolicySnapshot: snapshot,
  RuleId: ruleID,
  ApprovalRequired: approvalRequired,
}
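
The excerpt does not show what happens when a policy file contains a decision string outside the five known values. One way to keep the mapping fail-closed is an explicit default branch, sketched below. `DecisionType` and its constants are local stand-ins for the protobuf enum, and the default-to-deny behavior is an assumption of this sketch, not a confirmed detail of the Cordum kernel.

```go
package main

import "fmt"

// DecisionType is a local stand-in for pb.DecisionType in the excerpt.
type DecisionType int

const (
	DecisionUnspecified DecisionType = iota
	DecisionAllow
	DecisionDeny
	DecisionRequireHuman
	DecisionThrottle
	DecisionAllowWithConstraints
)

// mapDecision converts a policy decision string to an enum value.
// The boolean reports whether the string was recognized; unrecognized
// strings map to deny so a typo in a policy file cannot widen access.
func mapDecision(s string) (DecisionType, bool) {
	switch s {
	case "allow":
		return DecisionAllow, true
	case "deny":
		return DecisionDeny, true
	case "require_approval":
		return DecisionRequireHuman, true
	case "throttle":
		return DecisionThrottle, true
	case "allow_with_constraints":
		return DecisionAllowWithConstraints, true
	default:
		return DecisionDeny, false // assumption: fail closed on unknown input
	}
}

func main() {
	for _, s := range []string{"allow", "require_approval", "alow"} {
		d, known := mapDecision(s)
		fmt.Printf("%q -> enum=%d known=%v\n", s, d, known)
	}
}
```

Returning the `known` flag separately lets a loader reject the policy at parse time while the runtime path still has a safe value to enforce.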

Policy source and snapshot build

core/controlplane/safetykernel/kernel.go
go
// core/controlplane/safetykernel/kernel.go (excerpt)
func policySourceFromEnv(path string) string {
  if raw := strings.TrimSpace(os.Getenv("SAFETY_POLICY_URL")); raw != "" {
    return raw
  }
  return strings.TrimSpace(path)
}

func loadPolicyBundle(source string) (*config.SafetyPolicy, string, error) {
  data, err := readPolicySource(source)
  if err != nil { return nil, "", err }
  if err := verifyPolicySignature(data, source); err != nil { return nil, "", err }
  policy, err := config.ParseSafetyPolicy(data)
  // snapshot = "<version>:<sha256>"
  // ... (excerpt omits snapshot construction and the final return)
}
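
The `"<version>:<sha256>"` comment in the excerpt suggests a snapshot string built from a version label and a content hash. A minimal sketch of that idea follows; `buildSnapshot` is a hypothetical helper, and hashing the raw policy bytes with hex encoding is an assumption, since the excerpt does not show the actual construction.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// buildSnapshot joins a version label with the hex SHA-256 of the policy bytes.
// Identical bytes always yield the same snapshot, which is what makes the
// value usable as a lineage identifier across replicas and retries.
func buildSnapshot(version string, data []byte) string {
	sum := sha256.Sum256(data)
	return version + ":" + hex.EncodeToString(sum[:])
}

func main() {
	policy := []byte("default_decision: deny\n")
	fmt.Println(buildSnapshot("v1", policy))

	// A one-byte change produces a different snapshot under the same version.
	edited := []byte("default_decision: allow\n")
	fmt.Println(buildSnapshot("v1", edited))
}
```

Including the hash means two files that both claim the same version but differ in content are distinguishable in audit logs.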

Signature enforcement (Ed25519)

core/controlplane/safetykernel/kernel.go
go
// core/controlplane/safetykernel/kernel.go (excerpt)
func verifyPolicySignature(data []byte, source string) error {
  requireSignature := env.IsProduction() || env.Bool("SAFETY_POLICY_SIGNATURE_REQUIRED")
  // ... (excerpt omits how pubRaw, pubKey, and sig are loaded)
  if pubRaw == "" && requireSignature {
    return errors.New("policy signature required but SAFETY_POLICY_PUBLIC_KEY not configured")
  }
  if !ed25519.Verify(ed25519.PublicKey(pubKey), data, sig) {
    return errors.New("policy signature verification failed")
  }
  return nil
}
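
Because the excerpt leaves out key and signature loading, a self-contained sketch of detached Ed25519 verification may help. It uses only the standard `crypto/ed25519` package. The base64 encoding of key and signature, and the names `verifyDetached` and `demoSign`, are assumptions for illustration; the excerpt does not say how Cordum stores or transports these values.

```go
package main

import (
	"crypto/ed25519"
	"encoding/base64"
	"errors"
	"fmt"
)

// verifyDetached checks a base64 detached signature over data.
// When no key is configured it fails only if signatures are required.
func verifyDetached(data []byte, pubB64, sigB64 string, required bool) error {
	if pubB64 == "" {
		if required {
			return errors.New("policy signature required but no public key configured")
		}
		return nil // relaxed mode: nothing to verify against
	}
	pub, err := base64.StdEncoding.DecodeString(pubB64)
	if err != nil || len(pub) != ed25519.PublicKeySize {
		// ed25519.Verify panics on a wrong-sized key, so check the length first.
		return errors.New("invalid public key")
	}
	sig, err := base64.StdEncoding.DecodeString(sigB64)
	if err != nil || len(sig) != ed25519.SignatureSize {
		return errors.New("invalid signature encoding")
	}
	if !ed25519.Verify(ed25519.PublicKey(pub), data, sig) {
		return errors.New("policy signature verification failed")
	}
	return nil
}

// demoSign generates a throwaway key pair and signs data. Demo helper only.
func demoSign(data []byte) (pubB64, sigB64 string) {
	pub, priv, err := ed25519.GenerateKey(nil) // nil selects crypto/rand
	if err != nil {
		panic(err)
	}
	sig := ed25519.Sign(priv, data)
	return base64.StdEncoding.EncodeToString(pub), base64.StdEncoding.EncodeToString(sig)
}

func main() {
	policy := []byte("default_decision: deny\n")
	pub, sig := demoSign(policy)
	fmt.Println("original:", verifyDetached(policy, pub, sig, true))
	fmt.Println("tampered:", verifyDetached([]byte("default_decision: allow\n"), pub, sig, true))
	fmt.Println("no key:  ", verifyDetached(policy, "", "", true))
}
```

Verifying before parsing, as `loadPolicyBundle` does in the excerpt above, means a tampered file never reaches the YAML parser.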

Approval drift guard

core/controlplane/gateway/handlers_approvals.go
go
// core/controlplane/gateway/handlers_approvals.go (excerpt)
if currentSnapshot == "" || snapshotBase(currentSnapshot) != snapshotBase(policySnapshot) {
  result = handlerResult{http.StatusConflict, "policy snapshot changed; re-evaluate before approving"}
  return nil
}
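
The guard above can be illustrated in isolation. This sketch is deliberately stricter than the excerpt: it compares full snapshot strings, because the normalization that `snapshotBase` performs is not shown. `approvalStatus` is a hypothetical function name.

```go
package main

import (
	"fmt"
	"net/http"
)

// approvalStatus compares the snapshot recorded at evaluation time with the
// snapshot currently loaded. An empty current snapshot is treated as a
// mismatch, so an unknown policy state never results in an approval.
func approvalStatus(evaluated, current string) (int, string) {
	if current == "" || evaluated != current {
		return http.StatusConflict, "policy snapshot changed; re-evaluate before approving"
	}
	return http.StatusOK, "approved"
}

func main() {
	cases := [][2]string{
		{"v3:abc", "v3:abc"}, // unchanged policy
		{"v3:abc", "v4:def"}, // policy changed after evaluation
		{"v3:abc", ""},       // current snapshot unavailable
	}
	for _, c := range cases {
		code, msg := approvalStatus(c[0], c[1])
		fmt.Printf("evaluated=%q current=%q -> %d %s\n", c[0], c[1], code, msg)
	}
}
```

Returning a conflict instead of silently re-evaluating forces the approver to see the decision under the policy that is actually in force.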

Validation runbook

Run this checklist before every policy publish:

custom-safety-policy-runbook.sh
bash
# 1) Validate policy source and signature enforcement
go test ./core/controlplane/safetykernel -run TestVerifyPolicySignature -count=1
go test ./core/controlplane/safetykernel -run TestVerifyPolicySignatureProductionRequiresKey -count=1
go test ./core/controlplane/safetykernel -run TestFetchPolicyURLRejectsHTTPInProduction -count=1

# 2) Validate policy decision APIs
curl -sS -X POST http://localhost:8081/api/v1/policy/evaluate \
  -H "Content-Type: application/json" \
  -d '{"topic":"job.default","tenant":"default","principal_id":"ops-admin"}'

curl -sS -X POST http://localhost:8081/api/v1/policy/simulate \
  -H "Content-Type: application/json" \
  -d '{"topic":"job.default","tenant":"default","principal_id":"ops-admin"}'

# 3) Validate approval snapshot drift protection
go test ./core/controlplane/gateway -run TestApprovalsRequireCurrentPolicySnapshot -count=1

# 4) Validate scheduler fail mode remains closed by default
go test ./core/controlplane/scheduler -run TestSafetyUnavailable_FailClosed -count=1

Limitations and tradeoffs

| Approach | Upside | Downside |
| --- | --- | --- |
| Loose policy rules + manual review | Fast initial setup. | High human load, inconsistent enforcement, poor incident reproducibility. |
| Deterministic policy + snapshot enforcement (current) | Consistent behavior across retries, approvals, and distributed replicas. | Requires disciplined policy testing and version management. |
| Unsigned remote policy fetch | Operational convenience for quick edits. | Unsafe supply chain surface; production drift and tamper risk. |
  • First-match rules in YAML are powerful and dangerous. Rule ordering errors create hidden policy regressions.
  • Strict signatures improve integrity but force stronger key-management discipline.
  • Simulation endpoints reduce risk, but only if your test payloads reflect real tenant traffic.
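
The rule-ordering risk can be checked mechanically. The sketch below flags rules that can never fire under first-match because every topic they list is already claimed by an earlier rule. It assumes rules match on exact topics only; if the real policy also matches on tenant, principal, or other fields, two rules with the same topics are not necessarily shadowed. `Rule` and `shadowedRules` are hypothetical names.

```go
package main

import "fmt"

// Rule is a reduced, hypothetical view of a policy rule: ID plus topics.
type Rule struct {
	ID     string
	Topics []string
}

// shadowedRules returns the IDs of rules whose topics are all claimed by
// earlier rules. Rules with no topics are skipped, since this sketch cannot
// tell what they match.
func shadowedRules(rules []Rule) []string {
	seen := map[string]bool{}
	var shadowed []string
	for _, r := range rules {
		reachable := false
		for _, t := range r.Topics {
			if !seen[t] {
				reachable = true
			}
		}
		if len(r.Topics) > 0 && !reachable {
			shadowed = append(shadowed, r.ID)
		}
		for _, t := range r.Topics {
			seen[t] = true
		}
	}
	return shadowed
}

func main() {
	rules := []Rule{
		{ID: "allow-reports", Topics: []string{"job.reports"}},
		{ID: "deny-reports", Topics: []string{"job.reports"}}, // never reached
		{ID: "gate-mixed", Topics: []string{"job.reports", "job.payments"}},
	}
	fmt.Println("shadowed:", shadowedRules(rules))
}
```

A check like this fits in CI next to the signature tests, so a reordered rules file fails review before it reaches simulation.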

FAQ

What makes AI policy enforcement deterministic?

A fixed decision contract, explicit fail-mode behavior, snapshotted policy lineage, and replay-safe state transitions.

Why are policy snapshots critical for approvals?

Approvals may happen minutes later. Snapshot checks prevent approving under a different policy than the one originally evaluated.

Do I need signature verification in non-production?

You can relax it in dev, but production should always enforce signatures to prevent policy tampering.

Next step

Do this in your next sprint:

  1. Add signature enforcement to staging first, then production.
  2. Require policy simulation evidence in code review before any policy merge.
  3. Add an approval-SLA alert for snapshot mismatch conflicts.
  4. Keep `default_decision: deny` and `input_policy.fail_mode: closed` unless risk owners approve exceptions.
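
One possible form of simulation evidence for code review is a decision diff: run the same sample topics through the current and candidate policies and list what changed. The sketch below does this offline with hypothetical `Policy`, `Rule`, and `diffDecisions` definitions and exact topic matching. It does not call the gateway simulation endpoint, whose response shape is not shown in this guide.

```go
package main

import "fmt"

// Rule and Policy are hypothetical, reduced versions of the YAML structure.
type Rule struct {
	ID       string
	Topics   []string
	Decision string
}

type Policy struct {
	DefaultDecision string
	Rules           []Rule
}

// decide applies first-match evaluation with exact topic matching.
func (p Policy) decide(topic string) string {
	for _, r := range p.Rules {
		for _, t := range r.Topics {
			if t == topic {
				return r.Decision
			}
		}
	}
	return p.DefaultDecision
}

// Change records a topic whose decision differs between two policies.
type Change struct {
	Topic, Before, After string
}

// diffDecisions evaluates each sample topic under both policies and returns
// only the topics whose decision changed, in input order.
func diffDecisions(current, candidate Policy, topics []string) []Change {
	var changes []Change
	for _, t := range topics {
		before, after := current.decide(t), candidate.decide(t)
		if before != after {
			changes = append(changes, Change{Topic: t, Before: before, After: after})
		}
	}
	return changes
}

func main() {
	current := Policy{DefaultDecision: "deny", Rules: []Rule{
		{ID: "workflow-approval-gate", Topics: []string{"job.cordum.approval-gate"}, Decision: "require_approval"},
	}}
	candidate := Policy{DefaultDecision: "deny"} // candidate drops the gate rule

	for _, c := range diffDecisions(current, candidate, []string{"job.cordum.approval-gate", "job.default"}) {
		fmt.Printf("%s: %s -> %s\n", c.Topic, c.Before, c.After)
	}
}
```

The diff is only as good as the sample topics, so the list should be drawn from real tenant traffic rather than written by hand.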

Continue with AI Agent Policy Simulation and Prompt Injection vs Out-of-Process Governance.

Policy quality is an operational discipline

Start with strict defaults, prove behavior with simulation, and publish only signed snapshots.