LLM Safety Kernel for AI Agents

How to enforce deterministic policy outcomes before autonomous actions hit production systems.

Safety Kernel | 13 min read | Updated Apr 2026
TL;DR
  • If policy decisions are not deterministic, approvals are hard to trust.
  • A safety kernel needs to run before dispatch, not after an action lands.
  • Constraints are as important as allow or deny because they bound blast radius.
  • Input policy and output safety are different gates and both matter.
Failure Mode

Tool calls that execute before policy checks turn governance into incident response.

Simulation Gate

Policy simulation in CI catches risky rule changes before they impact real runs.

Snapshot Lineage

Approval records tied to policy snapshots remove guesswork during audits.

Scope

This page focuses on runtime policy enforcement for autonomous AI agents in production. The target is predictable execution behavior, not policy language theory.

The production problem

Many teams call their system safe because they run one classifier before generation. That misses the main risk window: tool execution.

By the time a write action hits a repo, ticketing system, or cloud API, advisory guardrails are too late. You need a policy decision point that can block execution outright, require approval, or apply constraints.

What top-ranking sources cover and what they miss

Source | Strong coverage | Missing piece
OpenAI practical guide to building agents | Strong framing for layered guardrails, tool risk ratings, and mixing classifiers with deterministic checks. | No concrete contract for policy outcomes tied to scheduler state transitions and approval binding.
NVIDIA NeMo Guardrails architecture guide | Detailed event-driven runtime and multi-stage guardrail flow with canonical intent and next-step generation. | Does not focus on pre-dispatch policy gating for external worker jobs and approval queue mechanics.
AWS ApplyGuardrail API guide | Clear pre and post model checking pattern with independent API and explicit INPUT versus OUTPUT sources. | Limited guidance on deterministic governance for tool execution, run timelines, and policy snapshot lineage.

Decision contract

A safety kernel contract should be small and strict. If decisions are fuzzy, operators will invent manual exceptions and your queue policy turns into folklore.

Decision | Effect | Scheduler behavior | Evidence
ALLOW | Job can proceed | Dispatch normally | Rule id, reason, snapshot are recorded
DENY | Job is blocked | Reject before worker execution | Decision record shows deny rule and reason
REQUIRE_APPROVAL | Human gate required | State becomes APPROVAL_REQUIRED and waits | Approval stores policy snapshot and decision summary
THROTTLE | Rate pressure signal | Submit path returns ResourceExhausted | Decision audit includes throttle reason
ALLOW_WITH_CONSTRAINTS | Allowed with strict bounds | Dispatch with runtime limits | Constraints persisted with policy decision
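
One way to keep the contract from drifting is to encode it as a closed set of values with an exhaustive mapping to scheduler behavior. The Python sketch below is illustrative only: the decision names come from the table, while `SchedulerAction`, `next_action`, and the choice to reject unknown decision strings are assumptions, not Cordum's actual types.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "ALLOW"
    DENY = "DENY"
    REQUIRE_APPROVAL = "REQUIRE_APPROVAL"
    THROTTLE = "THROTTLE"
    ALLOW_WITH_CONSTRAINTS = "ALLOW_WITH_CONSTRAINTS"


class SchedulerAction(Enum):
    # Hypothetical names for the scheduler behaviors listed in the table.
    DISPATCH = "dispatch"
    REJECT = "reject"
    WAIT_FOR_APPROVAL = "wait_for_approval"
    RESOURCE_EXHAUSTED = "resource_exhausted"
    DISPATCH_WITH_LIMITS = "dispatch_with_limits"


_ACTIONS = {
    Decision.ALLOW: SchedulerAction.DISPATCH,
    Decision.DENY: SchedulerAction.REJECT,
    Decision.REQUIRE_APPROVAL: SchedulerAction.WAIT_FOR_APPROVAL,
    Decision.THROTTLE: SchedulerAction.RESOURCE_EXHAUSTED,
    Decision.ALLOW_WITH_CONSTRAINTS: SchedulerAction.DISPATCH_WITH_LIMITS,
}

# Fail at import time if a decision is added without a scheduler behavior.
assert set(_ACTIONS) == set(Decision)


def next_action(raw_decision: str) -> SchedulerAction:
    """Map a raw decision string to a scheduler action.

    Assumption: an unrecognized decision is treated as a rejection,
    so a malformed policy response can never dispatch work.
    """
    try:
        decision = Decision(raw_decision)
    except ValueError:
        return SchedulerAction.REJECT
    return _ACTIONS[decision]
```

Keeping the mapping total means a new decision type cannot ship without someone deciding what the scheduler does with it.
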
safety-policy.yaml
YAML
version: v1
rules:
  - id: read-only-allow
    match:
      topics: ["job.mcp-bridge.read.*"]
    decision: allow

  - id: prod-write-needs-approval
    match:
      topics: ["job.mcp-bridge.write.*"]
      risk_tags: ["prod", "write"]
    decision: require_approval
    reason: "Production writes must be approved"

  - id: medium-risk-bounded
    match:
      topics: ["job.agent.exec.*"]
      risk_tags: ["medium"]
    decision: allow_with_constraints
    constraints:
      max_runtime_sec: 60
      max_retries: 1
      max_artifact_bytes: 1048576

  - id: destructive-deny
    match:
      risk_tags: ["destructive"]
    decision: deny
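
To make the rule file concrete, here is a minimal Python sketch of how such rules could be evaluated. The matching semantics are assumptions, not documented behavior: rules are tried in file order and the first match wins, `topics` entries are treated as glob patterns, every listed `risk_tags` entry must be present on the job, and a job that matches nothing is denied. The rules are written as Python dicts mirroring the YAML so the example needs no YAML parser.

```python
from fnmatch import fnmatchcase

# Mirrors safety-policy.yaml above.
RULES = [
    {"id": "read-only-allow",
     "match": {"topics": ["job.mcp-bridge.read.*"]},
     "decision": "allow"},
    {"id": "prod-write-needs-approval",
     "match": {"topics": ["job.mcp-bridge.write.*"],
               "risk_tags": ["prod", "write"]},
     "decision": "require_approval",
     "reason": "Production writes must be approved"},
    {"id": "medium-risk-bounded",
     "match": {"topics": ["job.agent.exec.*"], "risk_tags": ["medium"]},
     "decision": "allow_with_constraints",
     "constraints": {"max_runtime_sec": 60, "max_retries": 1,
                     "max_artifact_bytes": 1048576}},
    {"id": "destructive-deny",
     "match": {"risk_tags": ["destructive"]},
     "decision": "deny"},
]


def rule_matches(rule, topic, risk_tags):
    """Return True if the job satisfies every condition in rule['match']."""
    match = rule["match"]
    topics = match.get("topics")
    if topics is not None and not any(fnmatchcase(topic, p) for p in topics):
        return False
    # Assumption: all tags listed on the rule must be present on the job.
    return set(match.get("risk_tags", [])) <= set(risk_tags)


def evaluate(topic, risk_tags, rules=RULES):
    """First matching rule wins; no match falls through to DENY (assumed default)."""
    for rule in rules:
        if rule_matches(rule, topic, risk_tags):
            return {
                "decision": rule["decision"].upper(),
                "policy_rule_id": rule["id"],
                "policy_reason": rule.get("reason", ""),
                "constraints": rule.get("constraints", {}),
            }
    return {"decision": "DENY", "policy_rule_id": None,
            "policy_reason": "no rule matched", "constraints": {}}
```

Under first-match ordering, a job on a read topic that also carries the `destructive` tag would hit `read-only-allow` before `destructive-deny`. Whether the real engine resolves overlaps by order, by specificity, or by deny-overrides is worth confirming before relying on rule position.
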

Runtime implementation

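Cordum evaluates policy at submit time in the gateway and again at dispatch time in the scheduler. The second check narrows the race window between intake and worker execution, so a job admitted under an older policy, or one whose state has changed, is re-evaluated before it reaches a worker.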

Requests that require approval pause in the APPROVAL_REQUIRED state, and each approval is bound to the policy snapshot and the job hash before the job is requeued. That is the detail auditors ask for when incidents happen at 2 AM.
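
The binding can be illustrated with a short Python sketch. Everything here is hypothetical: the `Approval` record, the use of SHA-256 over canonical JSON as the job hash, and the rule that an approval becomes invalid when either the job content or the active policy snapshot changes. The source only states that approvals are bound to a policy snapshot and a job hash.

```python
import hashlib
import json
from dataclasses import dataclass


def job_hash(job: dict) -> str:
    """Hash a job over canonical JSON so key order does not change the digest."""
    canonical = json.dumps(job, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Approval:
    job_id: str
    policy_snapshot: str
    job_hash: str
    approved_by: str


def approve(job: dict, policy_snapshot: str, approver: str) -> Approval:
    """Record an approval bound to the exact job content and policy snapshot."""
    return Approval(
        job_id=job["job_id"],
        policy_snapshot=policy_snapshot,
        job_hash=job_hash(job),
        approved_by=approver,
    )


def can_requeue(approval: Approval, job: dict, current_snapshot: str) -> bool:
    """Allow requeue only if neither the job nor the policy changed since approval."""
    return (
        approval.job_id == job["job_id"]
        and approval.policy_snapshot == current_snapshot
        and approval.job_hash == job_hash(job)
    )
```

With a binding like this, an approval granted for one payload cannot be replayed for an edited payload, and the audit record shows exactly which policy version the approver was looking at.
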

simulate-policy.http
JSON
POST /api/v1/policy/simulate
{
  "job_id": "job-sim-001",
  "tenant_id": "default",
  "topic": "job.mcp-bridge.write.update_issue",
  "labels": {
    "mcp.server": "jira",
    "mcp.action": "write"
  },
  "meta": {
    "capability": "ticket.update",
    "risk_tags": ["prod", "write"]
  }
}

200 OK
{
  "decision": "REQUIRE_APPROVAL",
  "policy_rule_id": "prod-write-needs-approval",
  "policy_reason": "Production writes must be approved",
  "policy_snapshot": "cfg:system:policy#sha256:7f3d...9c2b",
  "approval_required": true,
  "constraints": {}
}
decision-audit.sh
Bash
# Decision history for a specific job
curl -sS http://localhost:8081/api/v1/jobs/job-sim-001/decisions

# Pending approvals with decision summary context
curl -sS "http://localhost:8081/api/v1/approvals?include_resolved=false"

# Policy change audit trail
curl -sS http://localhost:8081/api/v1/policy/audit

Guardrail | Default | Why it exists
Gateway submit-time policy | enabled | Rejects risky work before state persist and bus publish
Scheduler dispatch-time policy | enabled | Blocks stale or bypassed requests from reaching workers
Safety client timeout | 2s | Keeps scheduler hot path responsive under policy service pressure
Policy reload interval | 30s | Applies rule updates without process restart
Decision cache max size | 10000 | Reduces repeated check latency for common requests
Fail mode | closed | Prevents unchecked execution if safety dependency is unavailable
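
The timeout and fail-mode defaults interact: a bounded wait is only safe if the fallback is a refusal. The Python sketch below shows one way to combine them. The function name `check_fail_closed`, the thread-pool approach, and the choice of DENY as the fallback decision are assumptions for illustration; only the 2s timeout and the closed fail mode come from the table.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

# Shared pool so a slow check does not block the caller past the timeout.
_POOL = ThreadPoolExecutor(max_workers=4)


def check_fail_closed(check, request, timeout_sec=2.0):
    """Run a policy check with a deadline; any failure yields DENY.

    `check` is any callable that takes the request and returns a decision
    dict. Timeouts and exceptions are both treated as "safety dependency
    unavailable", and the request is refused rather than waved through.
    """
    future = _POOL.submit(check, request)
    try:
        return future.result(timeout=timeout_sec)
    except FutureTimeout:
        future.cancel()  # best effort; a running check cannot be interrupted
        return {"decision": "DENY", "policy_reason": "safety check timed out"}
    except Exception as exc:
        return {"decision": "DENY",
                "policy_reason": f"safety check failed: {exc}"}
```

The reason string is worth recording in the decision audit so that refusals caused by an unavailable dependency can be told apart from refusals caused by policy.
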

Output safety

Input policy does not guarantee safe outputs. Generated content can still contain secrets, unsafe payloads, or policy violations. That is why output safety is a separate gate.

Decision | Meaning | State impact
ALLOW | Release output | Job remains succeeded
REDACT | Release sanitized output | Succeeded with preferred redacted pointer
QUARANTINE | Hold output for review | Moves to OUTPUT_QUARANTINED and emits DLQ event

Unlike the fail-closed input policy gates, current output checks are fail-open in the scheduler hot path when the checker is unavailable. That protects availability, but it raises risk: an output can be released without ever being checked. Teams handling sensitive data should monitor skipped checks closely.
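
A minimal Python sketch of an output gate with the three decisions and a fail-open fallback follows. The detection rules are placeholders chosen for illustration (an AWS-style access key id pattern and a private key header), and `check_output`, `gate_output`, and the `metrics` counter are hypothetical names, not part of any real API.

```python
import re

# Placeholder detectors; a real checker would use a much broader rule set.
ACCESS_KEY_PATTERN = re.compile(r"AKIA[0-9A-Z]{16}")
PRIVATE_KEY_MARKER = "-----BEGIN PRIVATE KEY-----"

# Counter for checks that were skipped because the checker was unavailable.
metrics = {"skipped_checks": 0}


def check_output(text: str) -> dict:
    """Classify output as ALLOW, REDACT, or QUARANTINE."""
    if PRIVATE_KEY_MARKER in text:
        # Too sensitive to sanitize automatically: hold for review.
        return {"decision": "QUARANTINE", "output": None}
    redacted, count = ACCESS_KEY_PATTERN.subn("[REDACTED]", text)
    if count:
        return {"decision": "REDACT", "output": redacted}
    return {"decision": "ALLOW", "output": text}


def gate_output(text: str, checker=check_output) -> dict:
    """Fail-open wrapper: release the output if the checker errors, but count it."""
    try:
        return checker(text)
    except Exception:
        metrics["skipped_checks"] += 1
        return {"decision": "ALLOW", "output": text, "check_skipped": True}
```

Marking skipped results explicitly, instead of returning a plain ALLOW, gives the monitoring described above something to alert on.
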

Limitations and tradeoffs

Policy maintenance cost

Rules, constraints, and exceptions need explicit ownership or they decay quickly.

False-positive pressure

Strict defaults can slow delivery unless simulation and rollback are part of normal release flow.

Availability tradeoffs

Fail-closed improves control but can increase refusal rates during safety dependency incidents.

Frequently Asked Questions

Can prompt-only guardrails replace a safety kernel?
No. Prompt guidance helps, but it cannot guarantee deterministic runtime enforcement for tool execution and worker dispatch.
Why evaluate policy both at submit and dispatch?
Because state and policy can change between those points. Double checks reduce race windows and stale decision risk.
Do constraints matter if a request is approved?
Yes. Approval answers whether work is allowed. Constraints answer how much work is allowed.
Is output safety optional if input policy is strong?
No. Safe inputs can still produce unsafe outputs, especially in long tool chains or code generation tasks.
Next step

Pick one high-risk write workflow and move it behind deterministic policy decisions this week. Start with simulation, then publish behind a rollback drill you can run in under 10 minutes.
