Deep Dive

Prompt Injection vs Out-of-Process Governance for AI Agents

Prompt filtering reduces probability. Governance controls reduce blast radius.

Deep Dive · 11 min read · Apr 2026
TL;DR
  • Prompt injection detection is probabilistic. Action control must be deterministic.
  • Cordum enforces policy out-of-process at submit-time and pre-dispatch; both gates can fail closed.
  • Safety outages are explicit: `POLICY_CHECK_FAIL_MODE=closed` requeues/denies, `open` allows with bypass labels and metrics.
  • If the model is tricked, constrained sinks and approval gates still block high-risk actions.
Failure mode

Model-only defenses can still approve malicious tool intent under realistic social-engineering prompts.

Current behavior

Cordum checks policy before publish and again before dispatch, with configurable fail mode and shared breaker state.

Operational payoff

Injection risk shifts from 'detect every string' to 'contain every dangerous sink'.

Scope

This is a control-plane execution guide. It focuses on action gating and fail behavior, not prompt-authoring tricks.

The production problem

Prompt injection is not just a text-classification bug.

It is an action-control bug.

If an agent can read untrusted content and call powerful tools, a single successful injection can exfiltrate data or trigger unsafe operations.

Detection helps, but detection is probabilistic. Production safety needs deterministic boundaries around dangerous sinks.
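
As an illustration of a deterministic boundary, the sketch below consults a fixed sink table before any tool call and ignores whatever the model says about why the call is needed. It is not Cordum's API: the tool names, the `Decision` type, and `gate` are hypothetical and exist only for this example.

```go
package main

import "fmt"

// Decision is the outcome of a sink check. All names here are hypothetical.
type Decision int

const (
	Deny Decision = iota // zero value, so unknown sinks are denied
	Allow
	RequireApproval
)

func (d Decision) String() string {
	switch d {
	case Allow:
		return "allow"
	case RequireApproval:
		return "require_approval"
	default:
		return "deny"
	}
}

// sinkPolicy maps tool names to fixed decisions. Unlisted tools are denied.
var sinkPolicy = map[string]Decision{
	"search.read":   Allow,
	"email.send":    RequireApproval,
	"http.post_ext": RequireApproval,
}

// gate decides from the tool name alone. The model's justification is
// accepted as a parameter only to show that it has no influence.
func gate(tool, modelJustification string) Decision {
	_ = modelJustification  // deliberately unused
	return sinkPolicy[tool] // a missing key yields Deny
}

func main() {
	fmt.Println(gate("search.read", ""))
	fmt.Println(gate("email.send", "the user urgently asked for this"))
	fmt.Println(gate("shell.exec", "ignore previous instructions"))
}
```

Because the decision is a pure function of the sink, a successful injection changes what the model asks for but not what the gate permits.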

What top results cover and miss

| Source | Strong coverage | Missing piece |
| --- | --- | --- |
| OWASP LLM01:2025 Prompt Injection | Threat taxonomy, direct/indirect injection patterns, and why perfect prevention is unclear. | No concrete scheduler control-plane design for deterministic action gating under outage conditions. |
| OpenAI: Designing AI agents to resist prompt injection (Mar 11, 2026) | Social-engineering framing and source-sink mitigation concepts. | No implementation-level fail-mode contract for when policy infrastructure is unavailable. |
| Microsoft MSRC: Defending against indirect prompt injection | Defense-in-depth model and deterministic mitigations for specific impacts. | No open control-plane runbook showing retry semantics and explicit bypass telemetry labels on jobs. |

Out-of-process pattern

The pattern is simple.

Keep policy decisioning in a separate service boundary.

Force high-risk tool execution through explicit checks and approvals before worker dispatch.

If safety infrastructure degrades, behavior is explicit, measured, and configurable.

| Boundary | Current behavior | Why it matters |
| --- | --- | --- |
| Submit-time check (gateway) | Synchronous policy check before any state publish; 5s request timeout in gateway helper path. | Blocks many risky tasks before they even enter dispatch flow. |
| Pre-dispatch check (scheduler) | Second policy check with 3s scheduler timeout; safety unavailability routes via explicit input fail mode. | Prevents stale approvals or context drift from slipping straight into worker execution. |
| Safety client timeout | gRPC safety client uses 2s timeout plus distributed circuit-breaker (`fail budget=3`, `open=30s`). | Kernel latency cannot stall scheduler forever; failure behavior remains bounded and observable. |
| Fail-open tagging | On fail-open, scheduler sets `safety_bypassed=true` and `safety_bypass_reason` labels. | Ops can query bypassed jobs and investigate immediately instead of guessing. |
| Default fail mode | Closed by default for `POLICY_CHECK_FAIL_MODE` in gateway/scheduler paths. | Safer default for production where silent bypass is unacceptable. |
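
The timeouts in the table are layered, and in Go the earliest deadline on a context chain always wins. The sketch below shows that behavior with the standard `context` package. It assumes the 2s client timeout is derived from the caller's context, which the source does not state; `effectiveBudget` is a hypothetical helper.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// effectiveBudget layers an inner timeout on an outer one and returns the
// time the innermost call actually gets, rounded to whole seconds.
func effectiveBudget(outerSec, innerSec int) int {
	start := time.Now()
	outer, cancelOuter := context.WithTimeout(context.Background(), time.Duration(outerSec)*time.Second)
	defer cancelOuter()
	inner, cancelInner := context.WithTimeout(outer, time.Duration(innerSec)*time.Second)
	defer cancelInner()

	deadline, _ := inner.Deadline() // always set: both contexts carry a deadline
	return int(deadline.Sub(start).Round(time.Second) / time.Second)
}

func main() {
	// Scheduler check (3s) wrapping the safety client call (2s).
	fmt.Println(effectiveBudget(3, 2)) // 2
	// An inner timeout longer than its parent cannot extend the parent.
	fmt.Println(effectiveBudget(3, 5)) // 3
}
```

When tuning these values, keep each inner timeout shorter than the one that wraps it. Otherwise the inner setting is silently capped by the outer deadline.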

Concrete code paths

Submit-time gate in API gateway

core/controlplane/gateway/helpers.go
go
// core/controlplane/gateway/helpers.go (excerpt)
// submit-time policy check happens before publish
evalCtx, cancel := context.WithTimeout(ctx, 5*time.Second)
defer cancel()
resp, err := s.safetyClient.Evaluate(evalCtx, checkReq)
if err != nil {
  if isPolicyFailOpen() {
    return submitPolicyDecision{Allowed: true, Reason: "fail-open: safety unavailable"}
  }
  return submitPolicyDecision{Denied: true, Reason: "policy check unavailable"}
}
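
The excerpt calls `isPolicyFailOpen()` without showing its body. One plausible shape, sketched here as an assumption and not as the actual implementation, reads `POLICY_CHECK_FAIL_MODE` and treats anything other than `open` as closed, so a typo or an unset variable cannot enable fail-open. The function name `failOpenFromEnv` and the injected lookup are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// failOpenFromEnv is a hypothetical stand-in for isPolicyFailOpen().
// Only the value "open" (case-insensitive, surrounding spaces trimmed)
// enables fail-open; unset, empty, or misspelled values mean closed.
// The lookup is injected so the function can be tested without touching
// the real process environment.
func failOpenFromEnv(lookup func(string) (string, bool)) bool {
	v, ok := lookup("POLICY_CHECK_FAIL_MODE")
	if !ok {
		return false
	}
	return strings.EqualFold(strings.TrimSpace(v), "open")
}

func main() {
	fmt.Println("fail-open:", failOpenFromEnv(os.LookupEnv))
}
```

Defaulting to closed on any unrecognized value matches the closed-by-default behavior described for the gateway and scheduler paths.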

Pre-dispatch fail-mode behavior in scheduler

core/controlplane/scheduler/engine.go
go
// core/controlplane/scheduler/engine.go (excerpt)
case SafetyUnavailable:
  if e.isInputFailOpen() {
    record.Decision = SafetyAllow
    record.Reason = "fail-open: safety unavailable — " + record.Reason
    req.Labels["safety_bypassed"] = "true"
    req.Labels["safety_bypass_reason"] = record.Reason
  } else {
    return RetryAfter(fmt.Errorf("safety unavailable: %s", record.Reason), safetyThrottleDelay)
  }
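
Assigning into a nil map panics in Go, so a fail-open branch like this one relies on `req.Labels` having been initialized somewhere the excerpt does not show. The sketch below is a nil-safe version of the labeling step plus a filter for bypassed jobs. The `Job` type and both function names are hypothetical; only the two label keys come from the excerpt.

```go
package main

import "fmt"

// Job is a hypothetical, stripped-down job record.
type Job struct {
	ID     string
	Labels map[string]string
}

// markBypassed sets the two bypass labels, allocating the map first if
// needed (assigning into a nil map would panic).
func markBypassed(j *Job, reason string) {
	if j.Labels == nil {
		j.Labels = make(map[string]string, 2)
	}
	j.Labels["safety_bypassed"] = "true"
	j.Labels["safety_bypass_reason"] = reason
}

// bypassed returns the IDs of jobs labeled safety_bypassed=true, in input order.
func bypassed(jobs []Job) []string {
	var ids []string
	for _, j := range jobs {
		if j.Labels["safety_bypassed"] == "true" { // reading a nil map is safe
			ids = append(ids, j.ID)
		}
	}
	return ids
}

func main() {
	jobs := []Job{
		{ID: "a"},
		{ID: "b"},
		{ID: "c", Labels: map[string]string{"team": "x"}},
	}
	markBypassed(&jobs[1], "fail-open: safety unavailable")
	fmt.Println(bypassed(jobs)) // [b]
}
```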

Safety timing and breaker constants

core/controlplane/scheduler/engine.go + safety_client.go
go
// core/controlplane/scheduler/engine.go + safety_client.go (excerpt)
const (
  safetyThrottleDelay = 5 * time.Second
  safetyCheckTimeout  = 3 * time.Second
)

const (
  safetyTimeout            = 2 * time.Second
  safetyCircuitOpenFor     = 30 * time.Second
  safetyCircuitFailBudget  = 3
  safetyCircuitHalfOpenMax = 3
  safetyCircuitCloseAfter  = 2
)
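
To make the breaker constants concrete, here is a minimal single-process sketch of how such values could drive a breaker. The source describes a distributed breaker with shared state, which this does not reproduce. The transitions are an assumed reading of the constant names: open after 3 consecutive failures, stay open for 30s, then admit up to 3 half-open probes and close after 2 successes. All type and function names are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

const (
	openFor     = 30 * time.Second
	failBudget  = 3
	halfOpenMax = 3
	closeAfter  = 2
)

type state int

const (
	closed state = iota
	open
	halfOpen
)

// breaker is not safe for concurrent use; a real one would need locking
// or shared storage.
type breaker struct {
	st        state
	fails     int // consecutive failures while closed
	probes    int // calls admitted while half-open
	successes int // successes while half-open
	openedAt  time.Time
}

// at builds a fake clock reading, sec seconds after the Unix epoch.
func at(sec int) time.Time { return time.Unix(int64(sec), 0) }

// allow reports whether a call may proceed at time now.
func (b *breaker) allow(now time.Time) bool {
	if b.st == open && now.Sub(b.openedAt) >= openFor {
		b.st, b.probes, b.successes = halfOpen, 0, 0
	}
	switch b.st {
	case open:
		return false
	case halfOpen:
		if b.probes >= halfOpenMax {
			return false
		}
		b.probes++
		return true
	default:
		return true
	}
}

// record feeds the outcome of an admitted call back into the breaker.
func (b *breaker) record(ok bool, now time.Time) {
	switch {
	case !ok && (b.st == halfOpen || b.fails+1 >= failBudget):
		b.st, b.fails, b.openedAt = open, 0, now // trip, or re-trip from half-open
	case !ok:
		b.fails++
	case b.st == halfOpen:
		b.successes++
		if b.successes >= closeAfter {
			b.st, b.fails = closed, 0
		}
	default:
		b.fails = 0 // a success while closed resets the failure streak
	}
}

func main() {
	b := &breaker{}
	for i := 0; i < failBudget; i++ {
		b.record(false, at(0))
	}
	fmt.Println(b.allow(at(29))) // false: still open
	fmt.Println(b.allow(at(30))) // true: first half-open probe
	b.record(true, at(30))
	b.record(true, at(30))
	fmt.Println(b.st == closed) // true
}
```

While the breaker is open, the scheduler does not wait on the safety service at all. The fail-mode setting then decides whether jobs are retried or allowed with bypass labels.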

Validation runbook

Validate fail-mode behavior directly in tests before policy rollout.

prompt-injection-governance-runbook.sh
bash
# 1) Verify scheduler fail-closed on safety unavailable
go test ./core/controlplane/scheduler -run TestSafetyUnavailable_FailClosed -count=1

# 2) Verify scheduler fail-open path and bypass metric tagging
go test ./core/controlplane/scheduler -run TestSafetyUnavailable_FailOpen -count=1
go test ./core/controlplane/scheduler -run TestSafetyUnavailable_FailOpen_Metric -count=1

# 3) Verify gateway submit-time mode behavior
go test ./core/controlplane/gateway -run TestSubmitJobGRPC_PolicyFailClosed -count=1
go test ./core/controlplane/gateway -run TestSubmitJobGRPC_PolicyFailOpen -count=1

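# 4) Runtime probe: force fail-open and submit a canary job
# Note: the variable must be set in the environment of the gateway and
# scheduler processes, not only in this client shell.
export POLICY_CHECK_FAIL_MODE=open
cordumctl job submit --topic job.default --prompt "policy bypass canary"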

# 5) Confirm bypass labels are recorded
rg "safety_bypassed|input_fail_open" /var/log/cordum/scheduler.log

Limitations and tradeoffs

| Approach | Upside | Downside |
| --- | --- | --- |
| Model-only prompt filtering | Low integration overhead. | Bypass surface stays large; no hard stop on dangerous sinks. |
| Out-of-process policy checks (current) | Deterministic controls on execution path, independent of model persuasion quality. | Extra operational complexity: kernel availability, timeout tuning, breaker telemetry. |
| Fail-open everywhere | High availability under safety outages. | High-risk jobs can execute while guardrails are down; requires strict compensating controls. |

  • Out-of-process controls reduce risk but do not remove the need for model hardening.
  • Fail-open needs explicit governance, not wishful thinking and dashboard optimism.
  • Tool surface still needs strict scoping, approvals, and audit trails.

FAQ

Does out-of-process governance prevent every prompt injection attack?

No. It limits impact by controlling actions and data sinks even when the model is manipulated.

Why use both submit-time and pre-dispatch checks?

Submit-time blocks obvious risk early. Pre-dispatch catches drift and late-stage context changes before execution.

Should production run with POLICY_CHECK_FAIL_MODE=open?

Usually no. `open` is for controlled scenarios with strong downstream controls and clear bypass monitoring.

Next step

Run this in your staging cluster this week:

  1. Keep `POLICY_CHECK_FAIL_MODE=closed` in production and document the exception path.
  2. Add an alert that fires when `cordum_scheduler_input_fail_open_total` increases over any 5-minute window.
  3. Add a dashboard panel for jobs with `safety_bypassed=true` labels.
  4. Require human approval for any tool action that can transmit sensitive data externally.

Continue with MCP Security Risks and AI Agent Safety Kernel Outage Playbook.

Move prompt injection from model risk to systems risk

Treat prompt injection as an expected adversarial condition and constrain what the agent can do when it happens.