Prompt Injection vs Out-of-Process Governance for AI Agents (2026)

The production problem

Prompt injection is not just a text-classification bug.

It is an action-control bug.

If an agent can read untrusted content and call powerful tools, a single successful injection can exfiltrate data or trigger unsafe operations.

Detection helps, but detection is probabilistic. Production safety needs deterministic boundaries around dangerous sinks.

What top results cover and miss

Source	Strong coverage	Missing piece
OWASP LLM01:2025 Prompt Injection	Threat taxonomy, direct/indirect injection patterns, and why perfect prevention is unclear.	No concrete scheduler control-plane design for deterministic action gating under outage conditions.
OpenAI: Designing AI agents to resist prompt injection (Mar 11, 2026)	Social-engineering framing and source-sink mitigation concepts.	No implementation-level fail-mode contract for when policy infrastructure is unavailable.
Microsoft MSRC: Defending against indirect prompt injection	Defense-in-depth model and deterministic mitigations for specific impacts.	No open control-plane runbook showing retry semantics and explicit bypass telemetry labels on jobs.

Out-of-process pattern

The pattern is simple.

Keep policy decisioning in a separate service boundary.

Force high-risk tool execution through explicit checks and approvals before worker dispatch.

If safety infrastructure degrades, behavior is explicit, measured, and configurable.

Boundary	Current behavior	Why it matters
Submit-time check (gateway)	Synchronous policy check before any state publish; 5s request timeout in gateway helper path.	Blocks many risky tasks before they even enter dispatch flow.
Pre-dispatch check (scheduler)	Second policy check with 3s scheduler timeout; if the safety service is unavailable, the outcome follows the explicit input fail mode (retry when closed, tagged bypass when open).	Prevents stale approvals or context drift from slipping straight into worker execution.
Safety client timeout	gRPC safety client uses 2s timeout plus distributed circuit-breaker (`fail budget=3`, `open=30s`).	Kernel latency cannot stall scheduler forever; failure behavior remains bounded and observable.
Fail-open tagging	On fail-open, scheduler sets `safety_bypassed=true` and `safety_bypass_reason` labels.	Ops can query bypassed jobs and investigate immediately instead of guessing.
Default fail mode	Closed by default for `POLICY_CHECK_FAIL_MODE` in gateway/scheduler paths.	Safer default for production where silent bypass is unacceptable.

Concrete code paths

Submit-time gate in API gateway

core/controlplane/gateway/helpers.go

// core/controlplane/gateway/helpers.go (excerpt)
// submit-time policy check happens before publish
evalCtx, cancel := context.WithTimeout(ctx, 5*time.Second)
defer cancel()
resp, err := s.safetyClient.Evaluate(evalCtx, checkReq)
if err != nil {
  if isPolicyFailOpen() {
    return submitPolicyDecision{Allowed: true, Reason: "fail-open: safety unavailable"}
  }
  return submitPolicyDecision{Denied: true, Reason: "policy check unavailable"}
}
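
The excerpt does not show `isPolicyFailOpen()`. One plausible shape for such a helper, assuming it reads `POLICY_CHECK_FAIL_MODE` from the environment, is sketched below. `failOpenFromEnv` is a hypothetical name, and the case-insensitive, whitespace-tolerant match is an assumption rather than documented behavior.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// failOpenFromEnv reports whether POLICY_CHECK_FAIL_MODE selects fail-open.
// Only an explicit "open" enables it; unset, empty, or misspelled values
// fall back to closed, matching the closed-by-default behavior described above.
// The lookup function is injected so the logic can be tested without
// touching the real process environment.
func failOpenFromEnv(lookup func(string) (string, bool)) bool {
	v, ok := lookup("POLICY_CHECK_FAIL_MODE")
	if !ok {
		return false
	}
	return strings.EqualFold(strings.TrimSpace(v), "open")
}

func main() {
	fmt.Println("fail-open:", failOpenFromEnv(os.LookupEnv))
}
```

Treating every unrecognized value as closed means a typo in deployment config cannot silently disable the gate.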

Pre-dispatch fail-mode behavior in scheduler

core/controlplane/scheduler/engine.go

// core/controlplane/scheduler/engine.go (excerpt)
case SafetyUnavailable:
  if e.isInputFailOpen() {
    record.Decision = SafetyAllow
    record.Reason = "fail-open: safety unavailable — " + record.Reason
    req.Labels["safety_bypassed"] = "true"
    req.Labels["safety_bypass_reason"] = record.Reason
  } else {
    return RetryAfter(fmt.Errorf("safety unavailable: %s", record.Reason), safetyThrottleDelay)
  }
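
The two label keys above are what make bypassed jobs queryable later. The sketch below shows one way to set them safely and filter on them. `Job`, `markBypassed`, and `bypassedJobs` are hypothetical names; only the label keys come from the excerpt. The nil check matters because assigning into a nil map panics in Go.

```go
package main

import "fmt"

// Job is a hypothetical, trimmed-down job record.
type Job struct {
	ID     string
	Labels map[string]string
}

// markBypassed sets the bypass labels, allocating the map if needed.
func markBypassed(labels map[string]string, reason string) map[string]string {
	if labels == nil {
		labels = make(map[string]string, 2)
	}
	labels["safety_bypassed"] = "true"
	labels["safety_bypass_reason"] = reason
	return labels
}

// bypassedJobs returns the jobs that executed under a fail-open bypass.
func bypassedJobs(jobs []Job) []Job {
	var out []Job
	for _, j := range jobs {
		// Reading from a nil map is safe and yields the empty string.
		if j.Labels["safety_bypassed"] == "true" {
			out = append(out, j)
		}
	}
	return out
}

func main() {
	jobs := []Job{
		{ID: "job-1", Labels: markBypassed(nil, "fail-open: safety unavailable")},
		{ID: "job-2"},
	}
	for _, j := range bypassedJobs(jobs) {
		fmt.Println(j.ID, j.Labels["safety_bypass_reason"])
	}
}
```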

Safety timing and breaker constants

core/controlplane/scheduler/engine.go + safety_client.go

// core/controlplane/scheduler/engine.go + safety_client.go (excerpt)
const (
  safetyThrottleDelay = 5 * time.Second
  safetyCheckTimeout  = 3 * time.Second
)

const (
  safetyTimeout            = 2 * time.Second
  safetyCircuitOpenFor     = 30 * time.Second
  safetyCircuitFailBudget  = 3
  safetyCircuitHalfOpenMax = 3
  safetyCircuitCloseAfter  = 2
)

Validation runbook

Validate fail-mode behavior directly in tests before policy rollout.

prompt-injection-governance-runbook.sh

# 1) Verify scheduler fail-closed on safety unavailable
go test ./core/controlplane/scheduler -run TestSafetyUnavailable_FailClosed -count=1

# 2) Verify scheduler fail-open path and bypass metric tagging
go test ./core/controlplane/scheduler -run TestSafetyUnavailable_FailOpen -count=1
go test ./core/controlplane/scheduler -run TestSafetyUnavailable_FailOpen_Metric -count=1

# 3) Verify gateway submit-time mode behavior
go test ./core/controlplane/gateway -run TestSubmitJobGRPC_PolicyFailClosed -count=1
go test ./core/controlplane/gateway -run TestSubmitJobGRPC_PolicyFailOpen -count=1

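# 4) Runtime probe (staging only): force fail-open and submit a canary job.
#    The variable is presumably read by the gateway and scheduler processes,
#    so set it in their environment; the CLI shell alone may not be enough.
export POLICY_CHECK_FAIL_MODE=open
cordumctl job submit --topic job.default --prompt "policy bypass canary"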

# 5) Confirm bypass labels are recorded
rg "safety_bypassed|input_fail_open" /var/log/cordum/scheduler.log

Limitations and tradeoffs

Approach	Upside	Downside
Model-only prompt filtering	Low integration overhead.	Bypass surface stays large; no hard stop on dangerous sinks.
Out-of-process policy checks (current)	Deterministic controls on execution path, independent of model persuasion quality.	Extra operational complexity: kernel availability, timeout tuning, breaker telemetry.
Fail-open everywhere	High availability under safety outages.	High-risk jobs can execute while guardrails are down; requires strict compensating controls.

- Out-of-process controls reduce risk but do not remove the need for model hardening.
- Fail-open needs explicit governance, not wishful thinking and dashboard optimism.
- Tool surface still needs strict scoping, approvals, and audit trails.

FAQ

Does out-of-process governance prevent every prompt injection attack?

No. It limits impact by controlling actions and data sinks even when the model is manipulated.

Why use both submit-time and pre-dispatch checks?

Submit-time blocks obvious risk early. Pre-dispatch catches drift and late-stage context changes before execution.
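
One kind of drift is a policy change between submission and dispatch. The sketch below illustrates that case with a hypothetical versioned deny-list; `Policy`, `Job`, `submit`, and `preDispatch` are illustrative names, not the project's types.

```go
package main

import "fmt"

// Policy is a hypothetical, versioned deny-list of tool names.
type Policy struct {
	Version int
	Denied  map[string]bool
}

// Job records which policy version approved it at submit time.
type Job struct {
	Tool            string
	ApprovedVersion int
}

// submit runs the submit-time check and stamps the approving version.
func submit(p Policy, tool string) (Job, error) {
	if p.Denied[tool] {
		return Job{}, fmt.Errorf("submit denied: %s", tool)
	}
	return Job{Tool: tool, ApprovedVersion: p.Version}, nil
}

// preDispatch re-evaluates the job against the policy in force now,
// so an approval granted under an older policy cannot carry over.
func preDispatch(current Policy, j Job) error {
	if current.Denied[j.Tool] {
		return fmt.Errorf("dispatch denied: %s (approved under v%d, current v%d)",
			j.Tool, j.ApprovedVersion, current.Version)
	}
	return nil
}

func main() {
	v1 := Policy{Version: 1}
	job, _ := submit(v1, "export_customers")

	// The policy tightens while the job waits in the queue.
	v2 := Policy{Version: 2, Denied: map[string]bool{"export_customers": true}}
	fmt.Println(preDispatch(v2, job))
}
```

With only the submit-time check, the queued job would run under rules that no longer allow it.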

Should production run with POLICY_CHECK_FAIL_MODE=open?

Usually no. `open` is for controlled scenarios with strong downstream controls and clear bypass monitoring.

Next step

Work through this checklist this week, validating changes in staging first:

1. Keep `POLICY_CHECK_FAIL_MODE=closed` in production and document the exception path.
2. Add an alert that fires when `cordum_scheduler_input_fail_open_total` increases within any 5-minute window.
3. Add a dashboard panel for jobs with `safety_bypassed=true` labels.
4. Require human approval for any tool action that can transmit sensitive data externally.

Continue with MCP Security Risks and AI Agent Safety Kernel Outage Playbook.