The production problem
Prompt injection is not just a text-classification bug.
It is an action-control bug.
If an agent can read untrusted content and call powerful tools, a single successful injection can exfiltrate data or trigger unsafe operations.
Detection helps, but detection is probabilistic. Production safety needs deterministic boundaries around dangerous sinks.
What top results cover and miss
| Source | Strong coverage | Missing piece |
|---|---|---|
| OWASP LLM01:2025 Prompt Injection | Threat taxonomy, direct/indirect injection patterns, and why perfect prevention is unclear. | No concrete scheduler control-plane design for deterministic action gating under outage conditions. |
| OpenAI: Designing AI agents to resist prompt injection (Mar 11, 2026) | Social-engineering framing and source-sink mitigation concepts. | No implementation-level fail-mode contract for when policy infrastructure is unavailable. |
| Microsoft MSRC: Defending against indirect prompt injection | Defense-in-depth model and deterministic mitigations for specific impacts. | No open control-plane runbook showing retry semantics and explicit bypass telemetry labels on jobs. |
Out-of-process pattern
The pattern is simple.
Keep policy decisioning in a separate service boundary.
Force high-risk tool execution through explicit checks and approvals before worker dispatch.
If safety infrastructure degrades, behavior is explicit, measured, and configurable.
| Boundary | Current behavior | Why it matters |
|---|---|---|
| Submit-time check (gateway) | Synchronous policy check before any state publish; 5s request timeout in gateway helper path. | Blocks many risky tasks before they even enter dispatch flow. |
| Pre-dispatch check (scheduler) | Second policy check with 3s scheduler timeout; if the safety service is unavailable, the configured input fail mode decides between retry (closed) and tagged bypass (open). | Prevents stale approvals or context drift from slipping straight into worker execution. |
| Safety client timeout | gRPC safety client uses 2s timeout plus distributed circuit-breaker (`fail budget=3`, `open=30s`). | Safety kernel latency cannot stall the scheduler forever; failure behavior remains bounded and observable. |
| Fail-open tagging | On fail-open, scheduler sets `safety_bypassed=true` and `safety_bypass_reason` labels. | Ops can query bypassed jobs and investigate immediately instead of guessing. |
| Default fail mode | Closed by default for `POLICY_CHECK_FAIL_MODE` in gateway/scheduler paths. | Safer default for production where silent bypass is unacceptable. |
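The fail-mode rows above can be read as a small contract: anything other than an explicit `open` resolves to closed, and a bypass always carries labels. The following is a minimal sketch of that contract, assuming the mode is read from the `POLICY_CHECK_FAIL_MODE` environment variable. The names `failMode`, `parseFailMode`, and `decideOnUnavailable` are hypothetical and are not the project's helpers.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// failMode is a hypothetical type for illustration; the real helpers
// (isPolicyFailOpen, isInputFailOpen) are not shown in this article.
type failMode string

const (
	failClosed failMode = "closed"
	failOpen   failMode = "open"
)

// parseFailMode treats every value except an explicit "open" as closed,
// so a typo or an unset variable never widens access.
func parseFailMode(raw string) failMode {
	if strings.EqualFold(strings.TrimSpace(raw), "open") {
		return failOpen
	}
	return failClosed
}

// decideOnUnavailable reports whether a job may proceed when the safety
// service cannot be reached, plus the labels to attach on a bypass.
func decideOnUnavailable(mode failMode, reason string) (bool, map[string]string) {
	if mode != failOpen {
		return false, nil
	}
	return true, map[string]string{
		"safety_bypassed":      "true",
		"safety_bypass_reason": "fail-open: safety unavailable: " + reason,
	}
}

func main() {
	mode := parseFailMode(os.Getenv("POLICY_CHECK_FAIL_MODE"))
	allowed, labels := decideOnUnavailable(mode, "circuit open")
	fmt.Println(mode, allowed, labels)
}
```

Parsing toward closed means a misconfigured deployment degrades to retries rather than to silent bypass.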
Concrete code paths
Submit-time gate in API gateway
// core/controlplane/gateway/helpers.go (excerpt)
// submit-time policy check happens before publish
evalCtx, cancel := context.WithTimeout(ctx, 5*time.Second)
defer cancel()
resp, err := s.safetyClient.Evaluate(evalCtx, checkReq)
if err != nil {
	if isPolicyFailOpen() {
		return submitPolicyDecision{Allowed: true, Reason: "fail-open: safety unavailable"}
	}
	return submitPolicyDecision{Denied: true, Reason: "policy check unavailable"}
}
Pre-dispatch fail-mode behavior in scheduler
// core/controlplane/scheduler/engine.go (excerpt)
case SafetyUnavailable:
	if e.isInputFailOpen() {
		record.Decision = SafetyAllow
		record.Reason = "fail-open: safety unavailable — " + record.Reason
		req.Labels["safety_bypassed"] = "true"
		req.Labels["safety_bypass_reason"] = record.Reason
	} else {
		return RetryAfter(fmt.Errorf("safety unavailable: %s", record.Reason), safetyThrottleDelay)
	}
Safety timing and breaker constants
// core/controlplane/scheduler/engine.go + safety_client.go (excerpt)
const (
	safetyThrottleDelay = 5 * time.Second
	safetyCheckTimeout  = 3 * time.Second
)
const (
	safetyTimeout            = 2 * time.Second
	safetyCircuitOpenFor     = 30 * time.Second
	safetyCircuitFailBudget  = 3
	safetyCircuitHalfOpenMax = 3
	safetyCircuitCloseAfter  = 2
)
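To make the breaker constants concrete, here is a minimal single-process sketch of one plausible reading: the circuit opens after `safetyCircuitFailBudget` consecutive failures, stays open for `safetyCircuitOpenFor`, then admits up to `safetyCircuitHalfOpenMax` probes and closes after `safetyCircuitCloseAfter` successes. That interpretation is an assumption, the `breaker` type is hypothetical, and the distributed coordination the real breaker uses is not modeled here.

```go
package main

import (
	"fmt"
	"time"
)

// Values copied from the excerpt; how the real breaker interprets them
// is an assumption in this sketch.
const (
	safetyCircuitOpenFor     = 30 * time.Second
	safetyCircuitFailBudget  = 3
	safetyCircuitHalfOpenMax = 3
	safetyCircuitCloseAfter  = 2
)

type state int

const (
	stateClosed state = iota
	stateOpen
	stateHalfOpen
)

// breaker is a hypothetical in-process breaker (not goroutine-safe).
type breaker struct {
	st        state
	failures  int // consecutive failures while closed
	probes    int // calls admitted while half-open
	successes int // successful probes while half-open
	openedAt  time.Time
}

// allow reports whether a safety call may be attempted at time now.
func (b *breaker) allow(now time.Time) bool {
	switch b.st {
	case stateOpen:
		if now.Sub(b.openedAt) < safetyCircuitOpenFor {
			return false
		}
		// Open window elapsed: start admitting a limited number of probes.
		b.st, b.probes, b.successes = stateHalfOpen, 0, 0
		fallthrough
	case stateHalfOpen:
		if b.probes >= safetyCircuitHalfOpenMax {
			return false
		}
		b.probes++
		return true
	}
	return true
}

// record feeds the outcome of an attempted call back into the breaker.
func (b *breaker) record(ok bool, now time.Time) {
	switch {
	case b.st == stateHalfOpen && ok:
		b.successes++
		if b.successes >= safetyCircuitCloseAfter {
			b.st, b.failures = stateClosed, 0
		}
	case b.st == stateHalfOpen: // a failed probe reopens the circuit
		b.st, b.openedAt = stateOpen, now
	case ok:
		b.failures = 0
	default:
		b.failures++
		if b.failures >= safetyCircuitFailBudget {
			b.st, b.openedAt = stateOpen, now
		}
	}
}

func main() {
	b := &breaker{}
	t0 := time.Unix(0, 0)
	for i := 0; i < safetyCircuitFailBudget; i++ {
		b.record(false, t0)
	}
	fmt.Println(b.allow(t0.Add(time.Second)))      // inside the open window
	fmt.Println(b.allow(t0.Add(31 * time.Second))) // first half-open probe
}
```

While the circuit is open, callers skip the 2s client timeout entirely, which is what keeps scheduler latency bounded during an outage.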
Validation runbook
Validate fail-mode behavior directly in tests before policy rollout.
# 1) Verify scheduler fail-closed on safety unavailable
go test ./core/controlplane/scheduler -run TestSafetyUnavailable_FailClosed -count=1
# 2) Verify scheduler fail-open path and bypass metric tagging
go test ./core/controlplane/scheduler -run TestSafetyUnavailable_FailOpen -count=1
go test ./core/controlplane/scheduler -run TestSafetyUnavailable_FailOpen_Metric -count=1
# 3) Verify gateway submit-time mode behavior
go test ./core/controlplane/gateway -run TestSubmitJobGRPC_PolicyFailClosed -count=1
go test ./core/controlplane/gateway -run TestSubmitJobGRPC_PolicyFailOpen -count=1
# 4) Runtime probe: force fail-open and submit a canary job
set POLICY_CHECK_FAIL_MODE=open
cordumctl job submit --topic job.default --prompt "policy bypass canary"
# 5) Confirm bypass labels are recorded
rg "safety_bypassed|input_fail_open" /var/log/cordum/scheduler.log
Limitations and tradeoffs
| Approach | Upside | Downside |
|---|---|---|
| Model-only prompt filtering | Low integration overhead. | Bypass surface stays large; no hard stop on dangerous sinks. |
| Out-of-process policy checks (current) | Deterministic controls on execution path, independent of model persuasion quality. | Extra operational complexity: kernel availability, timeout tuning, breaker telemetry. |
| Fail-open everywhere | High availability under safety outages. | High-risk jobs can execute while guardrails are down; requires strict compensating controls. |
- Out-of-process controls reduce risk but do not remove the need for model hardening.
- Fail-open needs explicit governance, not wishful thinking and dashboard optimism.
- Tool surface still needs strict scoping, approvals, and audit trails.
FAQ
Does out-of-process governance prevent every prompt injection attack?
No. It limits impact by controlling actions and data sinks even when the model is manipulated.
Why use both submit-time and pre-dispatch checks?
Submit-time blocks obvious risk early. Pre-dispatch catches drift and late-stage context changes before execution.
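The value of the second check is easiest to see when policy changes while a job is queued. The sketch below is illustrative only: `evaluator`, `policyFrom`, `check`, and `twoStage` are hypothetical names, and the 5s and 3s deadlines mirror the gateway and scheduler timeouts described earlier. Any evaluation error is treated as a denial, which corresponds to fail-closed.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// evaluator is a hypothetical stand-in for the safety client's Evaluate call.
type evaluator func(ctx context.Context, tool string) (bool, error)

// policyFrom builds an evaluator that reads the current value of *allowed,
// which lets the example change policy between the two checks.
func policyFrom(allowed *bool) evaluator {
	return func(ctx context.Context, tool string) (bool, error) {
		if err := ctx.Err(); err != nil {
			return false, err
		}
		return *allowed, nil
	}
}

// check runs one evaluation under its own deadline and denies on any error.
func check(eval evaluator, tool string, timeout time.Duration) bool {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	ok, err := eval(ctx, tool)
	return err == nil && ok
}

// twoStage checks at submit time, lets the caller simulate time spent in the
// queue, then checks again right before dispatch.
func twoStage(eval evaluator, tool string, whileQueued func()) (atSubmit, atDispatch bool) {
	atSubmit = check(eval, tool, 5*time.Second)
	if !atSubmit {
		return false, false // rejected jobs never reach the scheduler
	}
	whileQueued()
	atDispatch = check(eval, tool, 3*time.Second)
	return atSubmit, atDispatch
}

func main() {
	allowed := true
	submit, dispatch := twoStage(policyFrom(&allowed), "send_email", func() {
		allowed = false // policy tightened while the job waited
	})
	fmt.Println("submit:", submit, "dispatch:", dispatch)
}
```

With only the submit-time check, the job in this example would have run under a policy that no longer permits it.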
Should production run with POLICY_CHECK_FAIL_MODE=open?
Usually no. `open` is for controlled scenarios with strong downstream controls and clear bypass monitoring.
Next step
Run this in your staging cluster this week:
1. Keep `POLICY_CHECK_FAIL_MODE=closed` in production and document the exception path.
2. Add an alert on `cordum_scheduler_input_fail_open_total` greater than 0 over 5 minutes.
3. Add a dashboard panel for jobs with `safety_bypassed=true` labels.
4. Require human approval for any tool action that can transmit sensitive data externally.
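The approval requirement in the last step can be expressed as a deterministic rule over declared tool capabilities. The sketch below is a hypothetical illustration: `toolAction`, its fields, and `gate` are not taken from the project's job schema, and a real deployment would source these flags from a tool registry rather than from the request itself.

```go
package main

import "fmt"

// toolAction is a hypothetical description of a tool call; the fields are
// illustrative and not taken from the project's job schema.
type toolAction struct {
	Tool             string
	SendsExternally  bool   // data leaves the trust boundary (email, HTTP POST)
	TouchesSensitive bool   // reads data classified as sensitive
	ApprovedBy       string // empty until a human approves
}

type verdict string

const (
	allow        verdict = "allow"
	needApproval verdict = "require_approval"
)

// gate decides from declared capabilities only, never from model output,
// so an injected prompt cannot argue its way past the rule.
func gate(a toolAction) verdict {
	if a.SendsExternally && a.TouchesSensitive && a.ApprovedBy == "" {
		return needApproval
	}
	return allow
}

func main() {
	pending := toolAction{Tool: "send_email", SendsExternally: true, TouchesSensitive: true}
	fmt.Println(pending.Tool, gate(pending))

	pending.ApprovedBy = "oncall@example.com"
	fmt.Println(pending.Tool, gate(pending))
}
```

Because the rule never inspects prompt text, its outcome is the same whether or not the model has been manipulated.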
Continue with MCP Security Risks and AI Agent Safety Kernel Outage Playbook.