Output safety

Input policy decides whether a job can run. Output safety decides whether a job's results can be released after execution. This closes the gap where a request that passed input policy still produces sensitive or unsafe output.

ALLOW

Result is released normally.

REDACT

The job stays SUCCEEDED, but the response can return a redacted pointer (redacted_ptr) in place of the original result.

QUARANTINE

The result is blocked, the job moves to OUTPUT_QUARANTINED, and a DLQ (dead-letter queue) record is written.

Scheduler flow (simplified)

Worker -> sys.job.result -> Scheduler
  -> sync CheckOutputMeta()
      ALLOW      => keep SUCCEEDED
      REDACT     => keep SUCCEEDED + redacted_ptr
      QUARANTINE => OUTPUT_QUARANTINED + DLQ(reason=output_quarantined)
  -> optional async content check
      QUARANTINE => retroactive quarantine + DLQ(reason=output_quarantined_async)
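
The flow above can be sketched as a small state transition. This is an illustrative Python model only, not the scheduler's actual code: the Job class, the apply_output_decision function, and the in-memory dlq list are hypothetical names. The flow lists only QUARANTINE for the async check, so this sketch assumes that other async decisions leave the job untouched.

```python
from dataclasses import dataclass, field
from typing import Optional

DECISIONS = {"ALLOW", "REDACT", "QUARANTINE"}


@dataclass
class Job:
    # Hypothetical in-memory stand-in for a finished job.
    job_id: str
    state: str = "SUCCEEDED"
    redacted_ptr: Optional[str] = None
    dlq: list = field(default_factory=list)  # stand-in for DLQ records


def apply_output_decision(job, decision, redacted_ptr=None, is_async=False):
    """Apply an output safety decision to a job that already succeeded."""
    if decision not in DECISIONS:
        raise ValueError(f"unknown decision: {decision!r}")
    if decision == "QUARANTINE":
        # Sync and async quarantine differ only in the DLQ reason.
        job.state = "OUTPUT_QUARANTINED"
        reason = "output_quarantined_async" if is_async else "output_quarantined"
        job.dlq.append({"job_id": job.job_id, "reason": reason})
    elif decision == "REDACT" and not is_async:
        # Job stays SUCCEEDED; only the pointer changes.
        job.redacted_ptr = redacted_ptr
    # ALLOW (and, by assumption, non-quarantine async results) changes nothing.
    return job
```

Calling apply_output_decision(Job("j1"), "QUARANTINE") leaves the job in OUTPUT_QUARANTINED with one DLQ record whose reason is output_quarantined. The same call with is_async=True records output_quarantined_async instead.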

Policy model

Output rules are defined in the Safety policy under output_rules. Rules can match by topic, capability, risk tags, scanners, patterns, and size limits.

output_rules example

output_rules:
  - id: out-secret-1
    decision: quarantine
    reason: "possible cloud credential in output"
    match:
      topics: ["job.*"]
      capabilities: ["code.write"]
      risk_tags: ["secrets"]
      detectors: ["secret_leak"]
      max_output_bytes: 1048576
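
To make the matching idea concrete, here is a minimal Python sketch of how a rule like the one above might be evaluated. The matching semantics are assumptions, not documented behavior: topics are treated as shell-style globs, every key present under match must be satisfied, and list-valued keys are satisfied by any overlap. The rule_matches function and its parameters are hypothetical. max_output_bytes is left out because the source does not say how the size limit is applied.

```python
from fnmatch import fnmatchcase

# Mirror of the example rule as a plain dict (size limit omitted, see above).
RULE = {
    "id": "out-secret-1",
    "decision": "quarantine",
    "reason": "possible cloud credential in output",
    "match": {
        "topics": ["job.*"],
        "capabilities": ["code.write"],
        "risk_tags": ["secrets"],
        "detectors": ["secret_leak"],
    },
}


def rule_matches(rule, topic, capability, risk_tags, fired_detectors):
    """Return True if every key present in rule['match'] is satisfied."""
    m = rule["match"]
    # Assumption: topic patterns are globs, so "job.*" matches "job.build".
    if "topics" in m and not any(fnmatchcase(topic, p) for p in m["topics"]):
        return False
    if "capabilities" in m and capability not in m["capabilities"]:
        return False
    # Assumption: one shared risk tag or fired detector is enough.
    if "risk_tags" in m and not set(m["risk_tags"]) & set(risk_tags):
        return False
    if "detectors" in m and not set(m["detectors"]) & set(fired_detectors):
        return False
    return True
```

Under these assumptions, a code.write job on topic job.build that carries the secrets tag and trips the secret_leak detector matches out-secret-1. The same job on topic sys.job.result does not.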

API data shape

Job responses can include an output_safety object:

decision
reason
rule_id
findings[]
policy_snapshot
redacted_ptr
original_ptr
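
A client might model the object along these lines. This is a sketch only: the field names come from the list above, but the types, the optionality of each field, and the sample payload values are assumptions, and OutputSafety is a hypothetical class name.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Optional


@dataclass
class OutputSafety:
    # Field names follow the documented list; types are assumed.
    decision: str
    reason: Optional[str] = None
    rule_id: Optional[str] = None
    findings: list = field(default_factory=list)
    policy_snapshot: Optional[Any] = None
    redacted_ptr: Optional[str] = None
    original_ptr: Optional[str] = None

    @classmethod
    def from_dict(cls, data):
        """Build from a decoded JSON object, ignoring unknown keys."""
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in known})


# Hypothetical payload, for illustration only.
sample = {
    "decision": "QUARANTINE",
    "reason": "possible cloud credential in output",
    "rule_id": "out-secret-1",
    "findings": [{"detector": "secret_leak"}],
    "extra_field": "ignored",
}
safety = OutputSafety.from_dict(sample)
```

Ignoring unknown keys keeps the client tolerant of fields added later. A stricter client could reject them instead.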

Operator triage

1) List quarantined jobs: GET /api/v1/jobs?state=OUTPUT_QUARANTINED
2) Inspect findings: GET /api/v1/jobs/{id}
3) Release false positives: POST /api/v1/dlq/{job_id}/retry
4) Confirm quarantine handling: DELETE /api/v1/dlq/{job_id}
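
The four triage calls can be scripted. The Python sketch below only builds the requests and does not send them. The paths and methods come from the steps above; BASE_URL and the function names are hypothetical, and authentication is omitted because the source does not describe it.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE_URL = "https://scheduler.example.invalid"  # hypothetical host


def list_quarantined():
    # Step 1: list jobs in the OUTPUT_QUARANTINED state.
    query = urlencode({"state": "OUTPUT_QUARANTINED"})
    return Request(f"{BASE_URL}/api/v1/jobs?{query}", method="GET")


def inspect_job(job_id):
    # Step 2: fetch one job, including its findings.
    return Request(f"{BASE_URL}/api/v1/jobs/{quote(job_id, safe='')}", method="GET")


def release_false_positive(job_id):
    # Step 3: retry the DLQ entry to release a false positive.
    return Request(f"{BASE_URL}/api/v1/dlq/{quote(job_id, safe='')}/retry", method="POST")


def confirm_quarantine(job_id):
    # Step 4: delete the DLQ entry once the quarantine is confirmed.
    return Request(f"{BASE_URL}/api/v1/dlq/{quote(job_id, safe='')}", method="DELETE")
```

Each function returns a urllib.request.Request that can be passed to urllib.request.urlopen once a real base URL and credentials are in place.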

Source of truth: docs/output-safety.md and Safety Kernel docs.