Output safety
Input policy decides whether a job can run. Output safety decides whether a job's results can be released after execution. This closes the gap where a request that passes input policy still returns sensitive or unsafe output.
ALLOW
Result is released normally.
REDACT
Job remains successful, but the response can return a redacted pointer (redacted_ptr) in place of the original result.
QUARANTINE
Result is blocked and moved to OUTPUT_QUARANTINED with a DLQ record.
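As a rough illustration of the three decisions above, the sketch below maps each one to a resulting job state. The state names SUCCEEDED and OUTPUT_QUARANTINED and the DLQ reason output_quarantined appear elsewhere on this page; the Decision enum values, the Outcome type, and the apply_decision function are hypothetical names used only for illustration, not the scheduler's actual code.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decision(Enum):
    # Lowercase values are an assumption, chosen to mirror `decision: quarantine`
    # in the policy example on this page.
    ALLOW = "allow"
    REDACT = "redact"
    QUARANTINE = "quarantine"


@dataclass
class Outcome:
    state: str                        # job state after the output check
    result_ptr: Optional[str]         # pointer the response would expose, if any
    dlq_reason: Optional[str] = None  # set only when a DLQ record is written


def apply_decision(decision: Decision, original_ptr: str,
                   redacted_ptr: Optional[str] = None) -> Outcome:
    """Hypothetical helper: translate an output decision into a job outcome."""
    if decision is Decision.ALLOW:
        # Result is released normally.
        return Outcome("SUCCEEDED", original_ptr)
    if decision is Decision.REDACT:
        # Job stays successful; the response exposes the redacted pointer.
        return Outcome("SUCCEEDED", redacted_ptr)
    if decision is Decision.QUARANTINE:
        # Result is blocked and a DLQ record is created.
        return Outcome("OUTPUT_QUARANTINED", None, "output_quarantined")
    raise ValueError(f"unknown decision: {decision!r}")
```

The point of the sketch is that only QUARANTINE changes the job state; ALLOW and REDACT differ only in which pointer the caller sees.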
Scheduler flow (simplified)
Worker -> sys.job.result -> Scheduler
  -> sync CheckOutputMeta()
       ALLOW      => keep SUCCEEDED
       REDACT     => keep SUCCEEDED + redacted_ptr
       QUARANTINE => OUTPUT_QUARANTINED + DLQ(reason=output_quarantined)
  -> optional async content check
       QUARANTINE => retroactive quarantine + DLQ(reason=output_quarantined_async)
Policy model
Output rules are defined in Safety policy under output_rules. Rules can match by topic, capability, risk tags, scanners, patterns, and size limits.
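To make the matching idea concrete, here is a minimal sketch of how a rule's match criteria might be evaluated against a finished job. The semantics are assumptions, not confirmed by this page: every criterion present must be satisfied, a list criterion is satisfied when any of its entries matches, and topics are treated as glob patterns. The rule_matches function and the job dictionary layout are hypothetical, and scanners, patterns, and size limits are left out.

```python
from fnmatch import fnmatchcase


def rule_matches(match: dict, job: dict) -> bool:
    """Hypothetical matcher: True if every criterion in `match` is satisfied.

    Assumed semantics (not taken from the product docs):
      - a criterion that is absent or empty is ignored
      - list criteria match if at least one entry matches
      - topic entries are shell-style globs, e.g. "job.*"
    """
    topics = match.get("topics")
    if topics and not any(fnmatchcase(job.get("topic", ""), p) for p in topics):
        return False

    capabilities = match.get("capabilities")
    if capabilities and not set(capabilities) & set(job.get("capabilities", [])):
        return False

    risk_tags = match.get("risk_tags")
    if risk_tags and not set(risk_tags) & set(job.get("risk_tags", [])):
        return False

    return True
```

A real evaluator would also need to run the configured scanners and apply size limits before reaching a decision; this sketch covers only the metadata criteria.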
output_rules example
output_rules:
  - id: out-secret-1
    decision: quarantine
    reason: "possible cloud credential in output"
    match:
      topics: ["job.*"]
      capabilities: ["code.write"]
      risk_tags: ["secrets"]
      detectors: ["secret_leak"]
      max_output_bytes: 1048576
API data shape
Job responses can include an output_safety object:
- decision
- reason
- rule_id
- findings[]
- policy_snapshot
- redacted_ptr
- original_ptr
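A client might read these fields roughly as follows. This is a sketch under assumptions: the field names come from the list above, but the value types, the optionality of every field, and the OutputSafety and parse_output_safety names are hypothetical.

```python
from dataclasses import dataclass, field, fields
from typing import Any, List, Optional


@dataclass
class OutputSafety:
    # Field names follow the list above; all types are assumed.
    decision: Optional[str] = None
    reason: Optional[str] = None
    rule_id: Optional[str] = None
    findings: List[Any] = field(default_factory=list)
    policy_snapshot: Optional[Any] = None
    redacted_ptr: Optional[str] = None
    original_ptr: Optional[str] = None


def parse_output_safety(job: dict) -> Optional[OutputSafety]:
    """Return the job's output_safety object, or None if the job has none."""
    raw = job.get("output_safety")
    if raw is None:
        return None
    # Keep only known fields so unexpected keys do not raise.
    known = {f.name: raw[f.name] for f in fields(OutputSafety) if f.name in raw}
    # Normalize a missing or null findings value to an empty list.
    known["findings"] = list(known.get("findings") or [])
    return OutputSafety(**known)
```

Treating every field as optional keeps the parser tolerant of responses where only some fields are populated, for example an ALLOW decision with no findings.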
Operator triage
1) List quarantined jobs:
   GET /api/v1/jobs?state=OUTPUT_QUARANTINED
2) Inspect findings:
   GET /api/v1/jobs/{id}
3) Release false positives:
   POST /api/v1/dlq/{job_id}/retry
4) Confirm quarantine handling:
   DELETE /api/v1/dlq/{job_id}
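The triage steps above could be scripted along these lines. The paths and HTTP methods are the ones listed; the base URL is a placeholder, the function names are hypothetical, and authentication is omitted because this page does not describe it. The functions only build requests, so nothing is sent until the caller chooses to send it.

```python
import urllib.request
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:8080"  # placeholder; replace with your gateway address


def list_quarantined() -> urllib.request.Request:
    """Step 1: list jobs in the OUTPUT_QUARANTINED state."""
    query = urlencode({"state": "OUTPUT_QUARANTINED"})
    return urllib.request.Request(f"{BASE_URL}/api/v1/jobs?{query}", method="GET")


def inspect_job(job_id: str) -> urllib.request.Request:
    """Step 2: fetch one job to inspect its findings."""
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/jobs/{quote(job_id, safe='')}", method="GET")


def release_false_positive(job_id: str) -> urllib.request.Request:
    """Step 3: retry the DLQ entry for a job judged to be a false positive."""
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/dlq/{quote(job_id, safe='')}/retry", method="POST")


def confirm_quarantine(job_id: str) -> urllib.request.Request:
    """Step 4: delete the DLQ entry once the quarantine has been handled."""
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/dlq/{quote(job_id, safe='')}", method="DELETE")
```

Each request can be sent with urllib.request.urlopen(req) after adding whatever authentication headers the deployment requires.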
Source of truth: docs/output-safety.md and Safety Kernel docs.
