Fail-Open vs Fail-Closed

Definition

Fail-open and fail-closed describe what a governance system does when its policy check is unavailable: fail-open lets actions proceed without a decision, while fail-closed blocks actions until policy can be evaluated.

The core trade-off

When the policy decision point times out or is unreachable, the system must choose a default. Fail-open prioritizes availability — agents keep working even if they cannot be checked — but it means risky actions can slip through ungoverned during the outage. Fail-closed prioritizes safety — no action proceeds without a decision — at the cost of availability during the same outage. There is no universally correct answer; the right default depends on the blast radius of the actions being governed.
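The fallback can be sketched in Python. This is a minimal sketch under stated assumptions, not Cordum's implementation: `evaluate_policy` stands in for a hypothetical call to the policy decision point, and `check_action`, `FailMode`, and the pool size are illustrative names and values. It shows that the default is applied only when no decision arrives in time, and that the caller learns whether a real decision was made.

```python
import concurrent.futures
from enum import Enum


class FailMode(Enum):
    OPEN = "open"      # allow the action when no decision arrives
    CLOSED = "closed"  # block the action when no decision arrives


# Shared worker pool for policy calls (illustrative sizing).
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def check_action(evaluate_policy, action, timeout_s, fail_mode):
    """Ask the (hypothetical) policy decision point about `action`.

    Returns (allowed, decided): `decided` is False when no decision arrived
    and the configured fail mode was applied instead.
    """
    future = _POOL.submit(evaluate_policy, action)
    try:
        return bool(future.result(timeout=timeout_s)), True
    except (concurrent.futures.TimeoutError, ConnectionError):
        # Timed out or unreachable: fall back to the configured default.
        return fail_mode is FailMode.OPEN, False
```

`future.result(timeout=...)` stops waiting but does not cancel a call that is already running, so a real client would also need a timeout on the underlying request.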

Choosing per risk tier

Mature deployments do not pick one mode globally. They fail closed on high-risk actions (deletions, payments, production changes) so that a policy outage can never let an irreversible operation through ungoverned, while letting lower-risk actions fail open to preserve throughput. Cordum supports this by treating the Safety Kernel as a hot-path dependency with explicit timeouts and a circuit breaker, and by alerting on the rate of fail-open events so that an outage that quietly widens the ungoverned window is caught early.
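A minimal Python sketch of per-tier defaults combined with a circuit breaker and a fail-open counter. Everything here is an assumption for illustration: the tier names, `FAIL_OPEN_BY_TIER`, the breaker thresholds, and the `PolicyGate` and `CircuitBreaker` classes are hypothetical and do not describe how Cordum's Safety Kernel is actually configured. The policy client is assumed to enforce its own timeout and raise `TimeoutError` or `ConnectionError` when it cannot decide.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical tier -> fail-open mapping: high-risk actions never fail open.
FAIL_OPEN_BY_TIER = {"low": True, "medium": True, "high": False}


@dataclass
class CircuitBreaker:
    failure_threshold: int = 3   # consecutive failures before the breaker opens
    reset_after_s: float = 30.0  # how long to skip policy calls once open
    failures: int = 0
    opened_at: Optional[float] = None

    def allow_call(self, now: float) -> bool:
        if self.opened_at is None:
            return True
        # After the cool-down, let trial calls through (half-open).
        return now - self.opened_at >= self.reset_after_s

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self, now: float) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = now


@dataclass
class PolicyGate:
    # Hypothetical PDP client; assumed to enforce its own timeout and raise
    # TimeoutError or ConnectionError when no decision is available.
    evaluate: Callable[[str], bool]
    breaker: CircuitBreaker = field(default_factory=CircuitBreaker)
    fail_open_count: int = 0  # export as a metric and alert on its rate

    def decide(self, action: str, tier: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        if self.breaker.allow_call(now):
            try:
                allowed = self.evaluate(action)
                self.breaker.record_success()
                return allowed
            except (TimeoutError, ConnectionError):
                self.breaker.record_failure(now)
        # No decision: apply the tier default; unknown tiers fail closed.
        fail_open = FAIL_OPEN_BY_TIER.get(tier, False)
        if fail_open:
            self.fail_open_count += 1
        return fail_open
```

While the breaker is open, the gate skips the policy call and goes straight to the tier default, so a struggling decision point does not add latency to every action. That is also when `fail_open_count` climbs, which is why it is worth alerting on.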

Frequently asked questions

Should an agent governance system fail open or fail closed?

It depends on blast radius. Fail closed on high-risk, irreversible actions so that an outage never lets them through ungoverned; fail open on low-risk actions where availability matters more. The dangerous choice is failing open silently on everything.

Why alert on fail-open events?

A fail-open during an outage means actions ran without a policy decision. Tracking the rate of these events — and alerting when it spikes — turns a silent governance gap into a visible signal you can respond to.
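One way to turn fail-open events into a signal is a sliding-window count with a threshold. The sketch below is illustrative only: `FailOpenRateAlert`, the window length, and the threshold are hypothetical, and a production setup would more likely export a counter to an existing metrics and alerting system and compute the rate there.

```python
from collections import deque


class FailOpenRateAlert:
    """Counts fail-open events in a sliding time window and flags spikes."""

    def __init__(self, window_s: float, max_events: int):
        self.window_s = window_s      # hypothetical window length, in seconds
        self.max_events = max_events  # hypothetical: alert above this many per window
        self._events = deque()        # timestamps of recent fail-open events

    def record(self, now: float) -> bool:
        """Record one fail-open event at `now`; return True if the alert should fire."""
        self._events.append(now)
        # Drop events that have aged out of the window.
        while self._events and now - self._events[0] >= self.window_s:
            self._events.popleft()
        return len(self._events) > self.max_events
```

With `window_s=60.0` and `max_events=2`, a third fail-open within one minute fires the alert, while isolated events spread over a longer period do not.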

Govern your AI agents with Cordum

Cordum is the agent control plane: policy-before-dispatch enforcement, human approvals, and a tamper-evident audit trail for autonomous AI agents.