AI governance in production

Why we built this

We kept seeing the same failure pattern: AI workflows shipped fast, but approvals and audit trails stayed manual. When incidents happened, teams could not answer who approved what or why a job ran at all.

Governance has to be a decision layer, not an afterthought. That belief is the foundation of Cordum.

What AI governance means

AI governance is the set of controls that decide whether a job should run, under what constraints, and who must approve it. It is not a post-hoc review. It is an inline policy decision point that runs before work is dispatched.

In practice, that means a deterministic core with explicit workflows and bounded workers. The model can be probabilistic, but the control plane cannot be.

Before and after governance

Before

- Jobs dispatch directly from scripts or automation.
- Approvals happen in chat, not tied to policy.
- Incident timelines are reconstructed by hand.

After

- Every job is evaluated by policy before dispatch.
- Approvals are bound to a policy snapshot and job hash.
- An append-only audit trail answers what happened.
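Binding an approval to a job hash and a policy snapshot means the approval stops being valid the moment either one changes. Below is a minimal sketch that assumes a hypothetical `Job` shape and a policy snapshot stored as raw bytes. The hashing scheme, SHA-256 over the job's JSON encoding, is an illustrative choice and not necessarily what Cordum uses.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// Job is a hypothetical job shape used only for this sketch.
type Job struct {
	Topic  string            `json:"topic"`
	Target string            `json:"target"`
	Args   map[string]string `json:"args"`
}

// Approval records who approved which job under which policy snapshot.
type Approval struct {
	Approver   string
	JobHash    string
	PolicyHash string
}

// digest returns the hex-encoded SHA-256 of b.
func digest(b []byte) string {
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])
}

// jobHash hashes the job's JSON encoding. encoding/json writes struct
// fields in declaration order and sorts map keys, so equal jobs hash equally.
func jobHash(job Job) (string, error) {
	b, err := json.Marshal(job)
	if err != nil {
		return "", err
	}
	return digest(b), nil
}

// approvalValid reports whether the approval still covers this exact job
// under this exact policy snapshot; changing either invalidates it.
func approvalValid(a Approval, job Job, policy []byte) bool {
	jh, err := jobHash(job)
	if err != nil {
		return false
	}
	return a.JobHash == jh && a.PolicyHash == digest(policy)
}

func main() {
	policy := []byte("v7: remediation on prod requires approval") // hypothetical snapshot
	job := Job{Topic: "job.remediate", Target: "prod-db", Args: map[string]string{"action": "restart"}}

	jh, err := jobHash(job)
	if err != nil {
		panic(err)
	}
	a := Approval{Approver: "oncall", JobHash: jh, PolicyHash: digest(policy)}

	fmt.Println(approvalValid(a, job, policy)) // true
	job.Args["action"] = "drop-table"
	fmt.Println(approvalValid(a, job, policy)) // false: the job changed after approval
}
```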

Incident response walkthrough

A real incident workflow should feel boring: an alert triggers a run, evidence is collected, policy gates risky remediation, and the audit trail records every step.

1. Alert triggers the incident-triage workflow.
2. Triage worker gathers logs and context.
3. Safety Kernel evaluates the remediation step.
4. Approval gate pauses the run until reviewed.
5. Remediation executes with constraints applied.
6. Timeline records the decision, approval, and outcome.
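Steps 3 to 6 amount to a small state machine. The run executes until a step needs approval, parks there, and resumes only after a reviewer signs off, and every transition is appended to a timeline. Here is a minimal sketch of that loop. The `NeedsApproval` flag stands in for a REQUIRE_APPROVAL decision from the policy check. `Run`, `Step`, and the status strings are hypothetical and do not come from Cordum's workflow engine.

```go
package main

import "fmt"

// Step is one stage of a hypothetical incident-triage workflow.
// NeedsApproval stands in for a REQUIRE_APPROVAL policy decision.
type Step struct {
	Name          string
	NeedsApproval bool
	Do            func() error
}

type Run struct {
	Steps    []Step
	next     int          // index of the next step to execute
	approved map[int]bool // steps a reviewer has approved
	Status   string       // "running", "waiting_approval", "failed", "succeeded"
	Timeline []string     // ordered record of what happened
}

func NewRun(steps []Step) *Run {
	return &Run{Steps: steps, approved: map[int]bool{}, Status: "running"}
}

// Advance executes steps until the run finishes, fails, or hits an approval gate.
func (r *Run) Advance() {
	if r.Status != "running" {
		return
	}
	for r.next < len(r.Steps) {
		s := r.Steps[r.next]
		if s.NeedsApproval && !r.approved[r.next] {
			r.Status = "waiting_approval"
			r.Timeline = append(r.Timeline, "paused for approval: "+s.Name)
			return
		}
		if err := s.Do(); err != nil {
			r.Status = "failed"
			r.Timeline = append(r.Timeline, "failed: "+s.Name+": "+err.Error())
			return
		}
		r.Timeline = append(r.Timeline, "completed: "+s.Name)
		r.next++
	}
	r.Status = "succeeded"
}

// Approve records a reviewer's decision for the step the run is paused on.
func (r *Run) Approve(reviewer string) {
	if r.Status != "waiting_approval" {
		return
	}
	r.approved[r.next] = true
	r.Status = "running"
	r.Timeline = append(r.Timeline, "approved by "+reviewer+": "+r.Steps[r.next].Name)
}

func main() {
	noop := func() error { return nil }
	run := NewRun([]Step{
		{Name: "gather-logs", Do: noop},
		{Name: "restart-service", NeedsApproval: true, Do: noop},
	})
	run.Advance()
	fmt.Println(run.Status) // waiting_approval
	run.Approve("oncall")
	run.Advance()
	fmt.Println(run.Status) // succeeded
	for _, e := range run.Timeline {
		fmt.Println(e)
	}
}
```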

Common failure modes

- Ungoverned actions that write to production without a policy check.
- Approvals that are not tied to a policy snapshot or job hash.
- No audit trail, so incidents become guesswork.
- Shadow automation where teams ship scripts outside the control plane.

Policy-first control plane

A policy-first control plane evaluates every job with a Safety Kernel. The policy decision returns ALLOW, DENY, REQUIRE_APPROVAL, or ALLOW_WITH_CONSTRAINTS. Constraints cap runtime, limit diffs, and enforce network allowlists so risky actions are bounded.

This is where governance becomes real. It is not just a document. It is a decision that blocks, gates, or constrains execution.

See the Safety Kernel and policy docs for details.

Governed control plane architecture

The Safety Kernel sits between dispatch and execution, ensuring every job gets a deterministic decision before it runs.

Clients/UI
  |
  v
API Gateway (HTTP/WS + gRPC)
  | writes ctx/res pointers
  v
Redis (state, config, DLQ)
  |
  v
NATS bus (sys.* + job.*)
  |
  +--> Scheduler (routing + safety gate)
  |       |
  |       +--> Safety Kernel (policy check)
  |
  +--> External Workers
  |
  +--> Workflow Engine (run orchestration)

Governance checklist

- Evaluate every job before dispatch, not after the fact.
- Bind approvals to the policy snapshot and job hash.
- Enforce constraints for runtime, diffs, and egress.
- Record an append-only audit trail for every run and decision.
- Make the safe path fast and the risky path gated.
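For the append-only audit trail, one common technique is a hash chain. Each entry stores the hash of the previous one, so rewriting history becomes detectable. This is a swapped-in illustration, not a description of how Cordum stores its audit trail. `AuditLog`, `Append`, and `Verify` are hypothetical names.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Entry is one audit record. PrevHash links it to the entry before it,
// so editing an earlier entry breaks verification at that point.
type Entry struct {
	Seq      int
	Event    string
	PrevHash string
	Hash     string
}

type AuditLog struct {
	entries []Entry
}

// entryHash quotes the event (%q) so the field separator stays unambiguous.
func entryHash(seq int, event, prev string) string {
	sum := sha256.Sum256([]byte(fmt.Sprintf("%d|%q|%s", seq, event, prev)))
	return hex.EncodeToString(sum[:])
}

// Append is the only way to add records; there is no update or delete.
func (l *AuditLog) Append(event string) Entry {
	prev := ""
	if n := len(l.entries); n > 0 {
		prev = l.entries[n-1].Hash
	}
	seq := len(l.entries)
	e := Entry{Seq: seq, Event: event, PrevHash: prev, Hash: entryHash(seq, event, prev)}
	l.entries = append(l.entries, e)
	return e
}

// Verify recomputes the chain and returns the first broken index, or -1.
func (l *AuditLog) Verify() int {
	prev := ""
	for i, e := range l.entries {
		if e.Seq != i || e.PrevHash != prev || e.Hash != entryHash(e.Seq, e.Event, e.PrevHash) {
			return i
		}
		prev = e.Hash
	}
	return -1
}

func main() {
	var audit AuditLog
	audit.Append("policy decision: REQUIRE_APPROVAL job=j2")
	audit.Append("approved by oncall job=j2")
	audit.Append("completed job=j2")
	fmt.Println(audit.Verify()) // -1
	audit.entries[1].Event = "approved by someone-else job=j2"
	fmt.Println(audit.Verify()) // 1
}
```

A chain only proves tampering if the latest hash is also kept somewhere the writer cannot rewrite, such as a separate store or an external checkpoint. Anyone who can rewrite every entry can also recompute every hash.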

How Cordum helps

Cordum provides a Safety Kernel that evaluates every job, an approvals system that binds decisions to policy snapshots, and a run timeline that records results. Workflows are explicit DAGs, and the audit trail is append-only.

If you need to prove governance to security and compliance teams, start by reviewing the control plane implementation on GitHub.