AI governance in production

Policy-first controls for AI workflows, with approvals and audit trails that stand up to scrutiny.

Jan 14, 2026 · 8 min read · AI governance
TL;DR

Governance is not a spreadsheet or a ticket queue. It is a decision layer that evaluates every job before it runs, requires approval when risk is high, and records a complete audit trail.

  • Governance is a decision layer before work runs, not a compliance afterthought.
  • Approvals and constraints keep risky actions safe without blocking the safe path.
  • An append-only audit trail is the evidence layer for security and ops.

Why governance matters

AI workflows can touch production systems, change data, or trigger network access. Governance must be deterministic and automated; otherwise, teams will either block AI outright or accept hidden risk.

The safest pattern is a control plane that evaluates every job, makes an explicit decision, and records it with an audit trail. That is the baseline for AI operations.

Why we built this

We kept seeing the same failure pattern: AI workflows shipped fast, but approvals and audit trails stayed manual. When incidents happened, teams could not answer who approved what or why a job ran at all.

Governance has to be a decision layer, not an afterthought. That belief is the foundation of Cordum.

What AI governance means

AI governance is the set of controls that decide whether a job should run, under what constraints, and who must approve it. It is not a post-hoc review. It is an inline policy decision point that runs before work is dispatched.

In practice, that means a deterministic core with explicit workflows and bounded workers. The model can be probabilistic, but the control plane cannot be.

Before and after governance

Before
  • Jobs dispatch directly from scripts or automation.
  • Approvals happen in chat, not tied to policy.
  • Incident timelines are reconstructed by hand.
After
  • Every job is evaluated by policy before dispatch.
  • Approvals bind to policy snapshot and job hash.
  • An append-only audit trail answers what happened.
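
One way to read "approvals bind to policy snapshot and job hash" is sketched below. This is an illustration under stated assumptions, not Cordum's implementation: the names canonical_hash, Approval, approve, and approval_is_valid are hypothetical, and hashing canonical JSON with SHA-256 is simply a convenient way to fingerprint a job and a policy.

```python
import hashlib
import json
from dataclasses import dataclass


def canonical_hash(payload: dict) -> str:
    """Hash a dict deterministically: sorted keys, compact separators."""
    encoded = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


@dataclass(frozen=True)
class Approval:
    # Hypothetical record shape, for illustration only.
    approver: str
    job_hash: str         # fingerprint of the exact job that was reviewed
    policy_snapshot: str  # fingerprint of the policy in force at review time


def approve(approver: str, job: dict, policy: dict) -> Approval:
    """Issue an approval tied to one specific job and one specific policy."""
    return Approval(approver, canonical_hash(job), canonical_hash(policy))


def approval_is_valid(approval: Approval, job: dict, policy: dict) -> bool:
    """The approval counts only if neither the job nor the policy has changed."""
    return (
        approval.job_hash == canonical_hash(job)
        and approval.policy_snapshot == canonical_hash(policy)
    )
```

With this shape, an approval given in chat for "the migration" cannot silently carry over to a different job payload or to a later policy version: either change produces a different hash, and the approval no longer validates.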

Incident response walkthrough

A real incident workflow should feel boring: an alert triggers a run, evidence is collected, policy gates risky remediation, and the audit trail records every step.

  1. Alert triggers the incident-triage workflow.
  2. Triage worker gathers logs and context.
  3. Safety Kernel evaluates the remediation step.
  4. Approval gate pauses the run until reviewed.
  5. Remediation executes with constraints applied.
  6. Timeline records the decision, approval, and outcome.
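
The six steps above can be sketched as a single function. Everything here is a simplified stand-in: Run, evaluate_remediation, and run_incident_triage are hypothetical names, the evidence and the constraint values are stubbed, and a real engine would presumably resume a paused run rather than call the function again with an approver.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Run:
    alert: str
    timeline: list = field(default_factory=list)
    status: str = "running"

    def record(self, event: str, **detail) -> None:
        """Append one event to the run timeline."""
        self.timeline.append({"event": event, **detail})


def evaluate_remediation(action: str) -> str:
    """Stand-in policy check: restarts need approval, anything else is denied."""
    return "REQUIRE_APPROVAL" if action == "restart_service" else "DENY"


def run_incident_triage(alert: str, approver: Optional[str] = None) -> Run:
    run = Run(alert)
    run.record("alert_received", alert=alert)                 # step 1
    evidence = {"logs": f"recent log lines for {alert}"}      # step 2 (stubbed)
    run.record("evidence_collected", evidence=evidence)

    action = "restart_service"
    decision = evaluate_remediation(action)                   # step 3
    run.record("policy_decision", action=action, decision=decision)
    if decision == "DENY":
        run.status = "denied"
        return run

    if decision == "REQUIRE_APPROVAL" and approver is None:   # step 4
        run.record("approval_requested", action=action)
        run.status = "waiting_for_approval"
        return run
    run.record("approved", approver=approver)

    constraints = {"max_runtime_s": 60}                       # step 5 (assumed value)
    run.record("remediation_executed", action=action, constraints=constraints)

    run.status = "succeeded"                                  # step 6
    run.record("run_finished", status=run.status)
    return run
```

Calling run_incident_triage("high_error_rate") stops at the approval gate with status "waiting_for_approval"; passing approver="alice" lets the run proceed, and the timeline then holds the decision, the approval, and the outcome in order.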

Common failure modes

  • Ungoverned actions that write to production without a policy check.
  • Approvals that are not tied to a policy snapshot or job hash.
  • No audit trail, so incidents become guesswork.
  • Shadow automation where teams ship scripts outside the control plane.

Policy-first control plane

A policy-first control plane evaluates every job with a Safety Kernel. Each policy decision is one of ALLOW, DENY, REQUIRE_APPROVAL, or ALLOW_WITH_CONSTRAINTS. Constraints cap runtime, limit diffs, and enforce network allowlists so risky actions are bounded.

This is where governance becomes real. It is not just a document. It is a decision that blocks, gates, or constrains execution.
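
A minimal sketch of such a decision function follows. Only the four decision names come from the text above; the Job fields, the specific rules, and the constraint keys (max_runtime_s, max_diff_lines, egress_allowlist) are assumptions chosen for illustration and are not taken from the Safety Kernel itself.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    ALLOW = "ALLOW"
    DENY = "DENY"
    REQUIRE_APPROVAL = "REQUIRE_APPROVAL"
    ALLOW_WITH_CONSTRAINTS = "ALLOW_WITH_CONSTRAINTS"


@dataclass(frozen=True)
class Job:
    # Hypothetical job attributes a policy might inspect.
    action: str
    target_env: str
    writes_data: bool = False
    needs_network: bool = False


@dataclass(frozen=True)
class Verdict:
    decision: Decision
    constraints: dict = field(default_factory=dict)
    reason: str = ""


def evaluate(job: Job) -> Verdict:
    """Deterministic rules, checked in order; the first match wins."""
    if job.action == "drop_database":
        return Verdict(Decision.DENY, reason="destructive action")
    if job.target_env == "production" and job.writes_data:
        return Verdict(Decision.REQUIRE_APPROVAL, reason="production write")
    if job.needs_network:
        return Verdict(
            Decision.ALLOW_WITH_CONSTRAINTS,
            constraints={
                "max_runtime_s": 300,                       # cap runtime
                "max_diff_lines": 200,                      # limit diffs
                "egress_allowlist": ["api.internal.example"],  # network allowlist
            },
            reason="network access is bounded",
        )
    return Verdict(Decision.ALLOW, reason="read-only, no egress")
```

The point of the sketch is the shape rather than the rules: the same job always yields the same verdict, and the verdict carries both the reason and any constraints, so the caller can enforce them and record them.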

See the Safety Kernel and policy docs for details.

Governed control plane architecture

The Safety Kernel sits between dispatch and execution, ensuring every job gets a deterministic decision before it runs.

Clients/UI
  |
  v
API Gateway (HTTP/WS + gRPC)
  | writes ctx/res pointers
  v
Redis (state, config, DLQ)
  |
  v
NATS bus (sys.* + job.*)
  |
  +--> Scheduler (routing + safety gate)
  |       |
  |       +--> Safety Kernel (policy check)
  |
  +--> External Workers
  |
  +--> Workflow Engine (run orchestration)

Governance checklist

  • Evaluate every job before dispatch, not after the fact.
  • Bind approvals to policy snapshot and job hash.
  • Enforce constraints for runtime, diffs, and egress.
  • Record an append-only audit trail for every run and decision.
  • Make the safe path fast and the risky path gated.

How Cordum helps

Cordum provides a Safety Kernel that evaluates every job, an approvals system that binds decisions to policy snapshots, and a run timeline that records results. Workflows are explicit DAGs and the audit trail is append-only.

If you need to prove governance to security and compliance teams, start by requesting source access and reviewing the control plane implementation.

Next steps

Request source access to inspect the governance flow end-to-end, or review the quickstart to run it locally.