
The boring core that keeps autonomy safe.

A deep dive into Cordum: governance-first orchestration for autonomous AI agents.

January 2026
12 min read
Cordum Team
Deep Dives

TL;DR

Cordum is a control plane for autonomous AI agents. It enforces policy before work is dispatched, requires approvals for risky actions, and records a complete audit trail. The core stays stable while domain logic ships as packs.

- Every job is evaluated before dispatch, and every decision is recorded.
- Approvals are intentional, tied to policy snapshots and job hashes.
- Domain logic lives in packs, not in the core control plane.
- BUSL-1.1 means you can inspect the system without black boxes.

Why this matters: Autonomous AI agents move fast, but production systems have to stay safe. Cordum slows down only the risky parts: it evaluates every job, requires approvals when policy says so, and keeps a complete trail of what happened and why.

Why autonomous AI agents need a control plane

Traditional orchestrators focus on scheduling and retries. Autonomous AI agents add risk: automation can write to prod, patch systems, or move data across boundaries. Without governance, teams either block automation completely or ship a fragile set of scripts with no oversight.

A control plane changes the default. Every job is evaluated, every decision is logged, and approvals are explicit. That makes autonomy safe enough to use in real operations.
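To make the default concrete, here is a minimal Python sketch of "evaluate before dispatch." The names (`evaluate`, `dispatch`, the `Decision` shape, the `target` field) are hypothetical stand-ins, not Cordum's actual API; the point is only that the gate runs first and every outcome is logged.

```python
# Minimal sketch: every job is evaluated before dispatch, and every
# decision is recorded. All names here are illustrative.
from dataclasses import dataclass

@dataclass
class Decision:
    verdict: str  # "ALLOW", "DENY", or "REQUIRE_APPROVAL"
    reason: str

def evaluate(job: dict) -> Decision:
    # Stand-in policy: anything touching prod needs a human gate.
    if job.get("target") == "prod":
        return Decision("REQUIRE_APPROVAL", "writes to production")
    return Decision("ALLOW", "no risky targets")

audit_log: list[dict] = []

def dispatch(job: dict) -> str:
    decision = evaluate(job)
    # The decision is logged whether or not the job proceeds.
    audit_log.append({"job": job["id"], "decision": decision.verdict,
                      "reason": decision.reason})
    return "dispatched" if decision.verdict == "ALLOW" else "held"

print(dispatch({"id": "job-1", "target": "staging"}))  # dispatched
print(dispatch({"id": "job-2", "target": "prod"}))     # held
```

Even in this toy version, blocking automation outright is no longer the only safe option: risky jobs are held for review while routine ones flow through.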

The boring core principle

Cordum keeps the core intentionally boring: jobs, workflows, state, policy, scheduling, retries, DLQ, approvals, and audit trail. The core does not know about GitHub, Kubernetes, or any specific tooling.

That separation is the point. It lets you upgrade the control plane without reworking domain logic, and it keeps the safety model consistent across every use case.

Control plane architecture

The control plane runs on NATS for the bus and Redis for state. The API gateway accepts jobs, the scheduler routes them, the Safety Kernel makes policy decisions, and the workflow engine coordinates runs.

Clients/UI
  |
  v
API Gateway (HTTP/WS + gRPC)
  | writes ctx/res pointers
  v
Redis (state, config, DLQ)
  |
  v
NATS bus (sys.* + job.*)
  |
  +--> Scheduler (routing + safety gate)
  |       |
  |       +--> Safety Kernel (policy check)
  |
  +--> External Workers
  |
  +--> Workflow Engine (run orchestration)
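The "writes ctx/res pointers" edge in the diagram is worth unpacking: the bus carries references, not payloads. The sketch below uses a plain dict and list as stand-ins for Redis and NATS, and the key names (`ctx:*`, `job.request`) are assumptions for illustration.

```python
# Sketch of pointer-based dispatch: the bus packet holds a pointer to
# state in Redis, not the payload itself. "redis" and "bus" are plain
# in-memory stand-ins for the real stores.
import json
import uuid

redis: dict[str, str] = {}   # state store (ctx/res pointers)
bus: list[dict] = []         # NATS-like stream of packets

def submit_job(payload: dict) -> str:
    job_id = f"job-{uuid.uuid4().hex[:8]}"
    ctx_key = f"ctx:{job_id}"
    redis[ctx_key] = json.dumps(payload)        # payload lives in Redis
    bus.append({"subject": "job.request",       # packet carries a pointer
                "job_id": job_id, "ctx": ctx_key})
    return job_id

def worker_fetch(packet: dict) -> dict:
    # Workers dereference the pointer to get the full context.
    return json.loads(redis[packet["ctx"]])

job_id = submit_job({"action": "patch", "target": "staging"})
ctx = worker_fetch(bus[-1])
```

Keeping payloads out of the bus keeps packets small and lets state, retention, and access control live in one place.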

Policy, approvals, and constraints

The Safety Kernel is the policy decision point. It evaluates every job and returns ALLOW, DENY, REQUIRE_APPROVAL, or ALLOW_WITH_CONSTRAINTS. Decisions include a reason and are bound to a policy snapshot hash so approvals remain consistent even as policy evolves.

Constraints cap runtime, limit diffs, and enforce egress allowlists. That is how you let automation run without giving it a blank check.
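A sketch of what such a decision could look like, with the four verdicts from above and a snapshot hash binding each decision to the exact policy that produced it. The policy fields and function names here are hypothetical; only the verdict names and the snapshot-hash idea come from the text.

```python
# Hypothetical Safety Kernel decision bound to a policy snapshot hash.
import hashlib
import json

POLICY = {
    "deny_targets": ["billing-db"],
    "approval_targets": ["prod"],
    "constraints": {"max_runtime_s": 600, "egress_allow": ["api.internal"]},
}

def policy_snapshot_hash(policy: dict) -> str:
    # Canonical JSON so the same policy always hashes the same way.
    raw = json.dumps(policy, sort_keys=True).encode()
    return hashlib.sha256(raw).hexdigest()

def decide(job: dict, policy: dict) -> dict:
    snap = policy_snapshot_hash(policy)
    if job["target"] in policy["deny_targets"]:
        return {"verdict": "DENY", "reason": "target is denied",
                "policy": snap}
    if job["target"] in policy["approval_targets"]:
        return {"verdict": "REQUIRE_APPROVAL", "reason": "risky target",
                "policy": snap}
    # Allowed, but only inside the policy's caps and allowlists.
    return {"verdict": "ALLOW_WITH_CONSTRAINTS", "reason": "within policy",
            "constraints": policy["constraints"], "policy": snap}

decision = decide({"id": "job-3", "target": "staging"}, POLICY)
```

Because the decision carries the snapshot hash, a later approval can be checked against the exact policy version that requested it, even if policy has since changed.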

Runs, steps, and audit trail

Workflows are stored in Redis. Runs emit a timeline of step states, approvals, and results. If a step fails or needs approval, the run pauses in a predictable state and resumes when conditions are met.

The audit trail is append-only. For every run you can answer: what executed, what changed, who approved, and which policy snapshot made the decision.
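As a sketch, an append-only trail can be as simple as a list of timestamped events that is never mutated, only extended. The event shape below (run, step, state, approver, policy) is an illustrative assumption, not Cordum's schema.

```python
# Sketch of an append-only audit trail with a per-run timeline.
import time

class AuditTrail:
    def __init__(self) -> None:
        self._events: list[dict] = []  # only ever appended to

    def append(self, run_id: str, step: str, state: str, **extra) -> None:
        self._events.append({"ts": time.time(), "run": run_id,
                             "step": step, "state": state, **extra})

    def timeline(self, run_id: str) -> list[dict]:
        return [e for e in self._events if e["run"] == run_id]

trail = AuditTrail()
trail.append("run-1", "plan", "completed")
trail.append("run-1", "apply", "awaiting_approval", policy="<snapshot-hash>")
trail.append("run-1", "apply", "approved", approver="alice")
trail.append("run-1", "apply", "completed", result="ok")
```

Reading the timeline back answers the four questions directly: what executed (the steps), what changed (the results), who approved (`approver`), and which policy snapshot decided (`policy`).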

Packs and domain logic

Packs install workflows, schemas, and policy overlays. Install does not execute code; workers are deployed separately. This keeps upgrades safe and makes it possible to reason about what changed before it ships.

Packs are the delivery mechanism for domain logic. The core stays stable, packs evolve fast.
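A minimal sketch of the install-without-execution idea, with an invented manifest shape (`workflows`, `schemas`, `policy_overlays`); the real pack format may differ. The key property is that `install` only registers data.

```python
# Hypothetical pack manifest: install registers definitions into the
# control plane, but never executes worker code.
PACK = {
    "name": "github-ops",
    "version": "0.1.0",
    "workflows": [{"id": "pr-triage", "steps": ["fetch", "label", "notify"]}],
    "schemas": ["pr_event.json"],
    "policy_overlays": [{"match": "pr-triage", "require_approval": ["merge"]}],
}

registry: dict = {"workflows": {}, "overlays": []}

def install(pack: dict) -> None:
    # Pure registration: data goes into the registry, nothing runs.
    for wf in pack["workflows"]:
        registry["workflows"][wf["id"]] = wf
    registry["overlays"].extend(pack["policy_overlays"])

install(PACK)
```

Because install is pure data, you can diff a pack against the registry before it ships, which is exactly the "reason about what changed" property the core relies on.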

Operating in production

The system ships with retries, DLQ handling, and health endpoints. JetStream is optional when you need durable delivery, and Redis holds pointer-based state so large payloads stay off the bus.

This is boring on purpose: the point is predictable behavior under pressure.
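The retry-then-DLQ behavior is the predictable part. A minimal sketch, with an assumed worker interface and in-memory DLQ rather than Cordum's actual implementation:

```python
# Sketch of retry-then-DLQ: retry a bounded number of times, then park
# the job with its failure context for operators to inspect.
dlq: list[dict] = []

def run_with_retries(job: dict, worker, max_attempts: int = 3):
    last_error = ""
    for attempt in range(1, max_attempts + 1):
        try:
            return worker(job)  # success short-circuits the retries
        except Exception as exc:
            last_error = str(exc)
    # Retries exhausted: dead-letter the job instead of dropping it.
    dlq.append({"job": job["id"], "attempts": max_attempts,
                "error": last_error})
    return None

def flaky_worker(job: dict):
    raise RuntimeError("downstream timeout")

result = run_with_retries({"id": "job-9"}, flaky_worker)
```

Nothing clever happens under pressure: failed work either succeeds within its retry budget or lands in the DLQ with enough context to explain why.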

Why BUSL-1.1

Enterprises want to inspect the control plane before letting it touch production. BUSL-1.1 gives you transparency without forcing the project into a one-size-fits-all license model.

You can audit the source and understand every decision path. For hosted or resale use cases, commercial terms apply. See /legal/license for details.

Getting started

Clone the repo, run the Docker quickstart and smoke tests, then explore the dashboard. For production deployments, contact the team.

Ready to ship?

Review the source on GitHub, then run the platform locally and inspect the policy model.