Why autonomous ops needs a control plane
Traditional orchestrators focus on scheduling and retries. Autonomous operations add risk: automation can write to prod, patch systems, or move data across boundaries. Without governance, teams either block automation completely or ship a fragile set of scripts with no oversight.
A control plane changes the default. Every job is evaluated, every decision is logged, and approvals are explicit. That makes autonomy safe enough to use in real operations.
The boring core principle
Cordum keeps the core intentionally boring: jobs, workflows, state, policy, scheduling, retries, a dead-letter queue (DLQ), approvals, and an audit trail. The core does not know about GitHub, Kubernetes, or any other specific tool.
That separation is the point. It lets you upgrade the control plane without reworking domain logic, and it keeps the safety model consistent across every use case.
Control plane architecture
The control plane runs on NATS for the bus and Redis for state. The API gateway accepts jobs, the scheduler routes them, the Safety Kernel makes policy decisions, and the workflow engine coordinates runs.
    Clients/UI
        |
        v
    API Gateway (HTTP/WS + gRPC)
        | writes ctx/res pointers
        v
    Redis (state, config, DLQ)
        |
        v
    NATS bus (sys.* + job.*)
        |
        +--> Scheduler (routing + safety gate)
        |        |
        |        +--> Safety Kernel (policy check)
        |
        +--> External Workers
        |
        +--> Workflow Engine (run orchestration)
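To make the "routing + safety gate" role of the scheduler concrete, here is a minimal sketch, not Cordum's actual code. The function name `schedule`, the string return values, the `routes` mapping from topic to worker queue, and the shape of `job` are all assumptions made for illustration; the only names taken from the text are the four verdicts.

```python
from typing import Callable, Dict, List


def schedule(
    job: dict,
    safety_check: Callable[[dict], str],
    routes: Dict[str, List[dict]],
    approvals: List[dict],
) -> str:
    """Ask the policy check first, and only route the job if it passes.

    `safety_check` stands in for the Safety Kernel and returns one of
    ALLOW, DENY, REQUIRE_APPROVAL, ALLOW_WITH_CONSTRAINTS.
    """
    verdict = safety_check(job)
    if verdict == "DENY":
        return "denied"
    if verdict == "REQUIRE_APPROVAL":
        approvals.append(job)  # parked until someone approves it
        return "waiting_approval"
    if verdict not in ("ALLOW", "ALLOW_WITH_CONSTRAINTS"):
        raise ValueError(f"unknown verdict {verdict!r}")

    # Hypothetical routing rule: exact topic match to a worker queue.
    queue = routes.get(job["topic"])
    if queue is None:
        return "unroutable"
    queue.append(job)
    return "dispatched"
```

The point of the sketch is the ordering: nothing reaches a worker queue before the policy check has returned a verdict that permits it.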
Policy, approvals, and constraints
The Safety Kernel is the policy decision point. It evaluates every job and returns ALLOW, DENY, REQUIRE_APPROVAL, or ALLOW_WITH_CONSTRAINTS. Decisions include a reason and are bound to a policy snapshot hash so approvals remain consistent even as policy evolves.
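As an illustration of binding a decision to a policy snapshot, the sketch below hashes a canonical JSON encoding of the policy and stores that hash on the decision. Only the four verdict names come from the text. The `Decision` fields, the policy keys (`allowed_topics`, `prod_writes_need_approval`), the rules inside `evaluate`, and the use of SHA-256 are assumptions for the example, not Cordum's real policy model.

```python
import hashlib
import json
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    ALLOW = "ALLOW"
    DENY = "DENY"
    REQUIRE_APPROVAL = "REQUIRE_APPROVAL"
    ALLOW_WITH_CONSTRAINTS = "ALLOW_WITH_CONSTRAINTS"


@dataclass(frozen=True)
class Decision:
    verdict: Verdict
    reason: str
    policy_snapshot: str  # hash of the policy that produced this decision


def snapshot_hash(policy: dict) -> str:
    """Hash a canonical JSON encoding so equal policies hash equally."""
    canonical = json.dumps(policy, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def evaluate(job: dict, policy: dict) -> Decision:
    """Hypothetical rules: deny unknown topics, gate writes to prod."""
    snap = snapshot_hash(policy)
    topic = job.get("topic", "")
    if topic not in policy.get("allowed_topics", []):
        return Decision(Verdict.DENY, f"topic {topic!r} is not allowed", snap)
    if job.get("writes_prod") and policy.get("prod_writes_need_approval", True):
        return Decision(Verdict.REQUIRE_APPROVAL, "job writes to prod", snap)
    return Decision(Verdict.ALLOW, "no rule matched", snap)


def decided_under(decision: Decision, policy: dict) -> bool:
    """True if `policy` is the exact snapshot the decision was made against."""
    return decision.policy_snapshot == snapshot_hash(policy)
```

With the hash stored on the decision, a later reader can tell whether an approval was granted under the policy that is live today or under an earlier one. What the system should do on a mismatch is a design choice the sketch deliberately leaves open.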
Constraints cap runtime, limit diffs, and enforce egress allowlists. That is how you let automation run without giving it a blank check.
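The three kinds of constraint named above could be checked roughly like this. It is a sketch under assumptions: the `Constraints` field names, the choice of seconds and changed lines as units, and the `violations` helper are invented for the example and are not Cordum's schema.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Constraints:
    max_runtime_seconds: float        # cap on runtime
    max_diff_lines: int               # limit on the size of a change
    egress_allowlist: FrozenSet[str]  # hosts the job may talk to


def violations(
    c: Constraints,
    runtime_seconds: float,
    diff_lines: int,
    egress_hosts: Iterable[str],
) -> List[str]:
    """Return a human-readable entry for every constraint that was broken."""
    found = []
    if runtime_seconds > c.max_runtime_seconds:
        found.append(f"runtime {runtime_seconds}s exceeds {c.max_runtime_seconds}s")
    if diff_lines > c.max_diff_lines:
        found.append(f"diff of {diff_lines} lines exceeds {c.max_diff_lines}")
    blocked = sorted(set(egress_hosts) - c.egress_allowlist)
    if blocked:
        found.append("egress to non-allowlisted hosts: " + ", ".join(blocked))
    return found
```

An empty list means the job stayed inside its limits. Returning every violation, rather than stopping at the first, keeps the resulting record useful for whoever reviews it.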
Runs, steps, and audit trail
Workflows are stored in Redis. Runs emit a timeline of step states, approvals, and results. If a step fails or needs approval, the run pauses in a predictable state and resumes when conditions are met.
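The pause-and-resume behavior described above can be pictured as a small state machine. This is a sketch only: the state names, the outcome strings (`ok`, `failed`, `needs_approval`), and the resume events (`approved`, `retry`) are hypothetical, and the real workflow engine persists its state rather than holding it in memory.

```python
from enum import Enum


class RunState(Enum):
    RUNNING = "running"
    WAITING_APPROVAL = "waiting_approval"
    PAUSED_ON_FAILURE = "paused_on_failure"
    SUCCEEDED = "succeeded"


# Which resume event clears which paused state (hypothetical event names).
RESUME_EVENTS = {
    RunState.WAITING_APPROVAL: "approved",
    RunState.PAUSED_ON_FAILURE: "retry",
}


class Run:
    def __init__(self, steps):
        if not steps:
            raise ValueError("a run needs at least one step")
        self.steps = list(steps)
        self.index = 0      # position of the current step
        self.state = RunState.RUNNING
        self.timeline = []  # (event, step) pairs in the order they happened

    def advance(self, outcome):
        """Apply the current step's outcome: 'ok', 'failed' or 'needs_approval'."""
        if self.state is not RunState.RUNNING:
            raise RuntimeError(f"run is {self.state.value}, not running")
        step = self.steps[self.index]
        if outcome == "ok":
            self.timeline.append(("step_succeeded", step))
            self.index += 1
            if self.index == len(self.steps):
                self.state = RunState.SUCCEEDED
        elif outcome == "failed":
            self.timeline.append(("step_failed", step))
            self.state = RunState.PAUSED_ON_FAILURE
        elif outcome == "needs_approval":
            self.timeline.append(("approval_requested", step))
            self.state = RunState.WAITING_APPROVAL
        else:
            raise ValueError(f"unknown outcome {outcome!r}")

    def resume(self, event):
        """Return to RUNNING on the same step if the event matches the pause."""
        if RESUME_EVENTS.get(self.state) != event:
            raise RuntimeError(f"{event!r} does not resume a {self.state.value} run")
        self.timeline.append((event, self.steps[self.index]))
        self.state = RunState.RUNNING
```

A paused run keeps its step index, so resuming picks up at the step that stopped it instead of starting over. An event that does not match the current pause is rejected rather than silently ignored.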
The audit trail is append-only. For every run you can answer: what executed, what changed, who approved, and which policy snapshot made the decision.
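A minimal way to picture an append-only trail that answers those four questions is an immutable record type plus a log that exposes only `append` and read methods. The field names and the `AuditTrail` class are assumptions for illustration; a real deployment would back this with durable storage, not a Python list.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)  # frozen: an entry cannot be edited after creation
class AuditEntry:
    run_id: str
    step: str                          # what executed
    change: str                        # what changed
    policy_snapshot: str               # which policy snapshot made the decision
    approved_by: Optional[str] = None  # who approved, if approval was needed


class AuditTrail:
    """Append-only by construction: no update or delete method exists."""

    def __init__(self) -> None:
        self._entries: List[AuditEntry] = []

    def append(self, entry: AuditEntry) -> None:
        self._entries.append(entry)

    def for_run(self, run_id: str) -> Tuple[AuditEntry, ...]:
        """Return one run's entries, in order, as an immutable tuple."""
        return tuple(e for e in self._entries if e.run_id == run_id)
```

Reads return tuples of frozen entries, so callers cannot alter history through the values they are handed.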
Packs and domain logic
Packs install workflows, schemas, and policy overlays. Install does not execute code; workers are deployed separately. This keeps upgrades safe and makes it possible to reason about what changed before it ships.
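One way to see why a declarative install is easy to reason about: if a pack is pure data, you can compute what it would change before applying it. The sketch below assumes a pack manifest is a JSON document with `workflows`, `schemas`, and `policy_overlays` sections keyed by name. That layout, and the `plan_install` and `install` helpers, are hypothetical and are not Cordum's pack format.

```python
import json

# Hypothetical manifest sections, mirroring the three things packs install.
SECTIONS = ("workflows", "schemas", "policy_overlays")


def plan_install(registry: dict, manifest_json: str) -> dict:
    """Report which entries a pack would add or change, without applying it."""
    manifest = json.loads(manifest_json)  # parsing JSON yields data, never code
    plan = {"added": [], "changed": []}
    for section in SECTIONS:
        current = registry.get(section, {})
        for name, definition in manifest.get(section, {}).items():
            key = f"{section}/{name}"
            if name not in current:
                plan["added"].append(key)
            elif current[name] != definition:
                plan["changed"].append(key)
    return plan


def install(registry: dict, manifest_json: str) -> dict:
    """Copy the pack's definitions into the registry. Nothing is executed."""
    manifest = json.loads(manifest_json)
    for section in SECTIONS:
        registry.setdefault(section, {}).update(manifest.get(section, {}))
    return registry
```

Running `plan_install` first gives a reviewable diff, and `install` only copies definitions. The workers that act on those definitions are a separate deployment.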
Packs are the delivery mechanism for domain logic. The core stays stable while packs evolve quickly.
Operating in production
The system ships with retries, DLQ handling, and health endpoints. JetStream can be enabled when you need durable delivery, and Redis holds pointer-based state so that payloads stay off the bus.
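The pointer pattern and the retry-then-DLQ behavior can be sketched together with in-memory stand-ins. Nothing here talks to Redis or NATS: `StateStore` is a dictionary playing the role of Redis, `bus` and `dlq` are plain lists, and the key format, the `ctx_ptr` field name, and the default of three attempts are assumptions for the example.

```python
import json
import uuid


class StateStore:
    """In-memory stand-in for Redis: maps pointer keys to JSON payloads."""

    def __init__(self):
        self._data = {}

    def put(self, prefix: str, payload: dict) -> str:
        key = f"{prefix}:{uuid.uuid4().hex}"
        self._data[key] = json.dumps(payload)
        return key

    def get(self, key: str) -> dict:
        return json.loads(self._data[key])


def submit(store: StateStore, bus: list, topic: str, payload: dict) -> dict:
    """Write the payload to the store and publish only a pointer to it."""
    message = {"topic": topic, "ctx_ptr": store.put("ctx", payload)}
    bus.append(message)
    return message


def process(store, message, handler, dlq: list, max_attempts: int = 3):
    """Run the handler with retries; park the message in the DLQ if all fail.

    Returns a pointer to the stored result, or None if the job was dead-lettered.
    """
    context = store.get(message["ctx_ptr"])
    last_error = None
    for _ in range(max_attempts):
        try:
            result = handler(context)
        except Exception as exc:  # sketch: treat any error as retryable
            last_error = exc
            continue
        return store.put("res", result)
    dlq.append({**message, "error": str(last_error), "attempts": max_attempts})
    return None
```

However large the job context is, the message on the bus stays a topic plus a short key. A dead-lettered message carries the same pointer, so the original context is still available when someone investigates.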
This is boring on purpose: the point is predictable behavior under pressure.
Why source-available
Enterprises want to inspect the control plane before letting it touch production. Source-available gives you transparency without forcing the project into a one-size-fits-all license model.
You can audit the source and understand every decision path. For hosted or resale use cases, commercial terms apply. See /legal/license for details.
Getting started
Request source access, then run the Docker quickstart and the smoke tests, and explore the dashboard. For production deployments, contact the team.