
Architecture

Cordum is an Agent Control Plane built around a gateway, scheduler, Safety Kernel, workflow engine, and CAP v2 messages on NATS. Redis stores state, pointers, config, workflow data, and indexes.

System diagram
[Diagram. Column labels: Ingress, State + Bus, Policy, Execution. Boxes: Clients / Dashboard; API Gateway; Redis + NATS; Workflow Engine + Scheduler + Workers; Context Engine + Safety Kernel.]
Solid lines show the main request and orchestration path. Dashed lines show workers reusing Redis-backed pointers for context and results.

Core components

API Gateway

HTTP, WebSocket, and gRPC entrypoint for jobs, workflows, approvals, config, policy bundles, DLQ, artifacts, locks, packs, workers, and traces.

Scheduler

Consumes submit, result, cancel, and heartbeat subjects; evaluates pre-dispatch policy; routes jobs; persists job state; and manages DLQ/reconciliation.

Safety Kernel

gRPC policy decision point with Check, Evaluate, Explain, Simulate, snapshot tracking, constraints, remediations, and optional decision caching.

Workflow Engine

Stores workflow definitions and runs in Redis, advances DAG steps, and keeps append-only run timelines.
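
To illustrate what advancing DAG steps with an append-only timeline can look like, here is a minimal sketch. The `ready_steps` function, the dict-based workflow shape, and the step names are assumptions made for illustration, not Cordum's actual data model or API.

```python
from typing import Dict, List, Set


def ready_steps(deps: Dict[str, List[str]], done: Set[str], running: Set[str]) -> List[str]:
    """Return steps whose dependencies are all done and that have not started yet."""
    return sorted(
        step
        for step, needs in deps.items()
        if step not in done and step not in running and all(n in done for n in needs)
    )


# Hypothetical workflow: fetch -> (summarize, classify) -> report
deps = {
    "fetch": [],
    "summarize": ["fetch"],
    "classify": ["fetch"],
    "report": ["summarize", "classify"],
}

timeline: List[dict] = []  # append-only: entries are added, never edited
done: Set[str] = set()

while len(done) < len(deps):
    batch = ready_steps(deps, done, running=set())
    if not batch:
        raise RuntimeError("cycle or unsatisfiable dependency")
    for step in batch:
        # In this toy loop every step "completes" immediately.
        timeline.append({"step": step, "event": "completed"})
        done.add(step)

order = [entry["step"] for entry in timeline]
```

In a real engine, completion would be driven by job results arriving asynchronously rather than by a loop that finishes each step at once.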

Context Engine

Optional gRPC service for BuildWindow and UpdateMemory over Redis-backed memory keys.

External Workers

User-provided workers subscribe to job topics or direct worker subjects, hydrate pointers, execute work, and publish CAP JobResult packets.

Licensing

Three-tier entitlement system (Community, Team, Enterprise) in core/licensing/. Enforces worker limits, RPS caps, and feature gates at the gateway and scheduler.

Telemetry

Privacy-first anonymous usage collection. Defaults to local_only mode; opt-in anonymous mode shares aggregate metrics without PII.

Job lifecycle

Current control-plane flow
1. The gateway validates auth and tenant, then writes input to ctx:<job_id>.
2. Submit-time policy runs in the gateway before state is persisted or the bus is used.
3. The gateway publishes BusPacket{JobRequest} to sys.job.submit.
4. The scheduler sets PENDING, evaluates pre-dispatch policy, resolves routing, and dispatches.
5. The worker hydrates context_ptr, executes, writes res:<job_id>, and publishes JobResult.
6. The scheduler finalizes state, records result_ptr, applies DLQ rules, and optionally stores output-safety metadata.
7. Workflow runs advance from job results and maintain an append-only timeline.
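The flow above can be sketched end to end with in-memory stand-ins. This is a toy model under stated assumptions: a dict replaces Redis, a deque replaces NATS, `policy_allows` is a hypothetical placeholder for the Safety Kernel decision, every state name other than PENDING is invented, and storing result_ptr under job:meta:<job_id> is a guess at where it is recorded.

```python
import json
from collections import deque

store = {}     # stand-in for Redis
bus = deque()  # stand-in for NATS: (subject, packet) tuples


def policy_allows(job: dict) -> bool:
    """Hypothetical policy check; real decisions come from the Safety Kernel."""
    return job["topic"] != "job.forbidden"


def gateway_submit(job_id: str, topic: str, payload: dict) -> bool:
    store[f"ctx:{job_id}"] = json.dumps(payload)         # step 1: write input
    job = {"job_id": job_id, "topic": topic, "context_ptr": f"ctx:{job_id}"}
    if not policy_allows(job):                            # step 2: submit-time policy
        return False
    bus.append(("sys.job.submit", job))                   # step 3: publish
    return True


def scheduler_on_submit(job: dict) -> None:
    state_key = f"job:state:{job['job_id']}"
    store[state_key] = "PENDING"                          # step 4: set PENDING
    if not policy_allows(job):                            # pre-dispatch policy
        store[state_key] = "DENIED"                       # assumed state name
        return
    store[state_key] = "DISPATCHED"                       # assumed state name
    bus.append((job["topic"], job))                       # route to the job topic


def worker_on_job(job: dict) -> None:
    payload = json.loads(store[job["context_ptr"]])       # step 5: hydrate context_ptr
    res_key = f"res:{job['job_id']}"
    store[res_key] = json.dumps({"echo": payload})        # "execute" and write result
    bus.append(("sys.job.result",
                {"job_id": job["job_id"], "result_ptr": res_key, "ok": True}))


def scheduler_on_result(res: dict) -> None:
    job_id = res["job_id"]                                # step 6: finalize
    store[f"job:state:{job_id}"] = "SUCCEEDED" if res["ok"] else "FAILED"
    store[f"job:meta:{job_id}"] = json.dumps({"result_ptr": res["result_ptr"]})


def pump() -> None:
    """Deliver queued packets until the toy bus is empty."""
    while bus:
        subject, packet = bus.popleft()
        if subject == "sys.job.submit":
            scheduler_on_submit(packet)
        elif subject == "sys.job.result":
            scheduler_on_result(packet)
        elif subject.startswith("job."):
            worker_on_job(packet)


accepted = gateway_submit("j1", "job.echo", {"msg": "hi"})
pump()
```

Passing pointers (ctx:<job_id>, res:<job_id>) instead of payloads keeps bus packets small; the worker and scheduler read the actual data from the store only when they need it.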
Gateway and scheduler each define their own safety-unavailable behavior: the gateway uses GATEWAY_POLICY_FAIL_MODE at submit time, and the scheduler uses POLICY_CHECK_FAIL_MODE at dispatch time.
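A fail mode of this kind usually chooses between letting work through and blocking it when the policy service cannot be reached. The sketch below shows that pattern only. The values "open" and "closed", the "closed" default, the `SafetyUnavailable` exception, and the `decide` helper are all assumptions; the accepted values for these two variables are not stated here.

```python
import os


class SafetyUnavailable(Exception):
    """Raised by the hypothetical check when the policy service cannot be reached."""


def decide(check, fail_mode: str) -> bool:
    """Return True to let the job proceed, False to reject it."""
    try:
        return check()
    except SafetyUnavailable:
        # Assumed semantics: "open" lets jobs through, anything else blocks them.
        return fail_mode == "open"


def unavailable() -> bool:
    """Simulates a policy check whose backend is down."""
    raise SafetyUnavailable("safety kernel unreachable")


# Variable names come from the text above; the values and default are assumed.
gateway_mode = os.environ.get("GATEWAY_POLICY_FAIL_MODE", "closed")
scheduler_mode = os.environ.get("POLICY_CHECK_FAIL_MODE", "closed")

gateway_lets_through = decide(unavailable, gateway_mode)
scheduler_lets_through = decide(unavailable, scheduler_mode)
```

Keeping the two settings separate lets an operator, for example, reject submissions at the gateway while choosing a different behavior for jobs that are already queued.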
NATS subjects
  • sys.job.submit
  • sys.job.result
  • sys.job.progress
  • sys.job.dlq
  • sys.job.cancel
  • sys.heartbeat
  • sys.workflow.event
  • job.*
  • worker.<id>.jobs
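In NATS, subjects are dot-separated tokens; `*` matches exactly one token and `>` matches one or more trailing tokens. The sketch below reimplements that matching rule locally to show which subjects a `job.*` subscription would receive. The `subject_matches` and `worker_subject` helpers are illustrative names, not part of any NATS client or of Cordum.

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """Check a concrete subject against a NATS-style subscription pattern."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            # '>' must be the last pattern token and needs at least one token left.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False
        if p != "*" and p != s_tokens[i]:
            return False
    return len(p_tokens) == len(s_tokens)


def worker_subject(worker_id: str) -> str:
    """Build the direct subject for one worker, following worker.<id>.jobs."""
    return f"worker.{worker_id}.jobs"


direct = worker_subject("w1")
matches_topic = subject_matches("job.*", "job.echo")
matches_nested = subject_matches("job.*", "job.echo.fast")
```

Because `*` covers a single token, a `job.*` subscriber does not see the sys.job.* control subjects or any deeper job.a.b subject.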
Redis keys
  • ctx:<job_id> / res:<job_id> / art:<id>
  • job:meta:<job_id> / job:state:<job_id> / job:index:<state>
  • job:events:<job_id> / trace:<trace_id>
  • wf:def:<workflow_id> / wf:run:<run_id>
  • wf:run:timeline:<run_id> / wf:run:idempotency:<key>
  • cfg:<scope>:<id> / cfg:system:policy / cfg:system:packs
  • schema:<id> / schema:index / dlq:index
  • mem:<memory_id>:events / chunks / summary
  • license:current / license:usage
  • telemetry:consent / telemetry:buffer
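Centralizing key construction is one way to keep these patterns consistent across services. The sketch below covers a subset of the keys listed above; the `KEY_TEMPLATES` table, the `redis_key` helper, and the empty-ID check are assumptions for illustration, not Cordum code.

```python
KEY_TEMPLATES = {
    "context": "ctx:{job_id}",
    "result": "res:{job_id}",
    "job_meta": "job:meta:{job_id}",
    "job_state": "job:state:{job_id}",
    "job_events": "job:events:{job_id}",
    "workflow_def": "wf:def:{workflow_id}",
    "workflow_run": "wf:run:{run_id}",
    "run_timeline": "wf:run:timeline:{run_id}",
}


def redis_key(kind: str, **ids: str) -> str:
    """Render a key from its template, rejecting empty identifiers."""
    for name, value in ids.items():
        if not value:
            raise ValueError(f"{name} must not be empty")
    # Raises KeyError for an unknown kind or a missing identifier.
    return KEY_TEMPLATES[kind].format(**ids)


ctx = redis_key("context", job_id="j1")
timeline = redis_key("run_timeline", run_id="r9")
```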

Protocol and boundaries

Bus traffic uses CAP v2 types from github.com/cordum-io/cap/v2/cordum/agent/v1. Licensing and telemetry ship in the core repo; advanced auth features (SAML/SSO, multi-tenant RBAC) are kept in cordum-enterprise. The CAP wire contract and SDKs live in the separate CAP documentation.
