
The complete platform for governed workflows.

Intelligent scheduling. Policy-before-dispatch. Rich APIs. Pack-based extensibility. Everything you need to run autonomous operations in production with governance built in.

Intelligent Scheduling

Least-loaded worker selection with capability-aware routing.

Smart Worker Selection

  • Least-loaded scoring: Workers ranked by active_jobs + cpu_load/100 + gpu_utilization/100
  • Direct dispatch: Route to worker.{id}.jobs for targeted execution
  • Capability filtering: Match jobs to workers with required tools (kubectl, GPU, etc.)
  • Tag-based affinity: Route jobs to workers matching specific labels
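
The scoring and capability filtering described above can be sketched as follows. This is a minimal illustration, not Cordum's actual scheduler code: the `Worker` class, its field names, and the `pick_worker` helper are hypothetical, and only the score formula (active_jobs + cpu_load/100 + gpu_utilization/100) comes from the text.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    # Hypothetical worker record; field names mirror the formula above.
    id: str
    active_jobs: int
    cpu_load: float          # percent, 0-100
    gpu_utilization: float   # percent, 0-100
    capabilities: set = field(default_factory=set)


def load_score(worker):
    """Lower is better: active_jobs + cpu_load/100 + gpu_utilization/100."""
    return worker.active_jobs + worker.cpu_load / 100 + worker.gpu_utilization / 100


def pick_worker(workers, requires):
    """Keep workers that have every required capability, then take the least loaded."""
    eligible = [w for w in workers if set(requires) <= w.capabilities]
    if not eligible:
        return None  # a real scheduler would report a reason code here
    return min(eligible, key=load_score)


workers = [
    Worker("w1", 2, 50.0, 0.0, {"kubectl", "aws-credentials"}),
    Worker("w2", 1, 90.0, 80.0, {"kubectl"}),
    Worker("w3", 1, 30.0, 10.0, {"kubectl", "aws-credentials"}),
]
chosen = pick_worker(workers, ["kubectl", "aws-credentials"])
print(chosen.id)
```

Because each active job adds a full point while CPU and GPU load add at most one point each, the job count dominates the ranking and utilization mostly acts as a tie-breaker.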

Production Features

  • Queue fallback: job.* topic for worker pools without direct dispatch
  • Heartbeat tracking: Detect stale workers and redistribute jobs
  • Resource locks: Prevent concurrent access to shared resources
  • DLQ routing: Automatic dead-letter handling for failed jobs
  • Reason codes: Detailed failure reasons (no_pool_mapping, pool_overloaded, capability_mismatch)

Scheduling Request Example
POST /api/v1/jobs
{
  "pack": "infra.kubectl",
  "tool": "apply",
  "input": { "manifest": "..." },
  "context": {
    "requires": ["kubectl", "aws-credentials"],
    "preferred_worker_id": "worker-us-east-1",
    "resource_lock": "cluster:production"
  }
}

// Scheduler evaluates:
// 1. Policy check via Safety Kernel (ALLOW/DENY/REQUIRE_APPROVAL)
// 2. Worker capability matching (requires: ["kubectl", "aws-credentials"])
// 3. Least-loaded score among eligible workers
// 4. Direct dispatch to worker.<id>.jobs or fallback to job.* queue

Rich API Surface

Full control over jobs, workflows, policies, and operations.

Job Management

POST /api/v1/jobs
GET /api/v1/jobs/{id}
DELETE /api/v1/jobs/{id}
GET /api/v1/jobs?status=...
POST /api/v1/jobs/{id}/cancel

Workflow Control

POST /api/v1/workflows
GET /api/v1/workflows/{id}
GET /api/v1/workflows/{id}/timeline
POST /api/v1/workflows/{id}/rerun
POST /api/v1/workflows/{id}/dry-run
POST /api/v1/approvals/{id}/decide

Policy & Packs

POST /api/v1/policy/simulate
POST /api/v1/policy/explain
POST /api/v1/policy/bundles
POST /api/v1/packs/install
POST /api/v1/packs/uninstall

Context Engine

POST /api/v1/context/build
POST /api/v1/context/update
GET /api/v1/memory?ptr=...
POST /api/v1/context/search

Config Service

GET /api/v1/config/resolved
PUT /api/v1/config/system
PUT /api/v1/config/org/{id}
PUT /api/v1/config/team/{id}
PUT /api/v1/config/workflow/{id}

Observability

GET /healthz
GET /readyz
GET /metrics
GET /api/v1/dlq/list

Advanced Features

  • Context Engine: BuildWindow/UpdateMemory for LLM context management
  • Memory pointers: GET /api/v1/memory?ptr=... for agent memory retrieval
  • Config hierarchy: System → Org → Team → Workflow → Step merge resolution
  • Run timeline: Detailed event stream for workflow execution
  • Parallel execution: Steps with multiple depends_on run when all dependencies complete
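
The System → Org → Team → Workflow → Step resolution order can be illustrated with a small layered merge. This is a sketch under stated assumptions: the source only gives the order of the layers, so the deep-merge behavior, the `resolve_config` helper, and the config keys (`timeout`, `retries`) are all hypothetical.

```python
import copy


def _merge_into(base, override):
    """Recursively merge override into base; nested dicts merge, other values replace."""
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            _merge_into(base[key], value)
        else:
            # Copy so later merges never mutate the caller's layer dicts.
            base[key] = copy.deepcopy(value)


def resolve_config(*layers):
    """Merge layers from least specific (system) to most specific (step)."""
    resolved = {}
    for layer in layers:
        _merge_into(resolved, layer)
    return resolved


# Hypothetical layer contents, listed in resolution order.
system = {"timeout": 300, "retries": {"max": 3, "backoff": "exponential"}}
org = {"retries": {"max": 5}}
team = {}
workflow = {"timeout": 600}
step = {"retries": {"max": 1}}

resolved = resolve_config(system, org, team, workflow, step)
print(resolved)
```

In this sketch the most specific layer wins per key, while keys that a narrower layer does not mention fall through from the broader ones.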

Developer Experience

  • OpenAPI spec: Auto-generated documentation for all endpoints
  • Webhook notifications: Real-time updates for job/workflow state changes
  • Pagination & filtering: List APIs support cursor-based pagination
  • Idempotency keys: Retry-safe job submission
  • DLQ management: List, retry, delete dead-letter items

Workflow API Example - Parallel Execution
POST /api/v1/workflows
{
  "steps": [
    {
      "id": "backup-db",
      "type": "worker",
      "pack": "db.postgres",
      "tool": "backup"
    },
    {
      "id": "backup-files",
      "type": "worker",
      "pack": "storage.s3",
      "tool": "snapshot"
    },
    {
      "id": "approval",
      "type": "approval",
      "depends_on": ["backup-db", "backup-files"],
      "title": "Approve migration?",
      "context": { "migration": "v2.0" }
    },
    {
      "id": "migrate",
      "type": "worker",
      "depends_on": ["approval"],
      "pack": "db.postgres",
      "tool": "migrate",
      "input": { "version": "v2.0" }
    },
    {
      "id": "notify",
      "type": "notify",
      "depends_on": ["migrate"],
      "channel": "slack",
      "message": "Migration complete"
    }
  ],
  "context": { "environment": "production" }
}

// backup-db and backup-files run in parallel
// approval waits for both to complete
// GET /api/v1/workflows/{id}/timeline for detailed execution events
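
The dependency rule in this example (a step runs once every entry in its depends_on has completed) can be sketched as a readiness check. The `ready_steps` helper is hypothetical and ignores states such as "running" or "failed" that a real engine would track; the step ids and dependencies are taken from the workflow above.

```python
def ready_steps(steps, completed):
    """Return ids of steps that have not completed and whose dependencies all have."""
    ready = []
    for step in steps:
        if step["id"] in completed:
            continue
        if all(dep in completed for dep in step.get("depends_on", [])):
            ready.append(step["id"])
    return ready


# Ids and depends_on copied from the workflow example; other fields omitted.
steps = [
    {"id": "backup-db"},
    {"id": "backup-files"},
    {"id": "approval", "depends_on": ["backup-db", "backup-files"]},
    {"id": "migrate", "depends_on": ["approval"]},
    {"id": "notify", "depends_on": ["migrate"]},
]

print(ready_steps(steps, set()))
print(ready_steps(steps, {"backup-db"}))
print(ready_steps(steps, {"backup-db", "backup-files"}))
```

With nothing completed, both backups are ready at once, which is what allows them to run in parallel; the approval step only becomes ready after both ids are in the completed set.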

Policy Engine

Policy-before-dispatch with simulate, explain, and hot-reload.

Policy Decisions

  • ALLOW: Job dispatched immediately
  • DENY: Job rejected with reason
  • REQUIRE_APPROVAL: Human gate before dispatch
  • ALLOW_WITH_CONSTRAINTS: Modified before dispatch

Advanced Features

  • Simulate API: Test policy bundles before publishing
  • Explain API: Get detailed reasoning for policy decisions
  • Approval binding: Approvals bound to policy snapshot + job hash for replay safety
  • Hot reload: Update policies without scheduler restart
  • Snapshot versioning: Rollback to previous policy states

Policy Bundle Example
{
  "name": "acme.production.v1",
  "rules": [
    {
      "id": "prod-write-approval",
      "match": {
        "tags": { "env": "production", "risk": "write" }
      },
      "decision": "REQUIRE_APPROVAL",
      "reason": "Production write operations require approval"
    },
    {
      "id": "enforce-resource-limits",
      "match": {
        "pack": "compute.*"
      },
      "decision": "ALLOW_WITH_CONSTRAINTS",
      "constraints": [
        {
          "type": "MaxTimeout",
          "value": 3600
        },
        {
          "type": "RequireIdempotencyKey"
        }
      ]
    }
  ]
}
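
One way to read a bundle like this is as an ordered list of match rules. The sketch below evaluates a job against it, with several assumptions that the source does not confirm: rules are checked in order and the first match wins, `pack` patterns use shell-style globbing, every tag in a rule must equal the job's tag, and an unmatched job falls through to a caller-supplied default. The `evaluate` and `rule_matches` helpers are hypothetical.

```python
from fnmatch import fnmatchcase


def rule_matches(rule, job):
    """True if the job satisfies every condition in the rule's match block."""
    match = rule.get("match", {})
    pattern = match.get("pack")
    if pattern is not None and not fnmatchcase(job.get("pack", ""), pattern):
        return False
    job_tags = job.get("tags", {})
    return all(job_tags.get(k) == v for k, v in match.get("tags", {}).items())


def evaluate(bundle, job, default="DENY"):
    """Return (decision, rule_id) for the first matching rule (assumed semantics)."""
    for rule in bundle["rules"]:
        if rule_matches(rule, job):
            return rule["decision"], rule["id"]
    return default, None  # the default decision is an assumption of this sketch


# Trimmed copy of the bundle above (reasons and constraints omitted).
bundle = {
    "name": "acme.production.v1",
    "rules": [
        {"id": "prod-write-approval",
         "match": {"tags": {"env": "production", "risk": "write"}},
         "decision": "REQUIRE_APPROVAL"},
        {"id": "enforce-resource-limits",
         "match": {"pack": "compute.*"},
         "decision": "ALLOW_WITH_CONSTRAINTS"},
    ],
}

write_job = {"pack": "db.postgres", "tags": {"env": "production", "risk": "write"}}
compute_job = {"pack": "compute.batch", "tags": {"env": "staging"}}
print(evaluate(bundle, write_job))
print(evaluate(bundle, compute_job))
```

The `compute.batch` pack name is made up for the example. For the real evaluation order and default decision, the Simulate and Explain APIs listed above are the authoritative check.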

Pack System

Declarative extensibility without core modifications.

Core Capabilities

  • Declarative manifest: Define tools, policies, dashboards in pack.json
  • MCP integration: Connect to Model Context Protocol servers
  • Schema validation: Validate tool inputs at submission time
  • Policy overlays: Per-pack rules that compose with global policies

Production Features

  • Atomic install/uninstall: All-or-nothing pack operations
  • Rollback support: Revert to previous pack versions
  • Pack tests: Test policies with policySimulations before deploy
  • Safety limits: 64 MiB max upload, 2048 max files enforced

Pack Manifest Example
{
  "name": "infra.kubectl",
  "version": "1.0.0",
  "tools": [
    {
      "name": "apply",
      "description": "Apply Kubernetes manifest",
      "inputSchema": {
        "type": "object",
        "properties": {
          "manifest": { "type": "string" },
          "namespace": { "type": "string" }
        },
        "required": ["manifest"]
      }
    }
  ],
  "mcp": {
    "serverCommand": "/packs/kubectl-mcp",
    "serverArgs": [],
    "labelConventions": {
      "env": ["prod", "staging", "dev"],
      "risk": ["write", "read", "delete"]
    }
  },
  "policies": [
    {
      "id": "kubectl-prod-approval",
      "match": { "tags": { "env": "prod" } },
      "decision": "REQUIRE_APPROVAL"
    }
  ]
}
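
To illustrate what validating tool inputs against the manifest's inputSchema might look like, here is a deliberately small checker. It is a sketch, not Cordum's validator: it covers only the `required` list and simple `type` keywords used in the manifest above, and the `validate_input` helper is hypothetical. A full JSON Schema library would be the realistic choice in practice.

```python
# Maps the JSON Schema type names handled by this sketch to Python types.
_TYPE_MAP = {"string": str, "object": dict, "array": list, "boolean": bool}


def validate_input(schema, payload):
    """Return a list of error strings; an empty list means the payload passed."""
    if not isinstance(payload, dict):
        return ["input must be an object"]
    errors = []
    for name in schema.get("required", []):
        if name not in payload:
            errors.append(f"missing required field: {name}")
    for name, spec in schema.get("properties", {}).items():
        expected = _TYPE_MAP.get(spec.get("type"))
        if name in payload and expected and not isinstance(payload[name], expected):
            errors.append(f"field {name} must be of type {spec['type']}")
    return errors


# inputSchema copied from the "apply" tool in the manifest above.
apply_schema = {
    "type": "object",
    "properties": {
        "manifest": {"type": "string"},
        "namespace": {"type": "string"},
    },
    "required": ["manifest"],
}

print(validate_input(apply_schema, {"manifest": "...", "namespace": "prod"}))
print(validate_input(apply_schema, {"namespace": "prod"}))
```

Rejecting a bad payload at submission time, before a job is queued, keeps malformed requests from ever reaching a worker.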

Production Ready

Durability, observability, and operational controls out of the box.

Durability

  • JetStream persistence for NATS
  • Redis backup for state
  • Idempotency keys
  • Automatic retries with backoff
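
The source does not say which backoff strategy the retries use. As an illustration only, the sketch below computes delays with exponential backoff and full jitter, a common choice for retry loops; the `backoff_delays` helper and its `base` and `cap` parameters are assumptions, not Cordum settings.

```python
import random


def backoff_delays(max_attempts, base=1.0, cap=60.0, rng=None):
    """Delay (seconds) before each retry: uniform in [0, min(cap, base * 2**attempt)]."""
    rng = rng or random.Random()
    delays = []
    for attempt in range(max_attempts):
        ceiling = min(cap, base * 2 ** attempt)
        delays.append(rng.uniform(0, ceiling))
    return delays


# A seeded generator keeps the example reproducible.
delays = backoff_delays(5, rng=random.Random(0))
print([round(d, 2) for d in delays])
```

The jitter spreads retries from many failed jobs over time so they do not all hit a recovering dependency at the same moment.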

Observability

  • Prometheus metrics export
  • Structured JSON logging
  • Audit trail for all operations
  • OpenTelemetry tracing support

Operations

  • Reconciler for drift correction
  • Dead-letter queue management
  • Health/readiness probes
  • Pending replayer for stuck jobs

Observability Example
// Prometheus metrics exposed at /metrics
cordum_jobs_total{status="completed",pack="infra.kubectl"} 1247
cordum_jobs_total{status="failed",pack="db.postgres"} 3
cordum_workflow_steps_total{status="blocked",reason="approval_pending"} 12
cordum_scheduler_queue_depth{queue="job.*"} 8
cordum_policy_evaluations_total{decision="REQUIRE_APPROVAL"} 45

// Structured logs
{"level":"info","msg":"Job dispatched","job_id":"j_abc123","pack":"infra.kubectl","worker_id":"w_xyz"}
{"level":"warn","msg":"Policy denied job","job_id":"j_def456","rule_id":"prod-write-block","reason":"Matched deny rule"}
{"level":"info","msg":"Approval granted","approval_id":"a_ghi789","job_id":"j_abc123","approver":"alice@example.com"}

Developer Tools

CLI, SDKs, and local development workflow.

CLI Tool (cordum)

  • Local server: cordum serve --dev for local development
  • Pack management: cordum pack install/uninstall/list
  • Job submission: cordum job submit --pack foo --tool bar
  • Workflow execution: cordum workflow run --file workflow.json

Development Experience

  • OpenAPI spec: Full REST API documentation
  • Language SDKs: Python, TypeScript, Go clients
  • Hot reload: Update policies and packs without restart
  • Dry-run mode: Test workflows without actual execution

CLI Usage Examples
# Start local server
cordum serve --dev

# Install a pack
cordum pack install ./packs/infra-kubectl

# Submit a job
cordum job submit \
  --pack infra.kubectl \
  --tool apply \
  --input '{"manifest": "...", "namespace": "prod"}'

# Run a workflow
cordum workflow run --file deploy-pipeline.json

# Simulate policy bundle
cordum policy simulate --bundle policy.json --requests test-requests.json

# Check DLQ
cordum dlq list --limit 10
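
For scripts that talk to the REST API rather than the CLI, a job submission to POST /api/v1/jobs might be built as shown below. This is a sketch with several assumptions: the base URL and port, the `Idempotency-Key` header name, and the key value are all hypothetical (the source mentions idempotency keys but not how they are passed), and authentication is omitted. Only the endpoint path and the pack/tool/input body shape come from the examples earlier on this page.

```python
import json
import urllib.request
import uuid


def build_job_request(base_url, pack, tool, tool_input, idempotency_key=None):
    """Build (but do not send) a job submission request."""
    body = json.dumps({"pack": pack, "tool": tool, "input": tool_input}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        # Assumed header name; check the OpenAPI spec for the real one.
        "Idempotency-Key": idempotency_key or str(uuid.uuid4()),
    }
    return urllib.request.Request(
        f"{base_url}/api/v1/jobs", data=body, headers=headers, method="POST"
    )


req = build_job_request(
    "http://localhost:8080",  # assumed address of a local dev server
    "infra.kubectl",
    "apply",
    {"manifest": "...", "namespace": "prod"},
    idempotency_key="deploy-2024-001",  # hypothetical key
)
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would send it; left out so the sketch runs offline.
```

Reusing the same key when retrying a timed-out submission is what makes the retry safe, since the server can recognize the duplicate instead of creating a second job.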

Ready to build with Cordum?

Explore how Cordum compares to alternatives, or clone the repo to start building governed workflows today.