
CAP protocol capabilities

Why the protocol matters: clean envelopes, safety hooks, deterministic rollback, and observable progress.

Jan 26, 2026 · 10 min read · CAP
Envelope: BusPacket routing + signatures
Rollback: compensation templates
Checkpoint: progress_pct + last_memo
Topics: Safety gates · Routing · Interop
TL;DR

CAP is the wire contract for Cordum's control plane. It standardizes how jobs, results, heartbeats, and safety decisions travel across the bus, so workers in any language can interoperate without custom glue.

  • CAP is a language-agnostic control plane contract for jobs, results, heartbeats, and safety.
  • Compensation templates and explicit failure semantics make rollback deterministic.
  • Checkpoint heartbeats turn long tasks into observable, resumable work.
Definition

CAP (Cordum Agent Protocol) is the control-plane contract for distributed agent workloads. It specifies the envelope, job primitives, safety hooks, and observability signals so schedulers and workers can interoperate across languages and runtimes.

Cordum is the reference implementation; CAP is the transport-agnostic protocol that makes it portable.

Why CAP exists

Agent systems break down when every worker invents its own message shape: each new worker needs custom glue, and no shared layer can enforce policy. CAP fixes that by defining a stable, append-only wire schema for jobs, results, and operational signals. It keeps large payloads off the bus and makes governance enforceable across services.

Clean envelope: BusPacket

The BusPacket envelope carries metadata such as trace_id, sender_id, created_at, and protocol_version. The payload itself is strongly typed: a job request, result, heartbeat, alert, progress update, or cancel.

BusPacket envelope
{
  "trace_id": "trace-ops-001",
  "sender_id": "gateway-1",
  "created_at": "2026-01-26T12:00:00Z",
  "protocol_version": 1,
  "job_request": {
    "job_id": "job-123",
    "topic": "job.ops.provision",
    "context_ptr": "redis://ctx/job-123"
  }
}
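The envelope rule above can be sketched as a wrap/unwrap pair: metadata on the outside, exactly one typed payload inside. This is a plain-dict illustration, not the CAP SDK. Apart from job_request, the payload field names (job_result, heartbeat, alert, job_progress, job_cancel) are hypothetical stand-ins for the payload kinds listed above.

```python
import json
from datetime import datetime, timezone

# Hypothetical payload field names mirroring the payload kinds described above;
# only "job_request" appears in the example envelope.
PAYLOAD_FIELDS = ("job_request", "job_result", "heartbeat", "alert", "job_progress", "job_cancel")
PROTOCOL_VERSION = 1


def make_packet(trace_id: str, sender_id: str, payload_type: str, payload: dict) -> dict:
    """Wrap exactly one typed payload in BusPacket-style metadata."""
    if payload_type not in PAYLOAD_FIELDS:
        raise ValueError(f"unknown payload type: {payload_type}")
    return {
        "trace_id": trace_id,
        "sender_id": sender_id,
        "created_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "protocol_version": PROTOCOL_VERSION,
        payload_type: payload,
    }


def unwrap(packet: dict) -> tuple:
    """Return (payload_type, payload); reject packets with zero or several payloads."""
    present = [name for name in PAYLOAD_FIELDS if name in packet]
    if len(present) != 1:
        raise ValueError(f"expected exactly one payload, found {present}")
    return present[0], packet[present[0]]


packet = make_packet("trace-ops-001", "gateway-1", "job_request", {
    "job_id": "job-123",
    "topic": "job.ops.provision",
    "context_ptr": "redis://ctx/job-123",
})
kind, body = unwrap(json.loads(json.dumps(packet)))  # round-trip through the "wire"
print(kind, body["topic"])  # job_request job.ops.provision
```

Rejecting packets with more than one payload keeps routing trivial: a receiver dispatches on a single field name instead of guessing which payload wins.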

Pointers over payloads

CAP keeps the bus lean by passing pointers instead of payloads. context_ptr and result_ptr reference data stored in Redis, S3, or another store. That makes retries, audit trails, and rollbacks cheap: multi-MB payloads never travel over the bus.
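A minimal sketch of the pointer pattern, assuming a hypothetical in-memory dict in place of Redis or S3. The producer writes the context out of band and puts only the pointer on the job request. The worker dereferences it when it runs.

```python
import json
from urllib.parse import urlparse

# Hypothetical in-memory stand-in for Redis or S3, keyed by (scheme, location).
BLOB_STORE = {}


def _key(ptr: str) -> tuple:
    parsed = urlparse(ptr)
    if parsed.scheme not in ("redis", "s3"):
        raise ValueError(f"unsupported pointer scheme: {parsed.scheme!r}")
    # "redis://ctx/job-123" -> ("redis", "ctx/job-123")
    return parsed.scheme, parsed.netloc + parsed.path


def store(ptr: str, data: bytes) -> str:
    """Write the payload out of band and hand back the pointer to put on the bus."""
    BLOB_STORE[_key(ptr)] = data
    return ptr


def resolve(ptr: str) -> bytes:
    """Worker side: dereference a context_ptr or result_ptr."""
    return BLOB_STORE[_key(ptr)]


context = b"x" * (5 * 1024 * 1024)  # a 5 MB job context
job_request = {
    "job_id": "job-123",
    "topic": "job.ops.provision",
    "context_ptr": store("redis://ctx/job-123", context),
}
wire_bytes = len(json.dumps(job_request).encode())
print(wire_bytes < 200)  # True: only the pointer travels
```

A retry can reuse the same pointer, so the 5 MB context is written once no matter how many times the job is redelivered.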

Safety hooks and governance

Every job can be evaluated by a Safety Kernel before dispatch. Decisions (allow/deny/require approval) are recorded and enforceable, and constraints can bound runtime, retries, or diff sizes.

This turns governance into a first-class protocol capability instead of a brittle middleware layer.
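To make the allow/deny/require-approval flow concrete, here is a hypothetical pre-dispatch gate. The policy sets, the limit values, and the `env` field on the job are illustrative assumptions, not Cordum's Safety Kernel. The sketch shows the contract in three parts: every decision is recorded, only allowed jobs are sent, and the constraints travel with the job.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    REQUIRE_APPROVAL = "require_approval"


@dataclass
class Verdict:
    decision: Decision
    reason: str
    constraints: dict = field(default_factory=dict)


# Hypothetical policy; topics and limits are illustrative, not Cordum defaults.
DENIED_TOPICS = {"job.ops.drop_database"}
APPROVAL_TOPICS = {"job.ops.provision"}
LIMITS = {"max_runtime_s": 600, "max_retries": 3, "max_diff_lines": 500}

AUDIT_LOG = []  # every decision is recorded, whatever the outcome


def evaluate(job: dict) -> Verdict:
    topic = job["topic"]
    if topic in DENIED_TOPICS:
        return Verdict(Decision.DENY, f"{topic} is blocked by policy")
    # "env" is a hypothetical job field used only for this illustration.
    if topic in APPROVAL_TOPICS and job.get("env") == "prod":
        return Verdict(Decision.REQUIRE_APPROVAL, "prod change needs sign-off", dict(LIMITS))
    return Verdict(Decision.ALLOW, "within policy", dict(LIMITS))


def gate(job: dict, send) -> Verdict:
    """Evaluate before dispatch; only allowed jobs reach workers, with constraints attached."""
    verdict = evaluate(job)
    AUDIT_LOG.append((job["job_id"], verdict.decision.value, verdict.reason))
    if verdict.decision is Decision.ALLOW:
        send({**job, "constraints": verdict.constraints})
    return verdict


sent = []
held = gate({"job_id": "job-1", "topic": "job.ops.provision", "env": "prod"}, sent.append)
passed = gate({"job_id": "job-2", "topic": "job.ops.provision", "env": "staging"}, sent.append)
blocked = gate({"job_id": "job-3", "topic": "job.ops.drop_database"}, sent.append)
```

Because the verdict is computed before dispatch and written to the audit log either way, a held or denied job leaves a record even though no worker ever sees it.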

Compensation and rollback

CAP supports compensation templates directly on the job request. When a workflow fails fatally, the orchestrator can dispatch the inverse action without decoding the original payload.

JobRequest + compensation
{
  "job_id": "job-123",
  "topic": "job.ops.provision",
  "context_ptr": "redis://ctx/job-123",
  "compensation": {
    "topic": "job.ops.deprovision",
    "context_ptr": "redis://ctx/job-123/undo",
    "priority": "JOB_PRIORITY_CRITICAL",
    "meta": {
      "idempotency_key": "job-123/undo",
      "capability": "ops.deprovision"
    }
  }
}

The compensation block is structured and typed, so workers know exactly what rollback action to execute.
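The point that the orchestrator never decodes the original payload can be sketched directly. The compensation template is copied into a new job request as-is, and its idempotency_key ensures a duplicate rollback trigger is dispatched only once. The `-undo` job ID suffix and the in-memory key set are hypothetical choices for this illustration.

```python
from typing import Optional

DISPATCHED_KEYS = set()  # idempotency keys already sent (in-memory stand-in)


def compensation_job(job: dict) -> Optional[dict]:
    """Turn a job's compensation template into a dispatchable job request.

    The template is copied as-is; the original context_ptr is never dereferenced.
    """
    template = job.get("compensation")
    if template is None:
        return None
    undo = {
        "job_id": f"{job['job_id']}-undo",  # hypothetical naming scheme
        "topic": template["topic"],
        "context_ptr": template["context_ptr"],
        "meta": dict(template.get("meta", {})),
    }
    if "priority" in template:
        undo["priority"] = template["priority"]
    return undo


def dispatch_once(job: dict, send) -> bool:
    """Send a job unless its idempotency key was already dispatched."""
    key = job.get("meta", {}).get("idempotency_key", job["job_id"])
    if key in DISPATCHED_KEYS:
        return False
    DISPATCHED_KEYS.add(key)
    send(job)
    return True


original = {
    "job_id": "job-123",
    "topic": "job.ops.provision",
    "context_ptr": "redis://ctx/job-123",
    "compensation": {
        "topic": "job.ops.deprovision",
        "context_ptr": "redis://ctx/job-123/undo",
        "priority": "JOB_PRIORITY_CRITICAL",
        "meta": {"idempotency_key": "job-123/undo", "capability": "ops.deprovision"},
    },
}
sent = []
undo = compensation_job(original)
first = dispatch_once(undo, sent.append)
second = dispatch_once(undo, sent.append)  # duplicate rollback trigger is dropped
```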

Checkpoint heartbeats

Heartbeats are no longer just liveness signals. Workers can send progress_pct and last_memo to report checkpoints for long-running tasks.

WorkerHeartbeat
{
  "worker_id": "worker-ops-1",
  "pool": "job.ops.provision",
  "active_jobs": 2,
  "max_parallel_jobs": 6,
  "cpu_load": 21.4,
  "progress_pct": 65,
  "last_memo": "iam_policy_applied"
}
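On the scheduler side, the heartbeat above is enough to compute spare capacity and to decide where a restarted task should pick up. This sketch makes several assumptions. It uses a hypothetical ordered list of provisioning checkpoints. It treats last_memo as the name of the last completed step. It also assumes a single long-running job per worker, since the heartbeat carries one progress_pct.

```python
from dataclasses import dataclass


@dataclass
class WorkerView:
    worker_id: str
    pool: str
    free_slots: int
    progress_pct: int
    last_memo: str


def ingest(hb: dict) -> WorkerView:
    """Scheduler side: derive spare capacity and the latest checkpoint from one heartbeat."""
    return WorkerView(
        worker_id=hb["worker_id"],
        pool=hb["pool"],
        free_slots=max(0, hb["max_parallel_jobs"] - hb["active_jobs"]),
        progress_pct=hb.get("progress_pct", 0),
        last_memo=hb.get("last_memo", ""),
    )


# Hypothetical ordered checkpoints for a provisioning job.
STEPS = ["network_created", "iam_policy_applied", "instance_launched", "dns_registered"]


def remaining_steps(last_memo: str) -> list:
    """Resume after the last reported checkpoint; start over if it is unknown."""
    if last_memo in STEPS:
        return STEPS[STEPS.index(last_memo) + 1:]
    return list(STEPS)


view = ingest({
    "worker_id": "worker-ops-1", "pool": "job.ops.provision",
    "active_jobs": 2, "max_parallel_jobs": 6, "cpu_load": 21.4,
    "progress_pct": 65, "last_memo": "iam_policy_applied",
})
print(view.free_slots, remaining_steps(view.last_memo))
# 4 ['instance_launched', 'dns_registered']
```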

Retryable vs fatal failure

CAP distinguishes FAILED_RETRYABLE from FAILED_FATAL. That gives schedulers and orchestrators a deterministic signal to retry transient errors or trigger saga rollback when recovery is unsafe.

JobResult status
JobResult.status:
  JOB_STATUS_FAILED_RETRYABLE  // transient error, safe to retry
  JOB_STATUS_FAILED_FATAL      // fatal error, trigger rollback
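A minimal saga loop shows how the two statuses drive different paths: FAILED_RETRYABLE loops on the same step, while FAILED_FATAL compensates every completed step in reverse order. Several details are assumptions of this sketch rather than CAP rules: the success status name JOB_STATUS_SUCCEEDED, the retry cap, and the choice to treat exhausted retries as fatal.

```python
SUCCEEDED = "JOB_STATUS_SUCCEEDED"  # assumed name for the success status
RETRYABLE = "JOB_STATUS_FAILED_RETRYABLE"
FATAL = "JOB_STATUS_FAILED_FATAL"


def run_saga(steps, execute, compensate, max_retries=3) -> str:
    """Run steps in order; retry transient failures, roll back on fatal ones.

    execute(step) returns a JobResult status string; compensate(step) dispatches
    that step's compensation template.
    """
    completed = []
    for step in steps:
        retries = 0
        while True:
            status = execute(step)
            if status == SUCCEEDED:
                completed.append(step)
                break
            if status == RETRYABLE and retries < max_retries:
                retries += 1  # transient: safe to try again (add backoff in practice)
                continue
            # FAILED_FATAL, an unknown status, or retries exhausted: undo
            # every completed step, newest first.
            for done in reversed(completed):
                compensate(done)
            return "rolled_back"
    return "completed"


script = {"provision": [RETRYABLE, SUCCEEDED], "configure": [SUCCEEDED], "deploy": [FATAL]}
undone = []
outcome = run_saga(["provision", "configure", "deploy"], lambda s: script[s].pop(0), undone.append)
print(outcome, undone)  # rolled_back ['configure', 'provision']
```

Compensating newest-first matters when later steps depend on earlier ones. For example, you deregister what you configured before you deprovision what it ran on.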

Interoperability and signatures

CAP is language-agnostic. Go, Python, and Node SDKs serialize the same protobuf schema and optionally sign envelopes with ECDSA. That lets you mix workers across stacks without translation layers.

Adoption checklist

  • Keep payloads off the bus and store them behind pointers.
  • Emit heartbeats every 2-5 seconds with pool and capacity.
  • Use compensation templates for critical side effects.
  • Treat FAILED_RETRYABLE as transient and FAILED_FATAL as rollback-worthy.
  • Sign envelopes if you need authenticity guarantees.
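The heartbeat item in the checklist can be sketched as a small emit loop plus a scheduler-side staleness check. Random jitter inside the 2-5 second window keeps a fleet that restarts together from beating in lockstep. The worker identity, the injected `send`/`sleep` functions, and the three-missed-intervals staleness threshold are all hypothetical.

```python
import random

MIN_INTERVAL_S, MAX_INTERVAL_S = 2.0, 5.0
MISSED_BEFORE_STALE = 3  # hypothetical threshold: three silent max-intervals


def next_interval(rng: random.Random) -> float:
    """Pick a jittered delay within [2, 5] seconds."""
    return rng.uniform(MIN_INTERVAL_S, MAX_INTERVAL_S)


def heartbeat(worker_id: str, pool: str, active_jobs: int, max_parallel_jobs: int) -> dict:
    """Minimal heartbeat carrying pool and capacity."""
    return {
        "worker_id": worker_id,
        "pool": pool,
        "active_jobs": active_jobs,
        "max_parallel_jobs": max_parallel_jobs,
    }


def run_worker(send, sleep, beats: int, rng: random.Random) -> None:
    """Emit a fixed number of heartbeats; send and sleep are injected for testing."""
    for _ in range(beats):
        send(heartbeat("worker-ops-1", "job.ops.provision", 2, 6))
        sleep(next_interval(rng))


def is_stale(last_seen_s: float, now_s: float) -> bool:
    """Scheduler side: drop a worker after MISSED_BEFORE_STALE silent max-intervals."""
    return now_s - last_seen_s > MAX_INTERVAL_S * MISSED_BEFORE_STALE


sent, delays = [], []
run_worker(sent.append, delays.append, beats=5, rng=random.Random(7))
```

In a real worker you would pass your bus publish function and `time.sleep`. Injecting them keeps the cadence logic testable without waiting on the clock.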

Build on CAP

CAP is open, and its schema is append-only. If you need reliable orchestration with explicit governance, start here.
