Why CAP exists
Autonomous AI agents fail when every worker invents its own message shape. CAP fixes that by defining a stable, append-only wire schema for jobs, results, and operational signals. It keeps payloads off the bus and makes governance enforceable across services.
Clean envelope: BusPacket
The BusPacket envelope carries metadata like trace_id, sender_id, and protocol_version. Payloads are strongly typed: job requests, results, heartbeats, alerts, progress, or cancels.
{
  "trace_id": "trace-ops-001",
  "sender_id": "gateway-1",
  "created_at": "2026-01-26T12:00:00Z",
  "protocol_version": 1,
  "job_request": {
    "job_id": "job-123",
    "topic": "job.ops.provision",
    "context_ptr": "redis://ctx/job-123"
  }
}
Pointers over payloads
CAP keeps the bus lean by using pointers. context_ptr and result_ptr reference data stored in Redis, S3, or another store. That makes retries, audit trails, and rollbacks cheap, because no multi-MB payloads travel over the bus.
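The pointer pattern can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the in-memory STORE dictionary stands in for Redis or S3, the mem:// scheme and the helper names put_context, resolve, and build_job_request are hypothetical, and none of this is the CAP SDK API.

```python
import json

# Hypothetical in-memory stand-in for Redis, S3, or another store.
STORE = {}


def put_context(job_id, context):
    """Write the context to the store and return a pointer to it."""
    ptr = f"mem://ctx/{job_id}"  # assumed pointer scheme, for illustration only
    STORE[ptr] = json.dumps(context).encode("utf-8")
    return ptr


def resolve(ptr):
    """Fetch and decode the data a pointer refers to."""
    return json.loads(STORE[ptr])


def build_job_request(job_id, topic, context):
    """Build a job request that carries a pointer instead of the payload."""
    return {
        "job_id": job_id,
        "topic": topic,
        "context_ptr": put_context(job_id, context),
    }


# A roughly 1 MB context never appears in the message itself.
big_context = {"manifest": "x" * 1_000_000}
request = build_job_request("job-123", "job.ops.provision", big_context)
wire_size = len(json.dumps(request))
```

The request stays at a few dozen bytes regardless of how large the context grows, and a retry can resend the same small message because the worker resolves the pointer on its side.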
Safety hooks and governance
Every job can be evaluated by a Safety Kernel before dispatch. Decisions (allow/deny/require approval) are recorded and enforceable, and constraints can bound runtime, retries, or diff sizes.
This turns governance into a first-class protocol capability instead of a brittle middleware layer.
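As a rough sketch of what a pre-dispatch check could look like, the Python below evaluates a job request, records the decision, and only lets allowed jobs through. The policy tables, the Constraints fields, and the function names evaluate and gate are assumptions made for illustration; the real Safety Kernel interface may differ.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    REQUIRE_APPROVAL = "require_approval"


@dataclass(frozen=True)
class Constraints:
    max_runtime_s: int
    max_retries: int


@dataclass(frozen=True)
class Verdict:
    job_id: str
    decision: Decision
    constraints: Optional[Constraints] = None


# Hypothetical policy tables; a real deployment would load these from config.
DENIED_TOPICS = {"job.ops.wipe"}
APPROVAL_TOPICS = {"job.ops.deprovision"}
DEFAULT_CONSTRAINTS = Constraints(max_runtime_s=300, max_retries=3)


def evaluate(job_request):
    """Return a verdict for a job request based on its topic."""
    job_id, topic = job_request["job_id"], job_request["topic"]
    if topic in DENIED_TOPICS:
        return Verdict(job_id, Decision.DENY)
    if topic in APPROVAL_TOPICS:
        return Verdict(job_id, Decision.REQUIRE_APPROVAL)
    return Verdict(job_id, Decision.ALLOW, DEFAULT_CONSTRAINTS)


def gate(job_request, audit_log):
    """Record the verdict, then report whether the job may be dispatched now."""
    verdict = evaluate(job_request)
    audit_log.append(verdict)  # every decision is recorded, not only denials
    return verdict.decision is Decision.ALLOW


audit_log = []
dispatched = gate({"job_id": "job-123", "topic": "job.ops.provision"}, audit_log)
held = gate({"job_id": "job-124", "topic": "job.ops.deprovision"}, audit_log)
```

Here the provision job is dispatched with bounded runtime and retries, while the deprovision job is held for approval; both verdicts end up in the audit log.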
Compensation and rollback
CAP supports compensation templates directly on the job request. When a workflow fails fatally, the orchestrator can dispatch the inverse action without decoding the original payload.
{
  "job_id": "job-123",
  "topic": "job.ops.provision",
  "context_ptr": "redis://ctx/job-123",
  "compensation": {
    "topic": "job.ops.deprovision",
    "context_ptr": "redis://ctx/job-123/undo",
    "priority": "JOB_PRIORITY_CRITICAL",
    "meta": {
      "idempotency_key": "job-123/undo",
      "capability": "ops.deprovision"
    }
  }
}
The compensation block is structured and typed, so workers know exactly what rollback action to execute.
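To illustrate why the template avoids decoding the original payload, here is a minimal Python sketch of an orchestrator turning a failed job request into its inverse job. The function name build_compensation_job and the "-undo" job ID suffix are assumptions for illustration, not part of the CAP schema.

```python
from typing import Optional


def build_compensation_job(job_request: dict) -> Optional[dict]:
    """Derive the inverse job from the compensation template, if one exists.

    Only fields on the job request are read; the data behind context_ptr
    is never fetched or decoded.
    """
    template = job_request.get("compensation")
    if template is None:
        return None  # nothing to undo for this job
    undo = dict(template)
    undo["meta"] = dict(template.get("meta", {}))  # copy so the template stays untouched
    # Assumption: the undo job ID is derived from the original job ID.
    undo["job_id"] = f'{job_request["job_id"]}-undo'
    return undo


failed_request = {
    "job_id": "job-123",
    "topic": "job.ops.provision",
    "context_ptr": "redis://ctx/job-123",
    "compensation": {
        "topic": "job.ops.deprovision",
        "context_ptr": "redis://ctx/job-123/undo",
        "priority": "JOB_PRIORITY_CRITICAL",
        "meta": {
            "idempotency_key": "job-123/undo",
            "capability": "ops.deprovision",
        },
    },
}

undo_job = build_compensation_job(failed_request)
```

Because the idempotency key travels with the template, dispatching the undo job twice can be detected downstream and handled as a no-op.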
Checkpoint heartbeats
Heartbeats are no longer just liveness signals. Workers can send progress_pct and last_memo to report checkpoints for long-running tasks.
{
  "worker_id": "worker-ops-1",
  "pool": "job.ops.provision",
  "active_jobs": 2,
  "max_parallel_jobs": 6,
  "cpu_load": 21.4,
  "progress_pct": 65,
  "last_memo": "iam_policy_applied"
}
Retryable vs fatal failure
CAP distinguishes FAILED_RETRYABLE from FAILED_FATAL. That gives schedulers and orchestrators a deterministic signal to retry transient errors or trigger saga rollback when recovery is unsafe.
JobResult.status:
  JOB_STATUS_FAILED_RETRYABLE  // transient error, safe to retry
  JOB_STATUS_FAILED_FATAL      // fatal error, trigger rollback
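A scheduler could branch on these two statuses roughly as follows. This Python sketch is an assumption-laden illustration: the next_action function, the max_retries default, and the choice to roll back once retries are exhausted are not defined by the source.

```python
RETRYABLE = "JOB_STATUS_FAILED_RETRYABLE"
FATAL = "JOB_STATUS_FAILED_FATAL"


def next_action(status: str, attempt: int, max_retries: int = 3) -> str:
    """Map a JobResult status to a scheduler action.

    attempt counts how many times the job has already run.
    """
    if status == FATAL:
        return "rollback"  # recovery is unsafe, trigger compensation
    if status == RETRYABLE:
        # Assumption: once the retry budget is spent, fall back to rollback.
        return "retry" if attempt < max_retries else "rollback"
    return "none"  # any other status needs no failure handling here


first_failure = next_action(RETRYABLE, attempt=1)
exhausted = next_action(RETRYABLE, attempt=3)
fatal = next_action(FATAL, attempt=1)
```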
Interoperability and signatures
CAP is language-agnostic. Go, Python, and Node SDKs serialize the same protobuf schema and optionally sign envelopes with ECDSA. That lets you mix workers across stacks without translation layers.
Adoption checklist
- Keep payloads off the bus and store them behind pointers.
- Emit heartbeats every 2-5 seconds with pool + capacity.
- Use compensation templates for critical side effects.
- Treat FAILED_RETRYABLE as transient; FAILED_FATAL as rollback-worthy.
- Sign envelopes if you need authenticity guarantees.
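For the heartbeat item, a worker-side helper might look like the Python sketch below. The field names follow the heartbeat sample in the Checkpoint heartbeats section; the function names build_heartbeat and heartbeat_due, the clamping of progress_pct to 0-100, and the 3-second default interval are assumptions chosen for illustration.

```python
def build_heartbeat(worker_id, pool, active_jobs, max_parallel_jobs,
                    progress_pct=None, last_memo=None):
    """Assemble a heartbeat carrying pool, capacity, and an optional checkpoint."""
    heartbeat = {
        "worker_id": worker_id,
        "pool": pool,
        "active_jobs": active_jobs,
        "max_parallel_jobs": max_parallel_jobs,
    }
    if progress_pct is not None:
        # Assumption: progress is reported as an integer percentage, 0 to 100.
        heartbeat["progress_pct"] = max(0, min(100, int(progress_pct)))
    if last_memo:
        heartbeat["last_memo"] = last_memo
    return heartbeat


def heartbeat_due(last_sent_s, now_s, interval_s=3.0):
    """Return True once interval_s seconds have passed since the last heartbeat."""
    return now_s - last_sent_s >= interval_s


hb = build_heartbeat("worker-ops-1", "job.ops.provision", 2, 6,
                     progress_pct=65, last_memo="iam_policy_applied")
```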
