CAP Protocol Capabilities

What exists at the wire level, what gets enforced before dispatch, and why that matters when autonomous AI agents hit production.

Protocol Deep Dive · 14 min read · Updated Apr 2026
TL;DR
  • CAP is a control-plane contract, not another agent framework.
  • Its biggest practical win is deterministic decisions before dispatch, not after incident review.
  • BusPacket + pointers make cross-language workers interoperable without giant payloads on the bus.
  • Heartbeat + compensation metadata improve recovery behavior when long-running jobs fail mid-flight.
Real Failure Mode

Most incidents are not parser bugs. They are valid jobs dispatched without the right policy decision.

Recovery Pressure

If rollback metadata is missing at job creation time, compensation logic becomes guesswork under stress.

Operator Visibility

Progress and heartbeat signals are part of reliability. Black-box jobs are operational debt.

Scope

This article focuses on protocol surfaces you can validate in code and runtime behavior: envelope shape, state transitions, policy outcomes, approval semantics, heartbeat/progress signals, and rollback metadata.

The production problem

Multi-agent stacks now have interoperability standards for tools and delegation. That still leaves one unresolved question: which protocol surface controls execution risk before a job is published?

Teams usually bolt this on late. Then an autonomous AI agent emits a valid write request with incomplete context, no approval binding, and no compensation metadata. The protocol call succeeds. Operations fail.

What top-ranking sources cover vs. miss

| Source | Strong coverage | Missing piece |
| --- | --- | --- |
| StackOne MCP vs A2A | Good architectural separation between tool integration and agent coordination, including failure mode discussion. | No typed governance envelope that binds policy decisions to approval and dispatch semantics. |
| WorkOS MCP vs A2A | Clear explanation of where MCP and A2A each stop in real systems. | Governance remains conceptual. No explicit wire-level decision contract or rollback primitive. |
| DigitalOcean A2A vs MCP | Layer comparison, pros/cons, and strong security framing for protocol boundaries. | No production blueprint for policy check outcomes, approval binding, and result-pointer auditability. |

The gap is consistent. Most articles stop at layer separation. Few explain what concrete capabilities a governance protocol must expose to be operationally useful.

Core CAP capabilities

CAP's value is not a single feature. It is the combination of typed transport, policy-enforced dispatch, pointer-based payload handling, and recovery-safe metadata.

| Capability | Why it matters | CAP surface |
| --- | --- | --- |
| Typed envelope for all bus events | Prevents ad-hoc message drift between gateway, scheduler, and workers. | `BusPacket` with oneof payload types |
| Pre-dispatch policy outcomes | Blocks unsafe actions before they run, instead of only logging them after. | `ALLOW`, `DENY`, `REQUIRE_APPROVAL`, `THROTTLE`, `ALLOW_WITH_CONSTRAINTS` |
| Approval binding | Ensures human approvals map to the reviewed policy snapshot and job intent. | `approval_required` + `approval_ref` |
| Pointer-first payload handling | Keeps transport payloads small while preserving full context and result data. | `context_ptr`, `result_ptr`, artifact pointers |
| Checkpoint heartbeats | Provides progress visibility and better operator decisions during long-running jobs. | `Heartbeat` and `JobProgress` messages |
| Rollback metadata | Gives orchestrators enough data to run deterministic compensation paths. | Compensation templates tied to terminal failure semantics |
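
The pointer-first row is easier to picture with a small sketch. This is an illustration only: the in-memory `STORE`, the `mem://` scheme, and the helper names `put_context`, `resolve`, and `build_job_request` are all hypothetical and stand in for whatever backing store and client code a real deployment uses.

```python
import json

# Hypothetical in-memory stand-in for the backing store (Redis, object storage, ...).
STORE = {}


def put_context(job_id: str, context: dict) -> str:
    """Store the full context out of band and return a pointer to it."""
    ptr = f"mem://ctx:{job_id}"  # illustrative scheme, not one defined by CAP
    STORE[ptr] = json.dumps(context).encode("utf-8")
    return ptr


def resolve(ptr: str) -> dict:
    """Load the payload a pointer refers to; raises KeyError if it is missing."""
    return json.loads(STORE[ptr].decode("utf-8"))


def build_job_request(job_id: str, topic: str, context: dict) -> dict:
    """Build a small request that carries a pointer instead of the context body."""
    return {
        "job_id": job_id,
        "topic": topic,
        "context_ptr": put_context(job_id, context),
    }


# The context is large; the request that travels on the bus is not.
big_context = {"ticket": "OPS-1", "history": ["x" * 1000] * 50}
request = build_job_request(
    "job-9f0f3a", "job.mcp-bridge.write.update_ticket", big_context
)
```

The worker calls `resolve(request["context_ptr"])` when it needs the body, so the bus message stays the same size no matter how much context the job carries.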

Wire contract details

CAP keeps one envelope (`BusPacket`) and several typed payloads. That decision alone removes a lot of schema drift between components.

buspacket.json
JSON
{
  "trace_id": "trace-ops-2026-04-01-001",
  "sender_id": "api-gateway-1",
  "created_at": "2026-04-01T11:32:10Z",
  "protocol_version": 1,
  "job_request": {
    "job_id": "job-9f0f3a",
    "topic": "job.mcp-bridge.write.update_ticket",
    "tenant_id": "default",
    "context_ptr": "redis://ctx:job-9f0f3a",
    "labels": {
      "mcp.server": "jira",
      "mcp.tool": "update_ticket",
      "mcp.action": "write"
    }
  }
}

| Message | Purpose | Common fields |
| --- | --- | --- |
| `JobRequest` | Describe the unit of work and where to load context. | `job_id`, `topic`, `context_ptr`, `labels`, `tenant_id` |
| `JobResult` | Report terminal or in-flight state back to scheduler/workflow engine. | `status`, `result_ptr`, `worker_id`, `error_code` |
| `Heartbeat` | Report worker health and running pressure. | `worker_id`, `cpu_load`, `active_jobs`, `pool` |
| `JobProgress` | Expose partial progress for long operations. | `percent`, `message`, optional pointers |
| `JobCancel` | Signal cooperative cancellation for in-flight work. | `job_id`, `reason`, `requested_by` |
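
A receiver can check the "one envelope, one typed payload" rule before it looks at anything else. The sketch below works on the JSON form of a packet. It assumes that the payload keys are the snake_case forms of the message names (only `job_request` appears in the example above) and that the four envelope fields shown there are mandatory; both are assumptions, not confirmed parts of the contract.

```python
# Assumed payload keys: snake_case forms of the message names in the table.
PAYLOAD_FIELDS = ("job_request", "job_result", "heartbeat", "job_progress", "job_cancel")
# Assumed to be required because they appear in the example packet.
REQUIRED_ENVELOPE_FIELDS = ("trace_id", "sender_id", "created_at", "protocol_version")


def validate_bus_packet(packet: dict) -> str:
    """Return the name of the single payload field, or raise ValueError."""
    missing = [f for f in REQUIRED_ENVELOPE_FIELDS if f not in packet]
    if missing:
        raise ValueError(f"missing envelope fields: {missing}")
    present = [f for f in PAYLOAD_FIELDS if f in packet]
    if len(present) != 1:
        # Zero payloads or several payloads both violate the oneof rule.
        raise ValueError(f"expected exactly one payload, found {present}")
    return present[0]


packet = {
    "trace_id": "trace-ops-2026-04-01-001",
    "sender_id": "api-gateway-1",
    "created_at": "2026-04-01T11:32:10Z",
    "protocol_version": 1,
    "job_request": {"job_id": "job-9f0f3a", "topic": "job.mcp-bridge.write.update_ticket"},
}
payload_kind = validate_bus_packet(packet)
```

Rejecting a packet at this point keeps malformed envelopes out of the scheduler and policy code entirely.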

Policy and approval flow

CAP decisions are useful because they are executable states, not advisory labels. For example, `REQUIRE_APPROVAL` means the job enters approval flow before dispatch.

safety-policy.yaml
YAML
version: v1
rules:
  - id: allow-read-path
    match:
      labels:
        mcp.action: "read"
    decision: allow
    reason: "Read operations are allowed"

  - id: approval-prod-write
    match:
      labels:
        mcp.action: "write"
      risk_tags: ["prod"]
    decision: require_approval
    reason: "Production writes require human review"

  - id: deny-destructive
    match:
      labels:
        mcp.action: "delete"
    decision: deny
    reason: "Destructive actions blocked"

  - id: constrain-heavy-jobs
    match:
      topics: ["job.batch.*"]
    decision: allow_with_constraints
    constraints:
      max_runtime_sec: 120
      max_retries: 1
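
One way to read the rules above is as a first-match list. The sketch below assumes that ordering, a default of `deny` when nothing matches, "all listed tags must be present" semantics for `risk_tags`, and glob-style topic matching. None of these are confirmed by the policy file itself. The rules are inlined as Python dicts so the example does not need a YAML parser.

```python
from fnmatch import fnmatchcase

# The four rules from safety-policy.yaml, without the human-readable reasons.
RULES = [
    {"id": "allow-read-path",
     "match": {"labels": {"mcp.action": "read"}},
     "decision": "allow"},
    {"id": "approval-prod-write",
     "match": {"labels": {"mcp.action": "write"}, "risk_tags": ["prod"]},
     "decision": "require_approval"},
    {"id": "deny-destructive",
     "match": {"labels": {"mcp.action": "delete"}},
     "decision": "deny"},
    {"id": "constrain-heavy-jobs",
     "match": {"topics": ["job.batch.*"]},
     "decision": "allow_with_constraints",
     "constraints": {"max_runtime_sec": 120, "max_retries": 1}},
]


def rule_matches(match: dict, job: dict) -> bool:
    """True when every condition present in the rule's match block holds."""
    labels = job.get("labels", {})
    if any(labels.get(k) != v for k, v in match.get("labels", {}).items()):
        return False
    if not set(match.get("risk_tags", [])) <= set(job.get("risk_tags", [])):
        return False
    topics = match.get("topics")
    if topics and not any(fnmatchcase(job.get("topic", ""), p) for p in topics):
        return False
    return True


def evaluate(job: dict, rules=RULES, default: str = "deny") -> dict:
    """Return the first matching rule's decision (assumed semantics)."""
    for rule in rules:
        if rule_matches(rule["match"], job):
            return {"decision": rule["decision"], "rule_id": rule["id"],
                    "constraints": rule.get("constraints", {})}
    return {"decision": default, "rule_id": None, "constraints": {}}


prod_write = {"topic": "job.mcp-bridge.write.update_ticket",
              "labels": {"mcp.action": "write"}, "risk_tags": ["prod"]}
outcome = evaluate(prod_write)
```

Under these assumptions a write without the `prod` tag matches no rule and falls through to the default, which is why the choice of default deserves an explicit decision.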
approval-flow.sh
Bash
# Submit a write job that should require approval
curl -sS -X POST http://localhost:8081/api/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "topic":"job.mcp-bridge.write.update_ticket",
    "tenant_id":"default",
    "risk_tags":["prod"],
    "labels":{"mcp.action":"write"}
  }'

# List pending approvals
curl -sS "http://localhost:8081/api/v1/approvals?include_resolved=false"

# Approve one job after review
curl -sS -X POST "http://localhost:8081/api/v1/approvals/<job_id>/approve" \
  -H "Content-Type: application/json" \
  -d '{"note":"approved in maintenance window"}'
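
Approval binding means an approval should stop being valid if the job or the policy changes after review. The sketch below shows one possible way to get that property: derive the reference from a digest of the reviewed job intent plus a policy snapshot identifier. This is an assumption made for illustration. The article does not say how `approval_ref` is actually computed, and the function names and the `policy_snapshot` values are hypothetical.

```python
import hashlib
import json

# Fields treated as "job intent" in this sketch (an assumption).
INTENT_FIELDS = ("topic", "tenant_id", "labels", "risk_tags", "context_ptr")


def approval_ref(job: dict, policy_snapshot: str) -> str:
    """Digest over the reviewed job intent and the policy snapshot identifier."""
    intent = {k: job.get(k) for k in INTENT_FIELDS}
    canonical = json.dumps(
        {"intent": intent, "policy": policy_snapshot},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def approval_still_valid(job: dict, policy_snapshot: str, recorded_ref: str) -> bool:
    """Recompute the digest at dispatch time and compare with what was approved."""
    return approval_ref(job, policy_snapshot) == recorded_ref


job = {
    "topic": "job.mcp-bridge.write.update_ticket",
    "tenant_id": "default",
    "risk_tags": ["prod"],
    "labels": {"mcp.action": "write"},
}
ref = approval_ref(job, "policy-v1")  # recorded when the reviewer approves

# The same job under the same policy still matches; an edited job does not.
tampered = {**job, "labels": {"mcp.action": "delete"}}
```

The design point is that the approval is tied to what the reviewer saw, so a job edited after approval has to go through review again.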
fatal-result-with-compensation.json
JSON
{
  "job_id": "job-9f0f3a",
  "status": "FAILED_FATAL",
  "error_code": "ERROR_CODE_JOB_TIMEOUT",
  "compensation": {
    "topic": "job.mcp-bridge.write.revert_ticket",
    "context_ptr": "redis://ctx:job-9f0f3a:undo",
    "meta": {
      "idempotency_key": "job-9f0f3a/undo"
    }
  }
}
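
An orchestrator could turn a result like the one above into a follow-up job roughly as follows. This is a sketch under assumptions: the function name `plan_compensation`, the in-memory set used for deduplication, and the `compensates` field are hypothetical, and a real system would keep idempotency keys in durable storage.

```python
def plan_compensation(result: dict, already_planned: set):
    """Return a compensation job request for a fatal result, or None."""
    if result.get("status") != "FAILED_FATAL":
        return None  # only terminal failures trigger compensation in this sketch
    comp = result.get("compensation")
    if comp is None:
        return None  # no rollback metadata was declared for this job
    key = comp["meta"]["idempotency_key"]
    if key in already_planned:
        return None  # a redelivered result must not undo the same work twice
    already_planned.add(key)
    return {
        "topic": comp["topic"],
        "context_ptr": comp["context_ptr"],
        # "compensates" is a hypothetical back-reference to the failed job.
        "meta": {"idempotency_key": key, "compensates": result["job_id"]},
    }


fatal_result = {
    "job_id": "job-9f0f3a",
    "status": "FAILED_FATAL",
    "error_code": "ERROR_CODE_JOB_TIMEOUT",
    "compensation": {
        "topic": "job.mcp-bridge.write.revert_ticket",
        "context_ptr": "redis://ctx:job-9f0f3a:undo",
        "meta": {"idempotency_key": "job-9f0f3a/undo"},
    },
}
seen = set()
first = plan_compensation(fatal_result, seen)
second = plan_compensation(fatal_result, seen)  # same result delivered again
```

Because the topic, pointer, and key all come from the result itself, the orchestrator does not have to infer how to undo the job at failure time.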

One number worth noting: safety checks run on the hot path with a short client timeout (`2s` in the current safety-kernel reference). That bound lets policy enforcement stay synchronous without turning the scheduler into a queueing bottleneck.
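
A caller-side view of a bounded, synchronous check might look like the sketch below. Only the `2s` default comes from the text. The wrapper name `check_with_timeout`, the thread pool, and the choice to treat a timeout as `DENY` (fail closed) are assumptions for illustration; the article does not say what the reference implementation does when the timeout fires.

```python
import concurrent.futures
import time

# Shared pool so a slow check does not block the calling thread.
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def check_with_timeout(check_fn, job: dict, timeout_s: float = 2.0) -> str:
    """Run a policy check, returning DENY if it does not answer in time."""
    future = _POOL.submit(check_fn, job)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        future.cancel()  # best effort; an already running call is not interrupted
        return "DENY"    # assumption: fail closed when the check is too slow


def fast_check(job: dict) -> str:
    return "ALLOW"


def slow_check(job: dict) -> str:
    time.sleep(0.3)  # simulates a policy service that is too slow
    return "ALLOW"


fast_outcome = check_with_timeout(fast_check, {"topic": "job.batch.reindex"})
slow_outcome = check_with_timeout(slow_check, {"topic": "job.batch.reindex"}, timeout_s=0.05)
```

Whether a timeout should deny, retry, or route to approval is a policy choice in its own right and should be written down next to the rules.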

Limitations and tradeoffs

Protocol discipline required

Teams must keep message contracts stable across services and versions. That takes governance work.

Approval latency cost

Human gates improve safety but add waiting time for high-impact write paths.

Signal tuning effort

Heartbeat and progress signals are useful only when operators define thresholds and response runbooks.
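
Thresholds are where that tuning becomes concrete. The sketch below classifies a worker from its latest heartbeat. The `received_at` field, the assumption that `cpu_load` is a 0-100 percentage, and both default thresholds are hypothetical values chosen for illustration, not recommendations from the protocol.

```python
from dataclasses import dataclass


@dataclass
class Heartbeat:
    worker_id: str
    cpu_load: float     # assumed here to be a 0-100 percentage
    active_jobs: int
    pool: str
    received_at: float  # receiver-side timestamp in seconds (hypothetical field)


def classify(hb: Heartbeat, now: float,
             max_silence_s: float = 30.0, max_cpu_load: float = 90.0) -> str:
    """Return 'stale', 'overloaded', or 'ok' for one worker."""
    if now - hb.received_at > max_silence_s:
        return "stale"       # silence is checked first: old load numbers mean little
    if hb.cpu_load > max_cpu_load:
        return "overloaded"
    return "ok"


now = 1000.0
workers = [
    Heartbeat("w1", cpu_load=35.0, active_jobs=2, pool="default", received_at=995.0),
    Heartbeat("w2", cpu_load=97.0, active_jobs=8, pool="default", received_at=998.0),
    Heartbeat("w3", cpu_load=10.0, active_jobs=0, pool="batch", received_at=900.0),
]
states = {hb.worker_id: classify(hb, now) for hb in workers}
```

Each state should map to a runbook entry, for example draining an overloaded pool or reassigning jobs held by a stale worker.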

Frequently Asked Questions

How is CAP different from MCP and A2A?
MCP standardizes tool access. A2A standardizes agent delegation. CAP standardizes governance decisions and execution lifecycle controls around jobs.
Does CAP replace MCP or A2A?
No. CAP is an additional layer for control-plane behavior. Many systems use A2A for delegation, MCP for tools, and CAP for policy decisions and auditable execution.
Why are pointers important in CAP?
Pointers keep transport traffic small and deterministic while preserving full context/results in backing storage for retries, audits, and rollback flows.
Can I run CAP decisions without approval queues?
Yes, but you lose one of CAP's main protections for high-impact writes. In production, approval gates are usually required for risky actions.
Next step

Pick one high-risk autonomous AI agent workflow and trace it end to end. Confirm BusPacket integrity, policy decision type, approval record, and compensation path in one run.

Sources