When to use Cordum.
Concrete scenarios where a control plane solves real problems — from governing AI agents in production to meeting compliance requirements for autonomous systems.
AI Agent Governance in Production
Autonomous LLM agents take actions with no pre-execution checks. A misclassified intent can delete production data, expose secrets, or trigger costly cloud operations before anyone reviews the decision.
The Safety Kernel evaluates every agent action against YAML policies before execution. Each action receives one of four decisions: ALLOW, DENY, REQUIRE_APPROVAL, or ALLOW_WITH_CONSTRAINTS. Every decision is logged to an immutable audit trail.
- Pre-dispatch policy enforcement in under 5 ms
- Human approval gates for high-risk actions
- Constraint enforcement (deny-paths, budgets, max runtime)
- Immutable policy snapshots with hash versioning
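As a rough illustration of pre-dispatch evaluation, the sketch below matches an action's capability and risk tags against an ordered rule list. It is not Cordum's implementation: the `Rule` class, the `evaluate` function, first-match-wins ordering, subset matching on risk tags, and the fail-closed default are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class Rule:
    """Hypothetical in-memory form of one policy rule."""
    name: str
    decision: str
    reason: str = ""
    capability: Optional[str] = None
    risk_tags: Set[str] = field(default_factory=set)

    def matches(self, capability: str, risk_tags: Set[str]) -> bool:
        # Assumption: a rule matches when its capability (if set) is equal
        # and every risk tag it lists is present on the action.
        if self.capability is not None and self.capability != capability:
            return False
        return self.risk_tags <= risk_tags


def evaluate(rules: List[Rule], capability: str, risk_tags: Set[str],
             default: str = "DENY") -> Tuple[str, str]:
    """Return (decision, reason) from the first matching rule.

    Assumption: rules are checked in order and the first match wins;
    when nothing matches, the caller-supplied default applies.
    """
    for rule in rules:
        if rule.matches(capability, set(risk_tags)):
            return rule.decision, rule.reason
    return default, "no rule matched"
```

Calling `evaluate(rules, "deploy", {"prod"})` against a rule list like the one in this section would return the `REQUIRE_APPROVAL` decision together with its reason string.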
rules:
  - name: block-prod-delete
    match:
      risk_tags: [prod, write]
      capability: database
    decision: DENY
    reason: "Direct production deletes require manual execution"
  - name: require-approval-deploy
    match:
      risk_tags: [prod]
      capability: deploy
    decision: REQUIRE_APPROVAL
    reason: "Production deployments need human sign-off"
Incident Response Automation
Incident workflows require coordinating multiple systems (PagerDuty, Kubernetes, Datadog) with human approval at critical steps. Manual coordination is slow and error-prone under pressure.
DAG workflows orchestrate multi-step incident response with built-in approval gates, parallel execution of independent steps, and automatic rollback on failure.
- DAG-based workflow with parallel step execution
- Approval steps for escalation decisions
- Automatic rollback via compensation templates
- Full timeline for post-incident review
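To make the rollback idea concrete, here is a minimal sketch of compensation-style rollback: each step carries an undo action, and a failure triggers the undo actions of the completed steps in reverse order. The `run_with_compensation` function and the timeline entry format are hypothetical and do not describe how Cordum's compensation templates are defined or executed.

```python
from typing import Callable, List, Tuple

# (name, action, compensation) -- a hypothetical step shape for this sketch.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_with_compensation(steps: List[Step]) -> List[str]:
    """Run steps in order; on failure, undo completed steps in reverse.

    Returns a simple timeline of what happened, suitable for review.
    """
    timeline: List[str] = []
    completed: List[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
        except Exception as exc:
            timeline.append(f"failed:{name}:{exc}")
            # Roll back everything that already succeeded, newest first.
            for done_name, _, compensate in reversed(completed):
                compensate()
                timeline.append(f"compensated:{done_name}")
            return timeline
        completed.append(step)
        timeline.append(f"done:{name}")
    return timeline
```

Undoing in reverse order keeps later steps from being rolled back against state that an earlier compensation has already removed.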
name: incident-response
steps:
  diagnose:
    type: worker
    topic: job.sre.diagnose
  notify:
    type: notify
    message: "Incident diagnosed, awaiting approval"
    # Send the notification only after diagnosis has completed
    depends_on: [diagnose]
  approve-remediation:
    type: approval
    depends_on: [diagnose, notify]
  remediate:
    type: worker
    topic: job.sre.remediate
    depends_on: [approve-remediation]
Infrastructure Automation with Safety
Running Terraform, Ansible, or kubectl through AI agents without guardrails risks unintended infrastructure changes. A single misconfigured command can take down production.
Cordum constrains infrastructure actions with deny-path patterns, max runtime limits, and required rollback plans. The Safety Kernel blocks dangerous operations before they execute.
- deny_paths blocks access to sensitive namespaces
- max_runtime prevents runaway operations
- require_rollback_plan for destructive changes
- Capability-based routing to specialized workers
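The constraint checks above can be sketched in a few lines. This example assumes that deny-path patterns behave like shell-style globs and that runtime limits use the `<integer>s` form; neither is confirmed as Cordum's actual semantics, and both helper functions are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import List


def violates_deny_paths(path: str, deny_paths: List[str]) -> bool:
    """True when the path matches any deny pattern.

    Assumption: patterns are shell-style globs, where `*` also matches
    nested segments (standard fnmatch behavior).
    """
    return any(fnmatchcase(path, pattern) for pattern in deny_paths)


def parse_runtime_seconds(value: str) -> int:
    """Parse a limit such as "300s" into seconds.

    Only the "<integer>s" form is handled in this sketch.
    """
    if not value.endswith("s"):
        raise ValueError(f"unsupported runtime format: {value!r}")
    return int(value[:-1])


def within_runtime(elapsed_seconds: float, max_runtime: str) -> bool:
    """True while the operation is still inside its runtime budget."""
    return elapsed_seconds <= parse_runtime_seconds(max_runtime)
```

With the patterns `"/kube-system/*"` and `"/istio-system/*"`, a target such as `/kube-system/pods/coredns` would be rejected, while `/default/pods/web` would pass.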
rules:
  - name: kubectl-constraints
    match:
      capability: kubectl
    decision: ALLOW_WITH_CONSTRAINTS
    constraints:
      deny_paths: ["/kube-system/*", "/istio-system/*"]
      max_runtime: 300s
  - name: terraform-approval
    match:
      capability: terraform
      risk_tags: [prod]
    decision: REQUIRE_APPROVAL
Multi-Agent Coordination
Multiple AI agents running independently create race conditions, duplicate work, and conflicting actions. No shared governance means no way to enforce ordering or mutual exclusion.
The Workflow Engine executes DAGs with deterministic step ordering. Distributed locks prevent concurrent mutations. The scheduler routes jobs to specialized worker pools based on capabilities.
- DAG execution with depends_on for ordering
- Distributed locks (shared/exclusive) for mutual exclusion
- Capability-based routing (requires: kubectl, GPU)
- Parallel step execution for independent work
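One way to see how `depends_on` yields both ordering and parallelism is to group steps into waves, where every step in a wave has all of its dependencies satisfied. The sketch below uses Python's standard `graphlib` module and the step names from the multi-agent-pipeline example in this section. The `execution_waves` helper is hypothetical and says nothing about how the Workflow Engine schedules steps internally.

```python
from graphlib import TopologicalSorter
from typing import Dict, List


def execution_waves(depends_on: Dict[str, List[str]]) -> List[List[str]]:
    """Group steps into waves that could run in parallel.

    `depends_on` maps each step to the steps it must wait for.
    """
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


pipeline = {
    "analyze": [],
    "plan-a": ["analyze"],
    "plan-b": ["analyze"],
    "execute": ["plan-a", "plan-b"],
}
print(execution_waves(pipeline))
# → [['analyze'], ['plan-a', 'plan-b'], ['execute']]
```

The two planning steps land in the same wave because neither depends on the other, so they are candidates for parallel execution.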
name: multi-agent-pipeline
steps:
  analyze:
    type: llm
    topic: job.analyst.analyze
  plan-a:
    type: worker
    topic: job.planner.infrastructure
    depends_on: [analyze]
  plan-b:
    type: worker
    topic: job.planner.application
    depends_on: [analyze]
  execute:
    type: worker
    topic: job.executor.apply
    depends_on: [plan-a, plan-b]
Compliance & Audit for AI
Regulatory frameworks (SOC 2, ISO 27001, GDPR) require proof that AI systems have controls. Post-hoc logging is insufficient — auditors need evidence of pre-execution governance.
Every action generates an immutable audit record before execution. Policy snapshots are versioned with SHA-256 hashes. The append-only timeline provides a complete chain of custody for every workflow run.
- Append-only timeline for every run
- Policy snapshot hashes for audit provenance
- Decision audit records (trace_id, job_id, decision, reason)
- Dead-letter queue (DLQ) captures all failures for review
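To show why a snapshot hash is useful as audit evidence, the sketch below derives a `sha256:` identifier from a policy document. It assumes the hash is taken over a canonical JSON serialization (sorted keys, compact separators); the actual input Cordum hashes is not specified here, and `snapshot_hash` is a hypothetical helper.

```python
import hashlib
import json
from typing import Any, Dict


def snapshot_hash(policy: Dict[str, Any]) -> str:
    """Return a "sha256:<hex>" identifier for a policy document.

    Assumption: the policy is serialized canonically so that two
    logically identical policies always produce the same hash.
    """
    canonical = json.dumps(policy, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return "sha256:" + digest
```

Because any change to a rule changes the digest, an audit record that stores this value pins the decision to the exact policy version that produced it.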
{
  "trace_id": "tr-8f2a1b3c",
  "job_id": "job-4e5f6a7b",
  "decision": "REQUIRE_APPROVAL",
  "reason": "Production write detected",
  "policy_snapshot": "sha256:a1b2c3d4...",
  "capability": "database.write",
  "risk_tags": ["prod", "write"],
  "timestamp": "2025-03-15T10:30:00Z",
  "approved_by": "ops-lead@company.io",
  "approved_at": "2025-03-15T10:32:15Z"
}
CI/CD with Policy Gates
Deployment pipelines lack runtime policy checks. Static rules in CI config files cannot adapt to the current state of production or enforce dynamic constraints based on context.
Add Cordum as a policy gate in your deployment pipeline. The Safety Kernel evaluates deployments against current policy, enforcing constraints like deployment windows, required approvals, and change size limits.
- Dynamic policy evaluation at deploy time
- max_diff_size limits for change control
- Deployment window enforcement via policy rules
- Simulate API for testing policy changes pre-merge
# In your CI pipeline
cordumctl job submit \
  --topic job.deploy.production \
  --capability deploy \
  --risk-tags prod,write \
  --context '{"image": "app:v2.1.0"}' \
  --wait
# Safety Kernel evaluates:
# → ALLOW_WITH_CONSTRAINTS
#   max_diff_size: 500
#   require_rollback_plan: true
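For a sense of what deploy-time constraint checks involve, here is a minimal sketch that tests a deployment against a change size limit, a rollback plan requirement, and a deployment window. Everything in it is an assumption for illustration: the `check_deploy` function, measuring `max_diff_size` in changed lines, and a weekday window expressed in UTC hours are not taken from Cordum's documentation.

```python
from datetime import datetime, timezone
from typing import List, Tuple


def check_deploy(diff_size: int, has_rollback_plan: bool, now: datetime, *,
                 max_diff_size: int = 500,
                 require_rollback_plan: bool = True,
                 window_hours: Tuple[int, int] = (9, 17)) -> List[str]:
    """Return the names of violated constraints (empty list means pass).

    Assumptions: diff_size counts changed lines, and the deployment
    window is Monday to Friday between the given UTC hours.
    """
    violations: List[str] = []
    if diff_size > max_diff_size:
        violations.append("max_diff_size")
    if require_rollback_plan and not has_rollback_plan:
        violations.append("require_rollback_plan")
    start, end = window_hours
    utc_now = now.astimezone(timezone.utc)
    if utc_now.weekday() >= 5 or not (start <= utc_now.hour < end):
        violations.append("deployment_window")
    return violations
```

Returning the full list of violations, rather than stopping at the first, gives the pipeline a complete explanation to surface when a deployment is held back.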