## The production problem
Most AI orchestration demos fail in the same place: long-running partial failures. A tool call succeeds, the process crashes, and nobody knows which side effects already happened.
If your system cannot resume safely from mid-run checkpoints, or cannot reconstruct which policy and approval states applied at each step, incident response becomes guesswork.
## What top-ranking sources cover vs miss
| Source | Strong coverage | Missing piece |
|---|---|---|
| Microsoft AI agent orchestration patterns | Excellent pattern taxonomy for sequential, concurrent, handoff, and chat-style multi-agent designs. | No concrete contract for policy snapshots, approval binding, and decision audit APIs in runtime control planes. |
| Temporal durable execution docs | Clear durability framing for long-running workflows and crash recovery through persisted workflow state. | General orchestration guidance without built-in AI governance semantics like policy rule lineage. |
| Seven hosting patterns for AI agents | Practical hosting tradeoffs across cron, event-driven, daemon, API, workflow, and multi-agent mesh patterns. | Limited depth on approval queue architecture and compliance-grade evidence joins. |
## Execution model that works
Reliable orchestration is a layered contract. Each layer answers one operational question clearly.
| Layer | Guarantee | Operational surface |
|---|---|---|
| DAG dependency graph | Explicit execution ordering with parallel fan-out where safe | Workflow definition + run timeline |
| Policy gate | Submit and dispatch decisions before worker side effects | Policy evaluate and job decisions APIs |
| Approval pause | Human checkpoint with bound snapshot and job hash | Approvals queue APIs |
| Retry contract | Automatic retries and backoff without losing run context | Workflow engine run state + rerun API |
| Run timeline | Append-only event chronology for audits and incident forensics | GET /api/v1/workflow-runs/{id}/timeline |
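As a minimal sketch of what the DAG layer's guarantee means in practice, the batching below groups steps whose dependencies are all satisfied, so each batch can fan out in parallel while ordering stays explicit. This is illustrative Python, not Cordum's engine; the step names come from the incident workflow in the next section.

```python
def dispatch_batches(steps: dict[str, list[str]]) -> list[list[str]]:
    """Group DAG steps into batches: every step in a batch has all its
    dependencies already completed, so batch members can run in parallel."""
    remaining = dict(steps)   # step -> unmet dependencies
    done: set[str] = set()
    batches: list[list[str]] = []
    while remaining:
        ready = sorted(s for s, deps in remaining.items()
                       if all(d in done for d in deps))
        if not ready:         # nothing can make progress: dependency cycle
            raise ValueError(f"dependency cycle among {sorted(remaining)}")
        batches.append(ready)
        done.update(ready)
        for s in ready:
            del remaining[s]
    return batches

dag = {
    "detect": [],
    "classify": ["detect"],
    "patch_plan": ["classify"],
    "approval_gate": ["patch_plan"],
    "apply_patch": ["approval_gate"],
    "notify": ["apply_patch"],
}
print(dispatch_batches(dag))
# [['detect'], ['classify'], ['patch_plan'], ['approval_gate'], ['apply_patch'], ['notify']]
```

A linear chain yields single-step batches; a workflow with independent branches would produce multi-step batches, which is where safe fan-out pays off.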
## Workflow design patterns
Start with explicit DAG dependencies and add governance nodes where risk rises. This keeps happy-path latency reasonable while protecting high-impact actions.
```yaml
id: incident-remediation
name: Incident remediation workflow
version: 1.0.0
steps:
  detect:
    type: job
    topic: job.incident.detect
  classify:
    type: job
    topic: job.incident.classify
    depends_on: [detect]
  patch_plan:
    type: job
    topic: job.incident.patch.plan
    depends_on: [classify]
  approval_gate:
    type: approval
    depends_on: [patch_plan]
  apply_patch:
    type: job
    topic: job.incident.patch.apply
    depends_on: [approval_gate]
    retries:
      max_attempts: 2
      backoff_ms: 5000
  notify:
    type: notify
    depends_on: [apply_patch]
```

In Cordum, approval steps pause run progress, and denied outcomes are first-class terminal statuses. That distinction matters because policy denial and runtime failure require different response paths.
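A small routing sketch makes the denial-vs-failure distinction concrete. The status names and response strings here are illustrative assumptions, not Cordum's exact enum or API.

```python
def next_action(run_status: str) -> str:
    """Map a run state to an operator response path.
    Status names are illustrative, not a documented Cordum enum."""
    routes = {
        "pending_approval": "notify approvers; run stays paused at the gate",
        "denied": "policy terminal: open a policy review, do not blindly rerun",
        "failed": "runtime fault: inspect the timeline, then rerun from the failed step",
        "succeeded": "terminal: archive the timeline for audit",
    }
    return routes.get(run_status, "unknown status: escalate to the run owner")

print(next_action("denied"))
print(next_action("failed"))
```

The key point: a denied run routes to governance review, while a failed run routes to the rerun API; collapsing both into generic "failure" hides which loop you are in.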
## Run operations and recovery
Most platform teams only test success flows. Production reliability depends on rehearsed recovery paths.
```bash
# Start run with idempotency key
curl -sS -X POST http://localhost:8081/api/v1/workflows/WF_ID/runs \
  -H 'X-API-Key: YOUR_API_KEY' \
  -H 'X-Tenant-ID: default' \
  -H 'Idempotency-Key: run-incident-001' \
  -H 'Content-Type: application/json' \
  -d '{"input":{"incident_id":"INC-7781"}}'

# Inspect run status
curl -sS http://localhost:8081/api/v1/workflow-runs/RUN_ID

# Read append-only timeline
curl -sS "http://localhost:8081/api/v1/workflow-runs/RUN_ID/timeline?limit=200"

# If needed, rerun from a failed step
curl -sS -X POST http://localhost:8081/api/v1/workflow-runs/RUN_ID/rerun \
  -H 'X-API-Key: YOUR_API_KEY' \
  -H 'X-Tenant-ID: default' \
  -H 'Content-Type: application/json' \
  -d '{"from_step":"apply_patch","dry_run":false}'
```

| Control | Default | Why it exists |
|---|---|---|
| Step job id format | run_id:step_id@attempt | Makes retries and timeline events unambiguous |
| Run timeline store | wf:run:timeline:<run_id> | Supports deterministic post-mortem reconstruction |
| Pending replayer | enabled | Retries stale PENDING jobs past dispatch timeout |
| Denied status handling | first-class terminal status | Separates policy denial from generic failure |
| Reconciler | retries stuck runs | Prevents long-running workflows from silently stalling |
| Approval gate endpoint | GET /api/v1/approvals | Unified queue for workflow and policy approval decisions |
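The `run_id:step_id@attempt` job id format from the table is easy to parse deterministically, which is what makes retries and timeline events unambiguous. A minimal sketch (the example id value is hypothetical):

```python
def parse_step_job_id(job_id: str) -> tuple[str, str, int]:
    """Split a 'run_id:step_id@attempt' job id into its parts.
    rsplit/split with maxsplit keeps ids stable even if run_id
    itself ever contains extra ':' characters."""
    run_and_step, attempt = job_id.rsplit("@", 1)
    run_id, step_id = run_and_step.split(":", 1)
    return run_id, step_id, int(attempt)

print(parse_step_job_id("run-7781:apply_patch@2"))
# ('run-7781', 'apply_patch', 2)
```

With this shape, a timeline event for attempt 2 of `apply_patch` can never be confused with attempt 1, so post-mortems reconstruct exactly which side effects each attempt produced.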
## Limitations and tradeoffs
- DAG orchestration, retries, and approvals add moving parts that need active ownership.
- Checkpointing and governance gates increase latency compared to direct in-process loops.
- Poor dependency modeling can create deadlocks or expensive retry storms under load.