Workflow orchestration for AI

DAG-first orchestration with retries, approvals, and audit trails that keep AI safe in production.

Jan 24, 2026 · 8 min read · Workflow orchestration
TL;DR

AI workflows need explicit DAGs, deterministic state, and governance-aware orchestration. Approvals and policy checks must be steps, not side systems.

  • AI workflows need explicit DAGs and deterministic state transitions.
  • Approvals and policy checks must be first-class orchestration steps.
  • Retries and DLQ handling keep workflows resilient under pressure.
Orchestration goal

The orchestrator should be boring and deterministic. AI belongs in bounded workers, while the workflow engine handles retries, approvals, and audit trails.

Why AI workflows need orchestration

AI workflows combine external tools, human approvals, and multi-step reasoning. Without orchestration, failures go unnoticed and retries end up as ad-hoc scripts.

Deterministic orchestration

The core must be deterministic even when the AI is not. Orchestration is about explicit state transitions, governed decisions, and predictable retries. AI belongs in bounded workers, not inside the scheduler.
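As a rough illustration of explicit state transitions, the sketch below lists every legal move in a table and rejects anything else. The state names and the transition table are assumptions made for this example, not Cordum's actual run states.

```python
from enum import Enum


class RunState(Enum):
    PENDING = "pending"
    RUNNING = "running"
    WAITING_APPROVAL = "waiting_approval"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


# Hypothetical transition table: every legal move is listed explicitly.
ALLOWED = {
    RunState.PENDING: {RunState.RUNNING, RunState.WAITING_APPROVAL},
    RunState.WAITING_APPROVAL: {RunState.RUNNING, RunState.FAILED},
    RunState.RUNNING: {RunState.SUCCEEDED, RunState.FAILED},
    RunState.SUCCEEDED: set(),
    RunState.FAILED: {RunState.PENDING},  # a retry re-queues the step
}


def transition(current: RunState, target: RunState) -> RunState:
    """Return target if the move is legal, otherwise raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```

Because the table is plain data, the same input always yields the same decision, no matter what the AI worker produced inside the step.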

DAG design for AI work

Make every step explicit and track its state in a run timeline.

name: incident-triage
input_schema: IncidentContext
steps:
  triage:
    type: worker
    topic: job.incident.enrich
  summarize:
    type: worker
    topic: job.incident.summarize
    depends_on: [triage]
  approval:
    type: approval
    depends_on: [summarize]
    reason: "Prod write detected"
  remediate:
    type: worker
    topic: job.incident.remediate
    depends_on: [approval]
    constraints:
      max_lines_changed: 500
      max_runtime_sec: 900
  closeout:
    type: notify
    depends_on: [remediate]
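
One way to see what the depends_on edges buy you is to derive the execution order from them. The sketch below copies the edges from the definition above into a dictionary and sorts them with Python's standard graphlib module. The DEPENDS_ON name and the execution_order helper are illustrative and not part of any Cordum API.

```python
from graphlib import TopologicalSorter

# depends_on edges copied from the incident-triage definition above.
DEPENDS_ON = {
    "triage": [],
    "summarize": ["triage"],
    "approval": ["summarize"],
    "remediate": ["approval"],
    "closeout": ["remediate"],
}


def execution_order(depends_on: dict[str, list[str]]) -> list[str]:
    """Return steps in dependency order; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(depends_on).static_order())


print(execution_order(DEPENDS_ON))
# ['triage', 'summarize', 'approval', 'remediate', 'closeout']
```

A definition with a dependency cycle fails at sort time, so the mistake surfaces before any step is dispatched.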

Governance hooks

Governance is not a plugin. The orchestrator should call a policy decision point before dispatch and pause runs that require approval.
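
A minimal sketch of that dispatch path follows, assuming a policy function that returns allow, deny, or require-approval. The Job fields, the evaluate_policy rules, and the status strings are all hypothetical; a real policy decision point would usually be a separate service with its own rule format.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    REQUIRE_APPROVAL = "require_approval"


@dataclass(frozen=True)
class Job:
    topic: str
    writes_to_prod: bool = False
    approved: bool = False


def evaluate_policy(job: Job) -> Decision:
    """Hypothetical policy: prod writes need approval; unknown topics are denied."""
    if not job.topic.startswith("job."):
        return Decision.DENY
    if job.writes_to_prod and not job.approved:
        return Decision.REQUIRE_APPROVAL
    return Decision.ALLOW


def dispatch(job: Job, publish) -> str:
    """Consult the policy before publishing; return the resulting run status."""
    decision = evaluate_policy(job)
    if decision is Decision.DENY:
        return "rejected"
    if decision is Decision.REQUIRE_APPROVAL:
        return "paused_for_approval"  # nothing is published until a human approves
    publish(job.topic, job)
    return "dispatched"
```

The point of the shape is that publish is unreachable unless the policy check has already run, so no worker can receive a job that skipped governance.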

Failure handling and retries

Use retries for transient failures and a dead-letter queue (DLQ) for poison messages, meaning messages that fail every time they are processed. The goal is predictable behavior under pressure.
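
The split between transient failures and poison messages can be sketched as below. TransientError, run_with_retries, and the in-memory dead_letters list are stand-ins invented for this example; a production system would park failed messages on a durable queue instead of a Python list.

```python
import time


class TransientError(Exception):
    """A failure worth retrying, such as a timeout (hypothetical marker type)."""


def run_with_retries(handler, message, dead_letters,
                     max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call handler(message); retry transient failures, dead-letter everything else.

    Returns the handler result, or None if the message went to dead_letters.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return handler(message)
        except TransientError as exc:
            if attempt == max_attempts:
                dead_letters.append((message, f"retries exhausted: {exc}"))
                return None
            sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff
        except Exception as exc:
            # Non-transient failure: retrying a poison message would never succeed.
            dead_letters.append((message, f"poison: {exc}"))
            return None
```

Bounding the attempt count and recording the reason for each dead-lettered message keeps the failure path as inspectable as the success path.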

Why not Temporal or Airflow?

Temporal and Airflow are strong orchestrators, but they do not ship governance primitives by default. Cordum adds a safety gate, approvals, and audit trail as first-class steps.

  • Temporal: great at retries and state, lacks built-in policy decision points.
  • Airflow: great for data pipelines, not designed for high-risk AI actions.
  • Cordum: governance-first orchestration with approvals and constraints.

How Cordum orchestrates

Cordum uses a workflow engine to coordinate runs, a Safety Kernel to evaluate every job, and an append-only audit trail to record outcomes. NATS and Redis provide durable state and routing.

See the Workflow Engine overview for more details.

Next steps

Request source access to review the workflow engine design, or run the quickstart to orchestrate your first run.