The production problem
“LangGraph vs Temporal” looks like a fair comparison until an incident hits production. Then the missing layer appears: who decides whether the agent is allowed to run the action at all.
LangGraph gives expressive agent flow. Temporal gives reliability under failure. Neither is a policy decision point by default. If your agent modifies customer-facing systems, this gap becomes the incident.
One-line rule
If your approval model is “someone will notice in Slack,” you do not have an approval model.
What top sources cover vs miss
| Source | Strong coverage | Missing piece |
|---|---|---|
| LangGraph Durable Execution Docs | Clear requirements for checkpointers, thread IDs, determinism, and idempotent task boundaries during replay. | Does not define policy-gated execution for high-risk external actions across organizational boundaries. |
| LangGraph Persistence Docs | Detailed model for threads, checkpoints, super-steps, and production checkpointer backends. | No guidance for pre-dispatch approval workflows or immutable multi-system governance audit requirements. |
| Temporal Durable Execution Technical Guide | Strong explanation of completion guarantees, retries, signals/queries, and long-running workflow behavior. | No built-in AI-specific policy decision model (`ALLOW`, `DENY`, `REQUIRE_APPROVAL`) before tool or API side effects. |
This guide fills that gap with a layer model, explicit ownership boundaries, and code-level integration points.
Three layers, three jobs
| Layer | Owner | Responsibility | Common failure |
|---|---|---|---|
| Reasoning layer | Applied AI team | Prompting, tool selection, agent graph behavior | Agent loops or brittle branch logic |
| Execution layer | Platform team | Retries, resumability, timeouts, idempotent activity calls | Stuck workflows and duplicate side effects |
| Governance layer | Security + platform | Policy checks, approval gates, auditability, output safety | Unapproved prod actions and weak incident forensics |
Side-by-side comparison
| Dimension | LangGraph | Temporal | Cordum |
|---|---|---|---|
| Primary concern | Agent control flow and state transitions | Durable execution and failure recovery | Governance, policy, and approvals |
| Unit of orchestration | Graph nodes and edges | Workflow + activities | Jobs + policy checks + workflow steps |
| Long-running reliability | Depends on persistence setup | Core guarantee | Job state, scheduler reconciliation, DLQ |
| Pre-execution policy | Custom | Custom | Built-in Safety Kernel decisions |
| Human approval routing | Custom interrupt handling | Custom signal + workflow logic | First-class `REQUIRE_APPROVAL` and approval state |
| Audit trail | State checkpoints | Workflow event history | Policy snapshot + decision timeline + job history |
| Best fit | Rapid agent behavior development | Business-critical process durability | Regulated or high-impact agent actions |
Reference architecture
Reasoning layer (LangGraph): computes the diagnosis and proposes actions. No side effects yet.
Execution layer (Temporal): runs the durable workflow, retries transient failures, and resumes after outages.
Governance layer (Cordum): checks policy before execution, routes approvals, and records the audit timeline.
Keep policy decisions outside agent prompt logic. Prompt changes should not silently change risk posture.
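One way to make a change in risk posture traceable is to record, with every decision, a hash of the policy that produced it, and to store the prompt version as a separate field. The sketch below is a minimal illustration under stated assumptions: `DecisionRecord`, `AuditTimeline`, `snapshot_hash`, and the `prompt_version` field are hypothetical names, not Cordum's actual data model.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import List


def snapshot_hash(policy: dict) -> str:
    """Stable hash of the policy content in force at decision time."""
    # Sorting keys makes the hash independent of key order in the source file.
    canonical = json.dumps(policy, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class DecisionRecord:
    job_id: str
    action: str
    decision: str        # e.g. "ALLOW", "DENY", "REQUIRE_APPROVAL"
    policy_hash: str     # identifies the policy version that decided
    prompt_version: str  # kept for forensics only, never used to decide


@dataclass
class AuditTimeline:
    records: List[DecisionRecord] = field(default_factory=list)

    def append(self, record: DecisionRecord) -> None:
        # Append-only by design: no update or delete method is offered.
        self.records.append(record)


# Hypothetical policy content and incident values, for illustration only.
policy_v1 = {"version": "v1", "rules": [{"id": "approval-required-prod-remediation"}]}
timeline = AuditTimeline()
timeline.append(
    DecisionRecord(
        job_id="INC-441",
        action="restart_service:payments-api",
        decision="REQUIRE_APPROVAL",
        policy_hash=snapshot_hash(policy_v1),
        prompt_version="prompt-2",
    )
)
```

With records shaped like this, a reviewer can tell whether two different decisions for the same action came from a policy change (different `policy_hash`) or from something else.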
Working code patterns
Pattern: graph proposes, durable workflow orchestrates, policy gate decides, then side effects execute.
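The ordering in this pattern matters most on the approval branch: the side effect must not run until the gate says so. The following framework-free sketch uses plain `asyncio` to show a fail-closed version of that sequence. It is an illustration only; `run_gated`, the status strings, and the timeout handling are assumptions, and in a Temporal workflow the wait would typically be driven by a signal rather than an in-process coroutine.

```python
import asyncio
from typing import Awaitable, Callable, List


async def run_gated(
    proposal: str,
    check_policy: Callable[[str], str],
    wait_for_approval: Callable[[], Awaitable[bool]],
    execute: Callable[[str], None],
    approval_timeout_sec: float,
) -> str:
    """Run `execute` only after the policy gate (and any approval) permits it."""
    decision = check_policy(proposal)
    if decision == "DENY":
        return "blocked_by_policy"
    if decision == "REQUIRE_APPROVAL":
        try:
            approved = await asyncio.wait_for(wait_for_approval(), approval_timeout_sec)
        except asyncio.TimeoutError:
            return "approval_timed_out"
        if not approved:
            return "rejected_by_approver"
    elif decision != "ALLOW":
        # Unknown decisions fail closed instead of falling through to execution.
        return "blocked_unknown_decision"
    execute(proposal)
    return "executed"


executed: List[str] = []


async def approve_quickly() -> bool:
    await asyncio.sleep(0)
    return True


async def never_answers() -> bool:
    await asyncio.sleep(3600)
    return True


def needs_approval(_proposal: str) -> str:
    return "REQUIRE_APPROVAL"


status_ok = asyncio.run(
    run_gated("restart_service:payments-api", needs_approval, approve_quickly, executed.append, 1.0)
)
status_timeout = asyncio.run(
    run_gated("restart_service:payments-api", needs_approval, never_answers, executed.append, 0.01)
)
```

Here the first run executes the proposal once, and the second run returns without executing anything because no approval arrived before the timeout.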
# LangGraph (Python): the graph analyzes the incident and proposes an action.
# It performs no side effects.
import os
from typing import TypedDict

from langgraph.graph import StateGraph, START, END
from langgraph.checkpoint.postgres import PostgresSaver

# Assumption: the Postgres connection string comes from an environment variable.
DB_URI = os.environ["DB_URI"]

class AgentState(TypedDict):
    ticket_id: str
    summary: str
    proposed_action: str

def analyze(state: AgentState):
    # LLM call or retrieval logic
    return {"summary": "Root cause likely config drift"}

def propose(state: AgentState):
    return {"proposed_action": "restart_service:payments-api"}

builder = StateGraph(AgentState)
builder.add_node("analyze", analyze)
builder.add_node("propose", propose)
builder.add_edge(START, "analyze")
builder.add_edge("analyze", "propose")
builder.add_edge("propose", END)

with PostgresSaver.from_conn_string(DB_URI) as checkpointer:
    checkpointer.setup()  # creates the checkpoint tables on first use
    graph = builder.compile(checkpointer=checkpointer)
    # Invoke inside the `with` block so the database connection is still open.
    result = graph.invoke(
        {"ticket_id": "INC-441", "summary": "", "proposed_action": ""},
        config={"configurable": {"thread_id": "inc-441-thread"}},
    )

// Temporal (TypeScript): the workflow orchestrates the proposal, policy check, and remediation.
import { proxyActivities } from "@temporalio/workflow";
const { runLangGraphStep, cordumPolicyCheck, executeRemediation, verifyRemediation } =
  proxyActivities<{
    runLangGraphStep(input: unknown): Promise<unknown>;
    cordumPolicyCheck(input: unknown): Promise<{ decision: string }>;
    executeRemediation(input: unknown): Promise<void>;
    verifyRemediation(input: unknown): Promise<{ passed: boolean }>;
  }>({ startToCloseTimeout: "2 minutes" });

export async function IncidentWorkflow(input: { incidentId: string }) {
  const proposal = await runLangGraphStep(input);
  const policy = await cordumPolicyCheck(proposal);

  if (policy.decision === "DENY") {
    return { status: "blocked_by_policy" };
  }

  if (policy.decision === "REQUIRE_APPROVAL") {
    // Wait for the external approval signal, handled by workflow code.
    // Omitted here for brevity. The remediation below must not run until
    // approval arrives; as written, this branch falls through.
  }

  await executeRemediation(proposal);
  const verification = await verifyRemediation({ incidentId: input.incidentId });
  if (!verification.passed) {
    throw new Error("verification_failed");
  }
  return { status: "resolved" };
}

# Governance policy (YAML): rules evaluated before any side effect runs.
version: v1
rules:
  - id: deny-destructive-shell-prod
    match:
      topic: "job.exec.shell"
      labels:
        env: prod
        command_class: destructive
    decision: DENY
  - id: approval-required-prod-remediation
    match:
      topic: "job.incident.remediate"
      labels:
        env: prod
        risk_tier: high
    decision: REQUIRE_APPROVAL
  - id: constrained-medium-remediation
    match:
      topic: "job.incident.remediate"
      labels:
        risk_tier: medium
    decision: ALLOW_WITH_CONSTRAINTS
    constraints:
      max_runtime_sec: 180
      allowed_namespaces: ["prod-edge", "staging"]

For a deeper production checklist, pair this with the AI agent deployment checklist.
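To make the policy rules concrete, here is a small Python sketch of how rules shaped like these could be evaluated against a job. The matching semantics are assumptions made for illustration, not Cordum's documented behavior: rules are checked in order, the first match wins, every label in a rule must equal the job's label, and a job that matches nothing is denied.

```python
# The rules mirror the policy shown above, written as plain Python data.
RULES = [
    {
        "id": "deny-destructive-shell-prod",
        "match": {
            "topic": "job.exec.shell",
            "labels": {"env": "prod", "command_class": "destructive"},
        },
        "decision": "DENY",
    },
    {
        "id": "approval-required-prod-remediation",
        "match": {
            "topic": "job.incident.remediate",
            "labels": {"env": "prod", "risk_tier": "high"},
        },
        "decision": "REQUIRE_APPROVAL",
    },
    {
        "id": "constrained-medium-remediation",
        "match": {
            "topic": "job.incident.remediate",
            "labels": {"risk_tier": "medium"},
        },
        "decision": "ALLOW_WITH_CONSTRAINTS",
        "constraints": {"max_runtime_sec": 180, "allowed_namespaces": ["prod-edge", "staging"]},
    },
]

DEFAULT_DECISION = "DENY"  # assumption: fail closed when no rule matches


def rule_matches(rule: dict, topic: str, labels: dict) -> bool:
    """True when the topic is equal and every label required by the rule is present."""
    match = rule["match"]
    if match["topic"] != topic:
        return False
    return all(labels.get(key) == value for key, value in match.get("labels", {}).items())


def evaluate(topic: str, labels: dict) -> dict:
    """Return the decision of the first matching rule, or the default."""
    for rule in RULES:
        if rule_matches(rule, topic, labels):
            return {
                "rule_id": rule["id"],
                "decision": rule["decision"],
                "constraints": rule.get("constraints", {}),
            }
    return {"rule_id": None, "decision": DEFAULT_DECISION, "constraints": {}}


high_risk = evaluate("job.incident.remediate", {"env": "prod", "risk_tier": "high"})
medium_risk = evaluate("job.incident.remediate", {"env": "staging", "risk_tier": "medium"})
unmatched = evaluate("job.exec.shell", {"env": "dev"})
```

Under these assumptions the high-risk production job requires approval, the medium-risk job is allowed with the listed constraints, and the unmatched job falls back to the default.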
Limitations and tradeoffs
More moving parts
Three layers add operational complexity. You need clear ownership, not shared ambiguity.
Replay discipline required
Durable systems require determinism boundaries. Side effects must be isolated to avoid duplicate impact.
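A common way to isolate side effects from retries and replay is an idempotency key: the same logical action always carries the same key, and the executor refuses to run it twice. The sketch below is a minimal illustration; `IdempotentExecutor` and the key format are hypothetical, and the in-memory dict stands in for a durable store.

```python
from typing import Callable, Dict, List


class IdempotentExecutor:
    """Runs a side effect at most once per idempotency key."""

    def __init__(self) -> None:
        # Assumption: a real system would use a durable store, not a dict.
        self._results: Dict[str, str] = {}

    def run(self, key: str, side_effect: Callable[[], str]) -> str:
        if key in self._results:
            # A retry or replay: return the recorded result, do not re-execute.
            return self._results[key]
        result = side_effect()
        self._results[key] = result
        return result


calls: List[str] = []


def restart_payments_api() -> str:
    calls.append("restart")
    return "restarted"


executor = IdempotentExecutor()
# Hypothetical key format: incident ID plus the proposed action.
key = "INC-441:restart_service:payments-api"
first = executor.run(key, restart_payments_api)
second = executor.run(key, restart_payments_api)  # simulated retry
```

This sketch does not cover a crash between running the side effect and recording its result. Passing the key to the downstream API, where that API supports it, is one way to close that window.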
Approval latency
Strong governance can slow urgent changes if risk tiers are too broad. Review approval queue metrics, such as time to approve per risk tier, every week.
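A weekly review is easier with a per-tier summary of how long approvals waited. The sketch below computes nearest-rank percentiles from (risk tier, wait in seconds) samples. The sample values and the function names are hypothetical, chosen only to show the shape of the calculation.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple


def percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile; `values` must be non-empty."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def wait_by_tier(samples: List[Tuple[str, float]]) -> Dict[str, Dict[str, float]]:
    """Group approval wait times by risk tier and summarize each group."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for tier, wait_sec in samples:
        grouped[tier].append(wait_sec)
    return {
        tier: {
            "count": len(waits),
            "p50": percentile(waits, 50),
            "p95": percentile(waits, 95),
        }
        for tier, waits in grouped.items()
    }


# Hypothetical approval wait times in seconds, for illustration only.
samples = [
    ("high", 120.0),
    ("high", 900.0),
    ("high", 300.0),
    ("medium", 30.0),
    ("medium", 45.0),
]
stats = wait_by_tier(samples)
```

A tier whose p95 wait keeps growing is a candidate for splitting into narrower tiers, so that only the riskiest actions queue for a human.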