
LangChain vs LlamaIndex vs Semantic Kernel (2026)

Feature tables are everywhere. Production failure modes are not. Compare these three frameworks by what breaks when agents touch real systems.

Comparison · 18 min read · Apr 2026
TL;DR
  • LangChain leads on ecosystem breadth and model-provider portability. LlamaIndex leads on RAG-native developer experience. Semantic Kernel leads on enterprise SDK consistency across C#, Python, and Java.
  • All three frameworks lack native pre-dispatch policy enforcement and mandatory approval workflows. Production side effects run ungoverned unless you add an external layer.
  • The highest-cost production failures are retry storms, state loss, and approval bypass. Not prompt syntax differences.
LangChain

Integration-first: broadest model/tool ecosystem with graph-based agent composition

LlamaIndex

Data-first: best-in-class indexing, retrieval, and RAG-native agent workflows

Semantic Kernel

Enterprise-SDK-first: multi-language plugin architecture with Azure alignment

Scope

This comparison focuses on production behavior: failure modes, state durability, governance gaps, and decision criteria for choosing between LangChain, LlamaIndex, and Semantic Kernel. It does not cover beginner tutorials or quickstart walkthroughs.

The real selection problem

Most comparison posts present feature tables. They list API surface, supported models, and quickstart complexity. That is useful for orientation but dangerous as a decision artifact. The real selection problem is not which framework has more features. It is which framework fails in a way your team can handle.

LangChain, LlamaIndex, and Semantic Kernel are not interchangeable. They solve different primary problems because they start from different design philosophies.

LangChain is integration-first. It gives you the broadest set of model providers, tool connectors, and community packages. You can swap providers and assemble agent logic quickly. The cost appears later when abstraction layers interact in unexpected ways under retry pressure.

LlamaIndex is data-first. Its center of gravity is indexing, retrieval, and query pipelines. Agent capabilities have matured, but the framework rewards teams whose core product value comes from getting the right context to the model at the right time.

Semantic Kernel is enterprise-SDK-first. It models everything as a kernel with services and plugins. Multi-language support across C#, Python, and Java gives enterprise teams SDK consistency that other frameworks do not offer. The tradeoff is a smaller community and a Python SDK that lags behind C#.

The design philosophy determines what breaks in production. Integration-first frameworks struggle when abstraction complexity compounds. Data-first frameworks carry overhead when retrieval is not central. Enterprise-SDK frameworks trade community velocity for structural discipline. Knowing where each one bends under pressure matters more than counting features.

What top sources miss

We reviewed three top-ranking comparison articles before writing. They cover API surface well but consistently miss production failure modes and governance gaps.

| Source | Strong coverage | Missing piece |
| --- | --- | --- |
| Turing: LangChain vs LlamaIndex | Clear API surface comparison with use-case mapping for retrieval versus orchestration workflows. | No production failure mode analysis. No mention of governance gaps or approval bypass risks in either framework. |
| Medium: LangChain vs LlamaIndex vs Semantic Kernel | Side-by-side feature tables and ecosystem size context. Useful for initial orientation. | Feature tables without production validation. No retry, state loss, or crash recovery testing across the three frameworks. |
| InfoWorld: Semantic Kernel vs LangChain | Good enterprise positioning context for Semantic Kernel and plugin architecture advantages. | Missing LlamaIndex entirely. No governance integration pattern or policy enforcement comparison. |

The pattern is consistent: good orientation, weak on what happens when agents execute side effects in production without policy gates. This article fills that gap with failure mode analysis, current metrics, and governance integration patterns.

2026 framework snapshot

Table data uses current GitHub API and PyPIStats snapshots from April 2026. Community size is not a direct proxy for runtime quality, but it helps estimate ecosystem velocity and troubleshooting surface area.

| Framework | GitHub Stars | PyPI Downloads | Languages | Primary Strength |
| --- | --- | --- | --- | --- |
| LangChain | 131.7k | 223.8M/month | Python, TypeScript | Broadest model/tool integration and agent assembly |
| LlamaIndex | 48.2k | 10.09M/month | Python | RAG-native workflows and data-centric agent pipelines |
| Semantic Kernel | 27.6k | 2.74M/month | C#, Python, Java | Enterprise SDK model with pluggable services and orchestration |

LangChain dominates in raw adoption. Its PyPI download count is roughly 22x LlamaIndex's and 82x Semantic Kernel's. That gap reflects ecosystem breadth and early-mover advantage, not a quality ranking.

Feature comparison matrix

This matrix focuses on framework-native capabilities. A partial mark means the feature exists but needs custom engineering for strong reliability, operator control, or enterprise policy requirements.

Legend: Full · Partial · Not native

| Feature | LangChain | LlamaIndex | Semantic Kernel |
| --- | --- | --- | --- |
| Multi-agent orchestration | Full | Full | Full |
| RAG-native DX | Partial | Full | Partial |
| Durable workflows | Full | Partial | Partial |
| MCP support | Full | Full | Full |
| Multi-language SDK | Full | Not native | Full |
| Plugin architecture | Partial | Partial | Full |
| Built-in approval workflow | Partial | Not native | Not native |
| Built-in policy enforcement | Not native | Not native | Not native |
| Audit trail | Partial | Partial | Partial |
| Enterprise SDK consistency | Partial | Partial | Full |
| Streaming | Full | Full | Full |
| Memory management | Full | Partial | Partial |

The row that matters most is "Built-in policy enforcement." All three frameworks show "Not native." None of them evaluates policy before a tool call executes. If your agent can write to production databases, trigger deployments, or send emails, the framework will not stop it. You need an external governance layer.

LangChain: integration-first agent composition

Architecture

LangChain uses graph-based agent composition through LangGraph. You define nodes (agent steps), edges (transitions), and a state object that flows through the graph. The checkpointer layer adds persistence so you can resume interrupted runs. This model gives you fine-grained control over execution flow, but the graph can become hard to reason about as complexity grows.
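The node/edge/state model can be illustrated without LangGraph itself. The sketch below is plain Python showing the same idea — nodes that transform a shared state, edges that define order, and a checkpoint recorded after each node; the names and structure are illustrative, not the LangGraph API (which uses StateGraph, add_node, add_edge, and a checkpointer).

```python
# Illustrative sketch of graph-based agent composition: nodes transform a
# shared state dict, edges define the execution order, and a checkpoint is
# recorded after each node so an interrupted run could resume.

def check_status(state: dict) -> dict:
    # Node: read input from state, write results back into state.
    return {**state, "status": f"{state['ticket_id']} is in progress"}

def decide_escalation(state: dict) -> dict:
    # Node: conditional logic based on accumulated state.
    return {**state, "escalate": "blocked" in state["status"]}

NODES = {"check_status": check_status, "decide_escalation": decide_escalation}
EDGES = {"check_status": "decide_escalation", "decide_escalation": None}

def run_graph(entry: str, state: dict, checkpoints: list) -> dict:
    node = entry
    while node is not None:
        state = NODES[node](state)
        checkpoints.append((node, dict(state)))  # persistence point
        node = EDGES[node]
    return state

checkpoints: list = []
final = run_graph("check_status", {"ticket_id": "INC-7"}, checkpoints)
print(final["escalate"])  # → False: status is "in progress", not "blocked"
```

The checkpoint-after-every-node discipline is what LangGraph's checkpointer layer provides for real; without it, a crash mid-graph loses everything accumulated so far.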

Strengths

  • Largest ecosystem of model providers and tool integrations. Over 700 integrations available.
  • TypeScript and Python SDKs. Both are actively maintained with feature parity goals.
  • LangGraph provides durable execution with checkpointing, HITL interrupts, and time-travel debugging.
  • Fast path from idea to working agent. Swapping models or tools usually requires minimal code changes.

Production pain points

  • Abstraction complexity compounds. Chains wrapping chains wrapping tools can make debugging opaque.
  • Version churn has been a persistent friction point. Import paths and core APIs have changed multiple times since 2023.
  • No native policy gates. HITL interrupts are optional and require explicit graph design. Nothing prevents a tool from firing without approval.
  • Memory management works well for chat history but lacks framework-level token budget enforcement.
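
Since the framework does not enforce a token budget, the guard has to live in application code. A minimal sketch, assuming a crude 4-characters-per-token estimate — swap in a real tokenizer for accuracy:

```python
# Application-level token budget guard. The 4-chars-per-token estimate is a
# rough heuristic; use a real tokenizer in production for accurate counts.

class TokenBudget:
    def __init__(self, max_tokens: int):
        self.max_tokens = max_tokens
        self.used = 0

    def estimate(self, text: str) -> int:
        return max(1, len(text) // 4)  # rough heuristic, not exact

    def charge(self, text: str) -> None:
        # Refuse the call before spending, instead of discovering overrun later.
        cost = self.estimate(text)
        if self.used + cost > self.max_tokens:
            raise RuntimeError(
                f"token budget exceeded: {self.used + cost} > {self.max_tokens}"
            )
        self.used += cost

budget = TokenBudget(max_tokens=50)
budget.charge("Status for INC-1042")  # small prompt: within budget
print(budget.used)                    # → 4 with this heuristic
```

Call `charge` on every prompt and tool result before it enters the context window; the hard failure is the point, since silent accumulation is what produces overrun bills.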

Code example

langchain_agent.py
Python
# pip install -U langchain "langchain[anthropic]" langgraph-checkpoint-postgres
from langchain.agents import create_agent
from langgraph.checkpoint.postgres import PostgresSaver


def get_ticket_status(ticket_id: str) -> str:
    """Look up the current status of a ticket."""
    return f"ticket {ticket_id} is in progress"


def escalate_ticket(ticket_id: str, reason: str) -> str:
    """Escalate a ticket, recording the reason."""
    return f"ticket {ticket_id} escalated: {reason}"


DB_URI = "postgresql://postgres:postgres@localhost:5442/postgres?sslmode=disable"

with PostgresSaver.from_conn_string(DB_URI) as checkpointer:
    checkpointer.setup()
    agent = create_agent(
        model="claude-sonnet-4-6",
        tools=[get_ticket_status, escalate_ticket],
        system_prompt="You are an ops assistant. Check status before escalating.",
        checkpointer=checkpointer,
    )

    result = agent.invoke(
        {"messages": [{"role": "user", "content": "Status for INC-1042"}]},
        {"configurable": {"thread_id": "inc-1042"}},
    )
    print(result)

LlamaIndex: data-first agent pipelines

Architecture

LlamaIndex centers on query pipelines and AgentWorkflow with FunctionAgents. The query pipeline handles data ingestion, indexing, and retrieval. AgentWorkflow adds multi-agent coordination with typed handoffs and shared state. This design makes retrieval-heavy workflows natural but can add overhead when retrieval is not central to your use case.

Strengths

  • Best indexing and retrieval primitives of any agent framework. Data connectors, vector store integrations, and query engines are first-class.
  • Clean workflow abstraction. AgentWorkflow supports agent handoffs, shared state, and progressive control from simple to custom planning.
  • Three documented multi-agent patterns: AgentWorkflow, orchestrator-as-tool, and custom planner. Teams can grow complexity without framework changes.
  • Strong ecosystem for document connectors. Hundreds of data loaders available for PDFs, databases, APIs, and SaaS tools.

Production pain points

  • Higher token usage in benchmarks. Published AgentRace data showed LlamaIndex using 101,772 tokens versus LangChain at 7,753 for comparable tasks.
  • Smaller tool ecosystem compared to LangChain. If your workload is not retrieval-heavy, you may carry complexity for little benefit.
  • Workflow state is in-memory by default. Crash recovery requires external persistence that you need to implement yourself.
  • No native approval gate or policy enforcement. Tool execution proceeds without human checkpoint unless you build custom middleware.
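
Because workflow state is in-memory by default, crash recovery means owning your own snapshots. A minimal sketch with hypothetical helper names, persisting state as JSON after each completed step:

```python
# Minimal external persistence for in-memory workflow state: snapshot to
# disk after each completed step, reload on restart. File name and helper
# names are illustrative, not a LlamaIndex API.
import json
from pathlib import Path

STATE_FILE = Path("workflow_state.json")

def save_state(state: dict) -> None:
    # Write atomically: temp file then rename, so a crash mid-write
    # never leaves a truncated snapshot behind.
    tmp = STATE_FILE.with_suffix(".tmp")
    tmp.write_text(json.dumps(state))
    tmp.replace(STATE_FILE)

def load_state(default: dict) -> dict:
    if STATE_FILE.exists():
        return json.loads(STATE_FILE.read_text())
    return default

state = load_state({"notes": ""})           # resume if a snapshot exists
state["notes"] = "Q1 incident trends gathered"
save_state(state)                           # call after every completed step
print(load_state({"notes": ""})["notes"])
```

The same pattern scales up by swapping the file for a database row keyed by workflow ID; the discipline that matters is snapshotting after each step, not the storage backend.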

Code example

llamaindex_workflow.py
Python
# pip install -U llama-index
from llama_index.core.agent.workflow import AgentWorkflow, FunctionAgent


def search_docs(topic: str) -> str:
    """Search internal documentation for a topic."""
    return f"Found 3 relevant documents about {topic}"


def write_summary(notes: str) -> str:
    """Produce a concise summary from research notes."""
    return f"Summary: {notes}"


research_agent = FunctionAgent(
    name="ResearchAgent",
    description="Collect context from documentation",
    system_prompt="Gather technical facts and handoff to WriteAgent",
    tools=[search_docs],
)

write_agent = FunctionAgent(
    name="WriteAgent",
    description="Create concise summary",
    system_prompt="Write final summary from research notes",
    tools=[write_summary],
)

workflow = AgentWorkflow(
    agents=[research_agent, write_agent],
    root_agent="ResearchAgent",
    initial_state={"notes": ""},
)

# in async context:
# response = await workflow.run(user_msg="Summarize the Q1 incident trends")

Semantic Kernel: enterprise SDK discipline

Architecture

Semantic Kernel models everything through a Kernel object with pluggable services and plugins. ChatCompletionAgent handles inference. Plugins encapsulate tool logic with typed function decorators. Group chat orchestration supports multi-agent patterns. This explicit structure makes large codebases more maintainable but requires more setup than minimal Python-only frameworks.

Strengths

  • Multi-language SDK support across C#, Python, and Java. Teams with .NET-heavy estates can use their existing engineering standards.
  • Plugin architecture provides clean separation between agent logic and tool implementation. Plugins are typed, testable, and reusable.
  • Strong Azure integration. Teams already on Azure OpenAI, Azure AI Search, or Microsoft 365 get first-class connectors.
  • More stable release cycle than LangChain. Fewer breaking changes between versions means lower migration cost.

Production pain points

  • Smaller community. 27.6k GitHub stars versus 131.7k for LangChain. Fewer community packages, tutorials, and Stack Overflow answers.
  • Python SDK is less mature than C#. Some features ship to C# first. Teams choosing Semantic Kernel for Python should verify feature availability.
  • No pre-dispatch policy gates. Plugin calls execute immediately. Approval workflows need external implementation.
  • Heavier initial setup than lightweight frameworks. The kernel-services-plugins structure adds boilerplate that slows early prototyping.

Code example

semantic_kernel_agent.py
Python
# pip install -U semantic-kernel
from semantic_kernel import Kernel
from semantic_kernel.agents import ChatCompletionAgent
from semantic_kernel.connectors.ai.open_ai import AzureChatCompletion
from semantic_kernel.functions import kernel_function


class TicketPlugin:
    @kernel_function(description="Get current ticket status")
    def get_status(self, ticket_id: str) -> str:
        return f"Ticket {ticket_id}: in progress, assigned to on-call"

    @kernel_function(description="Escalate ticket with reason")
    def escalate(self, ticket_id: str, reason: str) -> str:
        return f"Ticket {ticket_id} escalated: {reason}"


kernel = Kernel()
kernel.add_service(AzureChatCompletion(service_id="ops-service"))
kernel.add_plugin(TicketPlugin(), plugin_name="tickets")

agent = ChatCompletionAgent(
    kernel=kernel,
    name="OpsAssistant",
    instructions="Check ticket status before escalating. Explain decisions clearly.",
)

# in async context:
# response = await agent.get_response(messages="What is the status of INC-2048?")
# print(response)

What actually breaks in production

This is the section most comparison articles skip. Feature tables tell you what a framework can do. Failure mode tables tell you what happens when things go wrong. In production, things go wrong every day.

We tested six failure scenarios across all three frameworks. The results show that governance gaps are shared across all three. The framework choice affects how failures manifest, not whether they occur.

| Failure Mode | LangChain | LlamaIndex | Semantic Kernel |
| --- | --- | --- | --- |
| Retry storm under tool failure | Tools may retry without backoff unless a custom retry policy is wired into the graph. | AgentWorkflow retries can compound token spend. Custom error handlers required. | Plugin invocation retries depend on the HTTP client layer. No framework-level backoff. |
| State loss after crash | LangGraph checkpointer persists state if configured. Without it, full state loss on restart. | Workflow state is in-memory by default. Crash loses all intermediate results. | ChatHistory and AgentThread are in-process. External persistence is opt-in. |
| Token budget overrun | No built-in token budget cap. Long chains accumulate context without hard limits. | Benchmark data shows higher token usage in multi-step retrieval scenarios. | No native token budget enforcement. Kernel does not cap per-invocation spend. |
| Approval bypass (no gate) | HITL interrupts exist in LangGraph but are optional and not enforced by default. | No native approval gate. Tool execution proceeds without human checkpoint. | No pre-dispatch approval mechanism. Plugin calls execute immediately. |
| Audit trail gap | LangSmith provides tracing. Self-hosted deployments need custom logging for a full audit. | Callback handlers capture events. Structured audit export requires custom work. | Telemetry hooks exist. Complete audit narrative needs external assembly. |
| Migration version churn | Rapid release cadence. Breaking changes between minor versions have been common. | Core refactors (v0.10+) changed import paths. Migration guides available but manual. | More stable release cycle. Python SDK lags behind C# in feature parity. |
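
The retry-storm row is the cheapest to fix in application code: wrap every flaky tool call with capped exponential backoff, jitter, and a hard attempt limit, regardless of framework. A framework-agnostic sketch:

```python
# Capped exponential backoff with jitter and a hard attempt limit. This
# prevents retry storms: delays grow geometrically, are capped, and the
# failure surfaces after max_attempts instead of looping forever.
import random
import time

def call_with_backoff(tool, *args, max_attempts: int = 4,
                      base_delay: float = 0.5, max_delay: float = 8.0):
    """Retry a flaky tool call with capped exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return tool(*args)
        except Exception:
            if attempt == max_attempts:
                raise  # budget exhausted: surface the failure, do not loop
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            time.sleep(delay * random.uniform(0.5, 1.0))  # jitter

# Simulated flaky tool: fails twice, then succeeds.
calls = []
def flaky(ticket_id: str) -> str:
    calls.append(ticket_id)
    if len(calls) < 3:
        raise TimeoutError("upstream timeout")
    return f"{ticket_id}: ok"

result = call_with_backoff(flaky, "INC-9", base_delay=0.01)
print(result)  # → INC-9: ok after two retried failures
```

The jitter matters at fleet scale: without it, many agents retrying the same failed dependency synchronize their retries and amplify the outage.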

The common thread is clear. All three frameworks assume that side effects are safe to execute once the model decides to execute them. None evaluates a policy before dispatch. None requires approval for high-risk operations by default. None provides a structured audit trail without custom integration work.
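
A structured audit trail does not require much custom work to start: emit one append-only JSON line per governed tool call. A minimal sketch — the field names are illustrative, not a framework or vendor schema:

```python
# One append-only JSON line per tool call: who acted, what they attempted,
# and the policy verdict. Field names are illustrative.
import io
import json
import time

def audit_record(agent_id: str, action: str, params: dict,
                 decision: str) -> str:
    """Serialize a single audit event as one JSON line."""
    return json.dumps({
        "ts": time.time(),
        "agent_id": agent_id,
        "action": action,
        "params": params,
        "decision": decision,   # allow / deny / require_approval
    })

log = io.StringIO()  # stands in for an append-only file or log sink
log.write(audit_record("ops-agent", "escalate_ticket",
                       {"ticket_id": "INC-9"}, "require_approval") + "\n")

entry = json.loads(log.getvalue())
print(entry["decision"])  # → require_approval
```

JSON lines keep the trail greppable and trivially ingestible by any log pipeline; the hard part is discipline, not tooling — every side-effecting call must pass through the recorder.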

This is not a framework defect. These tools are designed for agent behavior, not governance. But the gap means that every production deployment on any of these three frameworks needs an external governance layer if agents can affect real systems.

When to use which

If your team needs a concrete recommendation this week, use this decision flow. It is intentionally blunt. The goal is to reduce architecture indecision and force explicit tradeoffs.

RAG is your core product?

Choose LlamaIndex. Its indexing, retrieval, and query pipeline primitives are purpose-built for this workload. You will spend less time fighting the framework.

Need the broadest integration ecosystem?

Choose LangChain. Over 700 integrations, active TypeScript and Python SDKs, and the largest community for troubleshooting.

.NET/C#/Java enterprise environment with Azure alignment?

Choose Semantic Kernel. Multi-language SDK consistency, plugin architecture, and first-class Azure connectors fit enterprise engineering standards.

Any of the above plus production side effects?

Add a governance layer. None of these three frameworks will prevent a dangerous tool call from executing. Policy evaluation, approval workflows, and audit trails need to come from outside the framework.

For broader framework coverage including CrewAI and AutoGen, see the 6-framework comparison and CrewAI vs AutoGen deep dive.

Why all three need a governance layer

The feature matrix makes it clear: none of these frameworks includes pre-dispatch policy enforcement. LangChain offers HITL interrupts through LangGraph, but they are optional graph nodes, not mandatory gates. LlamaIndex and Semantic Kernel provide callback hooks and telemetry, but neither evaluates whether an action should be allowed before it executes.

In production, this gap creates a specific failure pattern. The model decides to call a tool. The tool executes immediately. The side effect (a database write, an API call, a deployment trigger) happens before any policy evaluation occurs. If the action was risky, you find out after the damage.

A governance layer sits between the framework and tool execution. It intercepts the proposed action, evaluates it against a policy bundle, and returns one of four decisions: allow, deny, require approval, or allow with constraints. This pattern is framework-agnostic. The same governance endpoint works with LangChain, LlamaIndex, Semantic Kernel, or any other framework that executes tool calls.

governed_tool_call.py
Python
# Framework-agnostic pre-dispatch governance pattern
# Works with LangChain, LlamaIndex, or Semantic Kernel

import httpx


def governed_tool_call(action: str, params: dict, context: dict) -> dict:
    """Check policy before executing any side-effecting tool call."""

    decision = httpx.post(
        "http://control-plane:8080/api/v1/evaluate",
        json={
            "action": action,
            "params": params,
            "agent_id": context["agent_id"],
            "labels": context.get("labels", []),
        },
        timeout=10.0,  # do not hang the agent if the control plane is unreachable
    ).json()

    if decision["result"] == "deny":
        return {"status": "blocked", "reason": decision["reason"]}

    if decision["result"] == "require_approval":
        return {"status": "pending_approval", "approval_id": decision["approval_id"]}

    # decision["result"] == "allow"
    return execute_tool(action, params)


def execute_tool(action: str, params: dict) -> dict:
    # Actual tool execution happens here
    return {"status": "executed", "action": action}

Cordum implements this pattern as an Agent Control Plane with a Safety Kernel that evaluates policy before dispatch, supports explicit approval-required states, and records an audit timeline for every governed action. It is additive to any framework, not a replacement for agent logic.

FAQ

What is the best framework: LangChain, LlamaIndex, or Semantic Kernel?

There is no universal winner. LangChain leads on ecosystem breadth and model-provider portability. LlamaIndex leads on RAG-native developer experience and data pipelines. Semantic Kernel leads on enterprise SDK consistency across C#, Python, and Java. Pick based on your dominant workload.

Is LlamaIndex better than LangChain for RAG applications?

For retrieval-heavy systems where indexing and query pipeline quality drive product value, LlamaIndex is usually the better starting point. LangChain covers RAG well but its center of gravity is broader tool orchestration, not retrieval-first design.

Should I use Semantic Kernel or LangChain?

Use Semantic Kernel when you need multi-language SDK support (C#, Python, Java), a structured plugin architecture, and Azure alignment. Use LangChain when you need the broadest integration ecosystem, TypeScript support, and fast model-provider switching.

Can I use LlamaIndex with LangChain together?

Yes. Some teams use LlamaIndex for indexing and retrieval pipelines and LangChain for broader agent orchestration. The frameworks are not mutually exclusive. The integration cost is connecting the retrieval output from LlamaIndex into a LangChain tool or chain.

Does Semantic Kernel support Python or is it C# only?

Semantic Kernel supports C#, Python, and Java. The C# SDK is the most mature. The Python SDK covers core features including agents, plugins, and chat completion, but some advanced features ship to C# first.

Which framework has the best multi-agent support?

All three support multi-agent patterns. LlamaIndex offers AgentWorkflow with FunctionAgent handoffs. LangChain uses LangGraph for graph-based agent composition. Semantic Kernel provides ChatCompletionAgent with group chat orchestration. The best fit depends on your coordination model.

Do these frameworks include built-in policy enforcement?

None of the three frameworks includes native pre-dispatch policy enforcement. You can build partial checks in application code, but teams running production side effects usually add an external governance layer for consistent policy evaluation and approval gates.

What are the main production failure modes across all three?

The six most common failure modes are: retry storms under tool failure, state loss after crash, token budget overrun, approval bypass due to missing gates, audit trail gaps, and version migration churn. All three frameworks share these risks because governance is not their primary design concern.

Is LlamaIndex slower than LangChain?

Published benchmark data (AgentRace) shows LlamaIndex using more tokens and runtime in certain multi-step scenarios. However, performance is highly sensitive to retrieval strategy, model choice, and prompt design. Run your own benchmarks on representative workloads before deciding.

When should I skip all three and use plain SDK calls?

If your task is a few model calls with no branching, no long-lived state, and no complex tool routing, plain SDK calls (like the Anthropic or OpenAI SDK directly) are cheaper to run and easier to debug than adding a full framework.

Next step

Pick one framework this week based on the decision tree. Run a bounded pilot with one real workflow. Inject failures: tool timeouts, crash recovery, token budget tests. Then add governance gates before any side-effecting action hits production.

Production reminder

Framework choice sets orchestration style. Governance decides whether risky actions are allowed to run.