
LangChain vs LlamaIndex vs Semantic Kernel (2026)

Feature tables are everywhere. Production failure modes are not. Compare these three frameworks by what breaks when agents touch real systems.

Comparison · 18 min read · Apr 2026
TL;DR
  • LangChain leads on ecosystem breadth and model-provider portability. LlamaIndex leads on RAG-native developer experience. Semantic Kernel leads on enterprise SDK consistency across C#, Python, and Java.
  • All three frameworks lack native pre-dispatch policy enforcement and mandatory approval workflows. Production side effects run ungoverned unless you add an external layer.
  • The highest-cost production failures are retry storms, state loss, and approval bypass. Not prompt syntax differences.
LangChain

Integration-first: broadest model/tool ecosystem with graph-based agent composition

LlamaIndex

Data-first: best-in-class indexing, retrieval, and RAG-native agent workflows

Semantic Kernel

Enterprise-SDK-first: multi-language plugin architecture with Azure alignment

Scope

This comparison focuses on production behavior: failure modes, state durability, governance gaps, and decision criteria for choosing between LangChain, LlamaIndex, and Semantic Kernel. It does not cover beginner tutorials or quickstart walkthroughs.

The real selection problem

Most comparison posts present feature tables. They list API surface, supported models, and quickstart complexity. That is useful for orientation but dangerous as a decision artifact. The real selection problem is not which framework has more features. It is which framework fails in a way your team can handle.

LangChain, LlamaIndex, and Semantic Kernel are not interchangeable. They solve different primary problems because they start from different design philosophies.

LangChain is integration-first. It gives you the broadest set of model providers, tool connectors, and community packages. You can swap providers and assemble agent logic quickly. The cost appears later when abstraction layers interact in unexpected ways under retry pressure.

LlamaIndex is data-first. Its center of gravity is indexing, retrieval, and query pipelines. Agent capabilities have matured, but the framework rewards teams whose core product value comes from getting the right context to the model at the right time.

Semantic Kernel is enterprise-SDK-first. It models everything as a kernel with services and plugins. Multi-language support across C#, Python, and Java gives enterprise teams SDK consistency that other frameworks do not offer. The tradeoff is a smaller community and a Python SDK that lags behind C#.

The design philosophy determines what breaks in production. Integration-first frameworks struggle when abstraction complexity compounds. Data-first frameworks carry overhead when retrieval is not central. Enterprise-SDK frameworks trade community velocity for structural discipline. Knowing where each one bends under pressure matters more than counting features.

What top sources miss

We reviewed three top-ranking comparison articles before writing. They cover API surface well but consistently miss production failure modes and governance gaps.

| Source | Strong coverage | Missing piece |
| --- | --- | --- |
| Turing: LangChain vs LlamaIndex | Clear API surface comparison with use-case mapping for retrieval versus orchestration workflows. | No production failure mode analysis. No mention of governance gaps or approval bypass risks in either framework. |
| Medium: LangChain vs LlamaIndex vs Semantic Kernel | Side-by-side feature tables and ecosystem size context. Useful for initial orientation. | Feature tables without production validation. No retry, state loss, or crash recovery testing across the three frameworks. |
| InfoWorld: Semantic Kernel vs LangChain | Good enterprise positioning context for Semantic Kernel and plugin architecture advantages. | Missing LlamaIndex entirely. No governance integration pattern or policy enforcement comparison. |

The pattern is consistent: good orientation, weak on what happens when agents execute side effects in production without policy gates. This article fills that gap with failure mode analysis, current metrics, and governance integration patterns.

2026 framework snapshot

Table data uses current GitHub API and PyPIStats snapshots from April 2026. Community size is not a direct proxy for runtime quality, but it helps estimate ecosystem velocity and troubleshooting surface area.

| Framework | GitHub Stars | PyPI Downloads | Languages | Primary Strength |
| --- | --- | --- | --- | --- |
| LangChain | 131.7k | 223.8M/month | Python, TypeScript | Broadest model/tool integration and agent assembly |
| LlamaIndex | 48.2k | 10.09M/month | Python | RAG-native workflows and data-centric agent pipelines |
| Semantic Kernel | 27.6k | 2.74M/month | C#, Python, Java | Enterprise SDK model with pluggable services and orchestration |

LangChain dominates in raw adoption. Its PyPI download count is roughly 22x LlamaIndex's and 82x Semantic Kernel's. That gap reflects ecosystem breadth and early-mover advantage, not a quality ranking.

Feature comparison matrix

This matrix focuses on framework-native capabilities. A partial mark means the feature exists but needs custom engineering for strong reliability, operator control, or enterprise policy requirements.

Legend: Full · Partial · Not native

| Feature | LangChain | LlamaIndex | Semantic Kernel |
| --- | --- | --- | --- |
| Multi-agent orchestration | Full | Full | Full |
| RAG-native DX | Partial | Full | Partial |
| Durable workflows | Full | Partial | Partial |
| MCP support | Full | Full | Full |
| Multi-language SDK | Full | Not native | Full |
| Plugin architecture | Partial | Partial | Full |
| Built-in approval workflow | Partial | Not native | Not native |
| Built-in policy enforcement | Not native | Not native | Not native |
| Audit trail | Partial | Partial | Partial |
| Enterprise SDK consistency | Partial | Partial | Full |
| Streaming | Full | Full | Full |
| Memory management | Full | Partial | Partial |

The row that matters most is "Built-in policy enforcement." All three frameworks show "Not native." None of them evaluates policy before a tool call executes. If your agent can write to production databases, trigger deployments, or send emails, the framework will not stop it. You need an external governance layer.

LangChain: integration-first agent composition

Architecture

LangChain uses graph-based agent composition through LangGraph. You define nodes (agent steps), edges (transitions), and a state object that flows through the graph. The checkpointer layer adds persistence so you can resume interrupted runs. This model gives you fine-grained control over execution flow, but the graph can become hard to reason about as complexity grows.
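The node/edge/state model can be illustrated without LangGraph itself. The sketch below is plain Python showing the same idea — nodes that transform a shared state, edges that define order, and a checkpoint recorded after each node; the names and structure are illustrative, not the LangGraph API (which uses StateGraph, add_node, add_edge, and a checkpointer).

```python
# Illustrative sketch of graph-based agent composition: nodes transform a
# shared state dict, edges define the execution order, and a checkpoint is
# recorded after each node so an interrupted run could resume.

def check_status(state: dict) -> dict:
    # Node: read input from state, write results back into state.
    return {**state, "status": f"{state['ticket_id']} is in progress"}

def decide_escalation(state: dict) -> dict:
    # Node: conditional logic based on accumulated state.
    return {**state, "escalate": "blocked" in state["status"]}

NODES = {"check_status": check_status, "decide_escalation": decide_escalation}
EDGES = {"check_status": "decide_escalation", "decide_escalation": None}

def run_graph(entry: str, state: dict, checkpoints: list) -> dict:
    node = entry
    while node is not None:
        state = NODES[node](state)
        checkpoints.append((node, dict(state)))  # persistence point
        node = EDGES[node]
    return state

checkpoints: list = []
final = run_graph("check_status", {"ticket_id": "INC-7"}, checkpoints)
print(final["escalate"])  # → False: status is "in progress", not "blocked"
```

The checkpoint-after-every-node discipline is what LangGraph's checkpointer layer provides for real; without it, a crash mid-graph loses everything accumulated so far.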

Strengths

  • Largest ecosystem of model providers and tool integrations. Over 700 integrations available.
  • TypeScript and Python SDKs. Both are actively maintained with feature parity goals.
  • LangGraph provides durable execution with checkpointing, HITL interrupts, and time-travel debugging.
  • Fast path from idea to working agent. Swapping models or tools usually requires minimal code changes.

Production pain points

  • Abstraction complexity compounds. Chains wrapping chains wrapping tools can make debugging opaque.
  • Version churn has been a persistent friction point. Import paths and core APIs have changed multiple times since 2023.
  • No native policy gates. HITL interrupts are optional and require explicit graph design. Nothing prevents a tool from firing without approval.
  • Memory management works well for chat history but lacks framework-level token budget enforcement.
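
Since the framework does not enforce a token budget, the guard has to live in application code. A minimal sketch, assuming a crude 4-characters-per-token estimate — swap in a real tokenizer for accuracy:

```python
# Application-level token budget guard. The 4-chars-per-token estimate is a
# rough heuristic; use a real tokenizer in production for accurate counts.

class TokenBudget:
    def __init__(self, max_tokens: int):
        self.max_tokens = max_tokens
        self.used = 0

    def estimate(self, text: str) -> int:
        return max(1, len(text) // 4)  # rough heuristic, not exact

    def charge(self, text: str) -> None:
        # Refuse the call before spending, instead of discovering overrun later.
        cost = self.estimate(text)
        if self.used + cost > self.max_tokens:
            raise RuntimeError(
                f"token budget exceeded: {self.used + cost} > {self.max_tokens}"
            )
        self.used += cost

budget = TokenBudget(max_tokens=50)
budget.charge("Status for INC-1042")  # small prompt: within budget
print(budget.used)                    # → 4 with this heuristic
```

Call `charge` on every prompt and tool result before it enters the context window; the hard failure is the point, since silent accumulation is what produces overrun bills.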

Code example

langchain_agent.py
Python
# pip install -U langchain "langchain[anthropic]" langgraph-checkpoint-postgres
from langchain.agents import create_agent
from langgraph.checkpoint.postgres import PostgresSaver


def get_ticket_status(ticket_id: str) -> str:
    """Look up the current status of a ticket."""
    return f"ticket {ticket_id} is in progress"


def escalate_ticket(ticket_id: str, reason: str) -> str:
    """Escalate a ticket, recording the reason."""
    return f"ticket {ticket_id} escalated: {reason}"


DB_URI = "postgresql://postgres:postgres@localhost:5442/postgres?sslmode=disable"

with PostgresSaver.from_conn_string(DB_URI) as checkpointer:
    checkpointer.setup()
    agent = create_agent(
        model="claude-sonnet-4-6",
        tools=[get_ticket_status, escalate_ticket],
        system_prompt="You are an ops assistant. Check status before escalating.",
        checkpointer=checkpointer,
    )

    result = agent.invoke(
        {"messages": [{"role": "user", "content": "Status for INC-1042"}]},
        {"configurable": {"thread_id": "inc-1042"}},
    )
    print(result)

LlamaIndex: data-first agent pipelines

Architecture

LlamaIndex centers on query pipelines and AgentWorkflow with FunctionAgents. The query pipeline handles data ingestion, indexing, and retrieval. AgentWorkflow adds multi-agent coordination with typed handoffs and shared state. This design makes retrieval-heavy workflows natural but can add overhead when retrieval is not central to your use case.

Strengths

  • Best indexing and retrieval primitives of any agent framework. Data connectors, vector store integrations, and query engines are first-class.
  • Clean workflow abstraction. AgentWorkflow supports agent handoffs, shared state, and progressive control from simple to custom planning.
  • Three documented multi-agent patterns: AgentWorkflow, orchestrator-as-tool, and custom planner. Teams can grow complexity without framework changes.
  • Strong ecosystem for document connectors. Hundreds of data loaders available for PDFs, databases, APIs, and SaaS tools.

Production pain points

  • Higher token usage in benchmarks. Published AgentRace data showed LlamaIndex using 101,772 tokens versus LangChain at 7,753 for comparable tasks.
  • Smaller tool ecosystem compared to LangChain. If your workload is not retrieval-heavy, you may carry complexity for little benefit.
  • Workflow state is in-memory by default. Crash recovery requires external persistence that you need to implement yourself.
  • No native approval gate or policy enforcement. Tool execution proceeds without human checkpoint unless you build custom middleware.
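
Because workflow state is in-memory by default, crash recovery means owning your own snapshots. A minimal sketch with hypothetical helper names, persisting state as JSON after each completed step:

```python
# Minimal external persistence for in-memory workflow state: snapshot to
# disk after each completed step, reload on restart. File name and helper
# names are illustrative, not a LlamaIndex API.
import json
from pathlib import Path

STATE_FILE = Path("workflow_state.json")

def save_state(state: dict) -> None:
    # Write atomically: temp file then rename, so a crash mid-write
    # never leaves a truncated snapshot behind.
    tmp = STATE_FILE.with_suffix(".tmp")
    tmp.write_text(json.dumps(state))
    tmp.replace(STATE_FILE)

def load_state(default: dict) -> dict:
    if STATE_FILE.exists():
        return json.loads(STATE_FILE.read_text())
    return default

state = load_state({"notes": ""})           # resume if a snapshot exists
state["notes"] = "Q1 incident trends gathered"
save_state(state)                           # call after every completed step
print(load_state({"notes": ""})["notes"])
```

The same pattern scales up by swapping the file for a database row keyed by workflow ID; the discipline that matters is snapshotting after each step, not the storage backend.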

Code example

llamaindex_workflow.py
Python
# pip install -U llama-index
from llama_index.core.agent.workflow import AgentWorkflow, FunctionAgent


def search_docs(topic: str) -> str:
    """Search internal documentation for a topic."""
    return f"Found 3 relevant documents about {topic}"


def write_summary(notes: str) -> str:
    """Produce a concise summary from research notes."""
    return f"Summary: {notes}"


research_agent = FunctionAgent(
    name="ResearchAgent",
    description="Collect context from documentation",
    system_prompt="Gather technical facts and handoff to WriteAgent",
    tools=[search_docs],
)

write_agent = FunctionAgent(
    name="WriteAgent",
    description="Create concise summary",
    system_prompt="Write final summary from research notes",
    tools=[write_summary],
)

workflow = AgentWorkflow(
    agents=[research_agent, write_agent],
    root_agent="ResearchAgent",
    initial_state={"notes": ""},
)

# in async context:
# response = await workflow.run(user_msg="Summarize the Q1 incident trends")

Semantic Kernel: enterprise SDK discipline

Architecture

Semantic Kernel models everything through a Kernel object with pluggable services and plugins. ChatCompletionAgent handles inference. Plugins encapsulate tool logic with typed function decorators. Group chat orchestration supports multi-agent patterns. This explicit structure makes large codebases more maintainable but requires more setup than minimal Python-only frameworks.

Strengths

  • Multi-language SDK support across C#, Python, and Java. Teams with .NET-heavy estates can use their existing engineering standards.
  • Plugin architecture provides clean separation between agent logic and tool implementation. Plugins are typed, testable, and reusable.
  • Strong Azure integration. Teams already on Azure OpenAI, Azure AI Search, or Microsoft 365 get first-class connectors.
  • More stable release cycle than LangChain. Fewer breaking changes between versions means lower migration cost.

Production pain points

  • Smaller community. 27.6k GitHub stars versus 131.7k for LangChain. Fewer community packages, tutorials, and Stack Overflow answers.
  • Python SDK is less mature than C#. Some features ship to C# first. Teams choosing Semantic Kernel for Python should verify feature availability.
  • No pre-dispatch policy gates. Plugin calls execute immediately. Approval workflows need external implementation.
  • Heavier initial setup than lightweight frameworks. The kernel-services-plugins structure adds boilerplate that slows early prototyping.

Code example

semantic_kernel_agent.py
Python
# pip install -U semantic-kernel
from semantic_kernel import Kernel
from semantic_kernel.agents import ChatCompletionAgent
from semantic_kernel.connectors.ai.open_ai import AzureChatCompletion
from semantic_kernel.functions import kernel_function


class TicketPlugin:
    @kernel_function(description="Get current ticket status")
    def get_status(self, ticket_id: str) -> str:
        return f"Ticket {ticket_id}: in progress, assigned to on-call"

    @kernel_function(description="Escalate ticket with reason")
    def escalate(self, ticket_id: str, reason: str) -> str:
        return f"Ticket {ticket_id} escalated: {reason}"


kernel = Kernel()
kernel.add_service(AzureChatCompletion(service_id="ops-service"))
kernel.add_plugin(TicketPlugin(), plugin_name="tickets")

agent = ChatCompletionAgent(
    kernel=kernel,
    name="OpsAssistant",
    instructions="Check ticket status before escalating. Explain decisions clearly.",
)

# in async context:
# response = await agent.get_response(messages="What is the status of INC-2048?")
# print(response)

What actually breaks in production

This is the section most comparison articles skip. Feature tables tell you what a framework can do. Failure mode tables tell you what happens when things go wrong. In production, things go wrong every day.

We tested six failure scenarios across all three frameworks. The results show that governance gaps are shared across all three. The framework choice affects how failures manifest, not whether they occur.

| Failure Mode | LangChain | LlamaIndex | Semantic Kernel |
| --- | --- | --- | --- |
| Retry storm under tool failure | Tools may retry without backoff unless a custom retry policy is wired into the graph. | AgentWorkflow retries can compound token spend. Custom error handlers required. | Plugin invocation retries depend on the HTTP client layer. No framework-level backoff. |
| State loss after crash | LangGraph checkpointer persists state if configured. Without it, full state loss on restart. | Workflow state is in-memory by default. Crash loses all intermediate results. | ChatHistory and AgentThread are in-process. External persistence is opt-in. |
| Token budget overrun | No built-in token budget cap. Long chains accumulate context without hard limits. | Benchmark data shows higher token usage in multi-step retrieval scenarios. | No native token budget enforcement. Kernel does not cap per-invocation spend. |
| Approval bypass (no gate) | HITL interrupts exist in LangGraph but are optional and not enforced by default. | No native approval gate. Tool execution proceeds without human checkpoint. | No pre-dispatch approval mechanism. Plugin calls execute immediately. |
| Audit trail gap | LangSmith provides tracing. Self-hosted deployments need custom logging for a full audit. | Callback handlers capture events. Structured audit export requires custom work. | Telemetry hooks exist. Complete audit narrative needs external assembly. |
| Migration version churn | Rapid release cadence. Breaking changes between minor versions have been common. | Core refactors (v0.10+) changed import paths. Migration guides available but manual. | More stable release cycle. Python SDK lags behind C# in feature parity. |
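
The retry-storm row is the cheapest to fix in application code: wrap every flaky tool call with capped exponential backoff, jitter, and a hard attempt limit, regardless of framework. A framework-agnostic sketch:

```python
# Capped exponential backoff with jitter and a hard attempt limit. This
# prevents retry storms: delays grow geometrically, are capped, and the
# failure surfaces after max_attempts instead of looping forever.
import random
import time

def call_with_backoff(tool, *args, max_attempts: int = 4,
                      base_delay: float = 0.5, max_delay: float = 8.0):
    """Retry a flaky tool call with capped exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return tool(*args)
        except Exception:
            if attempt == max_attempts:
                raise  # budget exhausted: surface the failure, do not loop
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            time.sleep(delay * random.uniform(0.5, 1.0))  # jitter

# Simulated flaky tool: fails twice, then succeeds.
calls = []
def flaky(ticket_id: str) -> str:
    calls.append(ticket_id)
    if len(calls) < 3:
        raise TimeoutError("upstream timeout")
    return f"{ticket_id}: ok"

result = call_with_backoff(flaky, "INC-9", base_delay=0.01)
print(result)  # → INC-9: ok after two retried failures
```

The jitter matters at fleet scale: without it, many agents retrying the same failed dependency synchronize their retries and amplify the outage.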

The common thread is clear. All three frameworks assume that side effects are safe to execute once the model decides to execute them. None evaluates a policy before dispatch. None requires approval for high-risk operations by default. None provides a structured audit trail without custom integration work.
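
A structured audit trail does not require much custom work to start: emit one append-only JSON line per governed tool call. A minimal sketch — the field names are illustrative, not a framework or vendor schema:

```python
# One append-only JSON line per tool call: who acted, what they attempted,
# and the policy verdict. Field names are illustrative.
import io
import json
import time

def audit_record(agent_id: str, action: str, params: dict,
                 decision: str) -> str:
    """Serialize a single audit event as one JSON line."""
    return json.dumps({
        "ts": time.time(),
        "agent_id": agent_id,
        "action": action,
        "params": params,
        "decision": decision,   # allow / deny / require_approval
    })

log = io.StringIO()  # stands in for an append-only file or log sink
log.write(audit_record("ops-agent", "escalate_ticket",
                       {"ticket_id": "INC-9"}, "require_approval") + "\n")

entry = json.loads(log.getvalue())
print(entry["decision"])  # → require_approval
```

JSON lines keep the trail greppable and trivially ingestible by any log pipeline; the hard part is discipline, not tooling — every side-effecting call must pass through the recorder.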

This is not a framework defect. These tools are designed for agent behavior, not governance. But the gap means that every production deployment on any of these three frameworks needs an external governance layer if agents can affect real systems.

When to use which

If your team needs a concrete recommendation this week, use this decision flow. It is intentionally blunt. The goal is to reduce architecture indecision and force explicit tradeoffs.

RAG is your core product?

Choose LlamaIndex. Its indexing, retrieval, and query pipeline primitives are purpose-built for this workload. You will spend less time fighting the framework.

Need the broadest integration ecosystem?

Choose LangChain. Over 700 integrations, active TypeScript and Python SDKs, and the largest community for troubleshooting.

.NET/C#/Java enterprise environment with Azure alignment?

Choose Semantic Kernel. Multi-language SDK consistency, plugin architecture, and first-class Azure connectors fit enterprise engineering standards.

Any of the above plus production side effects?

Add a governance layer. None of these three frameworks will prevent a dangerous tool call from executing. Policy evaluation, approval workflows, and audit trails need to come from outside the framework.

For broader framework coverage including CrewAI and AutoGen, see the 6-framework comparison and CrewAI vs AutoGen deep dive.

Why all three need a governance layer

The feature matrix makes it clear: none of these frameworks includes pre-dispatch policy enforcement. LangChain offers HITL interrupts through LangGraph, but they are optional graph nodes, not mandatory gates. LlamaIndex and Semantic Kernel provide callback hooks and telemetry, but neither evaluates whether an action should be allowed before it executes.

In production, this gap creates a specific failure pattern. The model decides to call a tool. The tool executes immediately. The side effect (a database write, an API call, a deployment trigger) happens before any policy evaluation occurs. If the action was risky, you find out after the damage.

A governance layer sits between the framework and tool execution. It intercepts the proposed action, evaluates it against a policy bundle, and returns one of four decisions: allow, deny, require approval, or allow with constraints. This pattern is framework-agnostic. The same governance endpoint works with LangChain, LlamaIndex, Semantic Kernel, or any other framework that executes tool calls.

governed_tool_call.py
Python
# Framework-agnostic pre-dispatch governance pattern
# Works with LangChain, LlamaIndex, or Semantic Kernel

import httpx


def governed_tool_call(action: str, params: dict, context: dict) -> dict:
    """Check policy before executing any side-effecting tool call."""

    decision = httpx.post(
        "http://control-plane:8080/api/v1/evaluate",
        json={
            "action": action,
            "params": params,
            "agent_id": context["agent_id"],
            "labels": context.get("labels", []),
        },
        timeout=10.0,  # do not hang the agent if the control plane is unreachable
    ).json()

    if decision["result"] == "deny":
        return {"status": "blocked", "reason": decision["reason"]}

    if decision["result"] == "require_approval":
        return {"status": "pending_approval", "approval_id": decision["approval_id"]}

    # decision["result"] == "allow"
    return execute_tool(action, params)


def execute_tool(action: str, params: dict) -> dict:
    # Actual tool execution happens here
    return {"status": "executed", "action": action}

Cordum implements this pattern as an Agent Control Plane with a Safety Kernel that evaluates policy before dispatch, supports explicit approval-required states, and records an audit timeline for every governed action. It is additive to any framework, not a replacement for agent logic.

FAQ

What is the best framework: LangChain, LlamaIndex, or Semantic Kernel?

There is no universal winner. LangChain leads on ecosystem breadth and model-provider portability. LlamaIndex leads on RAG-native developer experience and data pipelines. Semantic Kernel leads on enterprise SDK consistency across C#, Python, and Java. Pick based on your dominant workload.

Is LlamaIndex better than LangChain for RAG applications?

For retrieval-heavy systems where indexing and query pipeline quality drive product value, LlamaIndex is usually the better starting point. LangChain covers RAG well but its center of gravity is broader tool orchestration, not retrieval-first design.

Should I use Semantic Kernel or LangChain?

Use Semantic Kernel when you need multi-language SDK support (C#, Python, Java), a structured plugin architecture, and Azure alignment. Use LangChain when you need the broadest integration ecosystem, TypeScript support, and fast model-provider switching.

Can I use LlamaIndex with LangChain together?

Yes. Some teams use LlamaIndex for indexing and retrieval pipelines and LangChain for broader agent orchestration. The frameworks are not mutually exclusive. The integration cost is connecting the retrieval output from LlamaIndex into a LangChain tool or chain.

Does Semantic Kernel support Python or is it C# only?

Semantic Kernel supports C#, Python, and Java. The C# SDK is the most mature. The Python SDK covers core features including agents, plugins, and chat completion, but some advanced features ship to C# first.

Which framework has the best multi-agent support?

All three support multi-agent patterns. LlamaIndex offers AgentWorkflow with FunctionAgent handoffs. LangChain uses LangGraph for graph-based agent composition. Semantic Kernel provides ChatCompletionAgent with group chat orchestration. The best fit depends on your coordination model.

Do these frameworks include built-in policy enforcement?

None of the three frameworks includes native pre-dispatch policy enforcement. You can build partial checks in application code, but teams running production side effects usually add an external governance layer for consistent policy evaluation and approval gates.

What are the main production failure modes across all three?

The six most common failure modes are: retry storms under tool failure, state loss after crash, token budget overrun, approval bypass due to missing gates, audit trail gaps, and version migration churn. All three frameworks share these risks because governance is not their primary design concern.

Is LlamaIndex slower than LangChain?

Published benchmark data (AgentRace) shows LlamaIndex using more tokens and runtime in certain multi-step scenarios. However, performance is highly sensitive to retrieval strategy, model choice, and prompt design. Run your own benchmarks on representative workloads before deciding.

When should I skip all three and use plain SDK calls?

If your task is a few model calls with no branching, no long-lived state, and no complex tool routing, plain SDK calls (like the Anthropic or OpenAI SDK directly) are cheaper to run and easier to debug than adding a full framework.

Next step

Pick one framework this week based on the decision tree. Run a bounded pilot with one real workflow. Inject failures: tool timeouts, crash recovery, token budget tests. Then add governance gates before any side-effecting action hits production.

Production reminder

Framework choice sets orchestration style. Governance decides whether risky actions are allowed to run.