LangChain became the default framework for building LLM applications in 2023 because it was first and it had everything. It also became the default framework to rip out in 2024 and 2025 because “everything” turned out to mean layers of abstraction that obscure what is actually happening, upgrade paths that break with each minor version, and debugging experiences that make simple problems feel unsolvable. The frustration is real and documented. Here is what engineers are replacing it with and why.
Analysis Briefing
- Topic: LangChain criticisms, alternatives, and LLM framework landscape in 2026
- Analyst: Mike D (@MrComputerScience)
- Context: An adversarial analysis prompted by Claude Sonnet 4.6
- Source: Pithy Cyborg | AI News Made Simple
- Key Question: What specifically breaks in LangChain that makes experienced engineers reach for something else?
What LangChain Got Wrong at the Abstraction Layer
LangChain’s core problem is that its abstractions are leaky in the direction that hurts most. When something works, the abstraction hides complexity. When something breaks, the abstraction hides the problem. Debugging a ConversationalRetrievalChain that returns wrong answers requires understanding what the chain is doing internally, and the internal behavior is not obvious from the API surface.
The chains and agents abstraction introduced a second problem: it became the way new users learned to think about LLM applications. Instead of asking “what HTTP request am I making to which API with what prompt?” they asked “which chain should I use?” This indirection made debugging feel like a framework problem rather than a prompt problem, which is almost always the wrong diagnosis.
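To make the "what HTTP request am I making?" framing concrete, here is a minimal sketch that builds (but does not send) a chat-completions request using only the standard library. The endpoint shape follows OpenAI's public chat-completions API; the model name and prompt are illustrative placeholders, not anything from the original text.

```python
import json
import urllib.request

def build_chat_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Construct the exact HTTP request a 'chain' ultimately makes,
    so the URL, headers, model, and prompt are all visible."""
    body = {
        "model": "gpt-4o-mini",  # placeholder model choice
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        "https://api.openai.com/v1/chat/completions",
        data=json.dumps(body).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("sk-...", "Summarize this document.")
print(req.full_url)                 # the API you are calling
print(json.loads(req.data))         # the prompt you are sending
```

When the request is this explicit, a wrong answer is a prompt problem or a retrieval problem, and you can see which.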
The version churn made it worse. LangChain went through multiple breaking API changes in its first 18 months. LLMChain was deprecated in favor of LCEL (LangChain Expression Language). Agent interfaces changed. Memory implementations changed. Teams that built on an early version faced significant rewrites to stay current, with no clear payoff beyond keeping up.
```python
# LangChain's "simple" RAG pipeline (early API)
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory

chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=vectorstore.as_retriever(),
    memory=ConversationBufferMemory(
        memory_key="chat_history",
        return_messages=True,
        output_key="answer",  # required but not obvious why
    ),
    return_source_documents=True,
)
# What is this actually doing? You need to read the source to know.
# What happens when it fails? The error message is about LangChain internals.
```
What Engineers Are Moving To and Why
LlamaIndex replaced LangChain for RAG-first applications. LlamaIndex’s core abstraction is the index, which maps directly to how retrieval actually works. Its query pipeline is more explicit than LangChain’s chains, and its document handling (chunking, metadata extraction, node parsing) is more mature. The debugging experience is better because the pipeline stages are visible. Engineers building document Q&A, knowledge base search, or retrieval-augmented generation report fewer “why is this returning the wrong answer” debugging sessions.
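The property being credited here, pipeline stages you can inspect individually, is easy to illustrate. The following is not LlamaIndex code; it is a pure-Python sketch of the same chunk-retrieve-prompt structure, with a deliberately naive keyword retriever, to show why separate stages are easier to debug than an opaque chain.

```python
def chunk(doc: str, size: int = 40) -> list[str]:
    """Stage 1: split a document into fixed-size chunks."""
    return [doc[i:i + size] for i in range(0, len(doc), size)]

def retrieve(chunks: list[str], query: str, k: int = 2) -> list[str]:
    """Stage 2: rank chunks by naive keyword overlap with the query."""
    terms = set(query.lower().split())
    ranked = sorted(chunks, key=lambda c: -len(terms & set(c.lower().split())))
    return ranked[:k]

def build_prompt(context: list[str], question: str) -> str:
    """Stage 3: assemble the exact prompt the LLM would receive."""
    return "Context:\n" + "\n".join(context) + f"\n\nQuestion: {question}"

doc = ("LlamaIndex structures retrieval around an index. "
       "The index maps queries to nodes. Nodes carry metadata.")
chunks = chunk(doc)
hits = retrieve(chunks, "what does the index map queries to")
prompt = build_prompt(hits, "What does the index map queries to?")
print(prompt)  # every intermediate value above can be printed and checked
```

If the answer is wrong, you print `hits` and immediately know whether retrieval or generation is at fault.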
DSPy is the most conceptually different alternative. Stanford’s framework treats prompt engineering as a programming problem rather than a craft. You define input and output signatures, write a program using DSPy modules, and the framework optimizes the prompts automatically using a dataset and a metric. Engineers who have adopted DSPy report that it eliminates the fragility of hand-written prompts and makes evaluation a first-class part of development rather than an afterthought. The learning curve is steeper but the resulting systems are more maintainable.
```python
# DSPy: prompts are derived, not written by hand
import dspy

class RAGSignature(dspy.Signature):
    """Answer questions using retrieved context."""
    context = dspy.InputField(desc="Retrieved documents")
    question = dspy.InputField()
    answer = dspy.OutputField(desc="Factual answer based only on context")

class RAGModule(dspy.Module):
    def __init__(self):
        super().__init__()  # required for DSPy to track submodules
        self.retrieve = dspy.Retrieve(k=3)
        self.generate = dspy.ChainOfThought(RAGSignature)

    def forward(self, question):
        context = self.retrieve(question).passages
        return self.generate(context=context, question=question)

# DSPy optimizes the prompts for you using your eval dataset
optimizer = dspy.BootstrapFewShot(metric=your_metric)
optimized_module = optimizer.compile(RAGModule(), trainset=your_data)
```
Direct API calls with thin wrappers are what many senior engineers land on after trying frameworks. The Anthropic SDK, OpenAI SDK, and LiteLLM (for multi-provider switching) give you type-safe API calls without hiding what the request looks like. A RAG pipeline implemented as three functions (embed query, search index, call LLM with context) is fully transparent, easy to debug, and requires zero framework knowledge to maintain.
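A minimal sketch of that three-function shape, with deliberate stand-ins so it runs anywhere: `embed` is a toy bag-of-words counter rather than a real embedding call, and `call_llm` just returns the prompt it would send instead of calling a provider SDK. The structure, not the stubs, is the point.

```python
import math

def embed(text: str) -> dict[str, float]:
    """Stand-in for one embedding-model call: bag-of-words counts."""
    vec: dict[str, float] = {}
    for word in text.lower().split():
        vec[word] = vec.get(word, 0.0) + 1.0
    return vec

def search(index, query_vec, k=1):
    """Rank (document, vector) pairs by cosine similarity to the query."""
    def cos(a, b):
        dot = sum(a[w] * b.get(w, 0.0) for w in a)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0
    return sorted(index, key=lambda pair: -cos(query_vec, pair[1]))[:k]

def call_llm(context: list[str], question: str) -> str:
    """Stand-in for the provider SDK call; in production, swap in the
    Anthropic or OpenAI client with the same plain-string prompt."""
    return f"Prompt sent: Context={context!r} Question={question!r}"

docs = ["the index stores embeddings", "retrieval finds similar chunks"]
index = [(d, embed(d)) for d in docs]
top = search(index, embed("which chunks are similar"), k=1)
answer = call_llm([d for d, _ in top], "Which chunks are similar?")
print(answer)
```

Every step is a plain function call you wrote, so there is no framework layer between a bad answer and its cause.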
Semantic Kernel is Microsoft’s entry for teams in .NET or enterprise Python environments where Microsoft integration (Azure OpenAI, Azure AI Search) is a requirement. It is more opinionated than the alternatives and less popular in the open-source community but a reasonable choice for enterprises already in the Azure ecosystem.
The Framework That Is Actually Worth Your Time in 2026
For most applications, the correct recommendation is: start with direct API calls, add LlamaIndex if you need sophisticated retrieval, and evaluate DSPy if you find yourself spending more than 20% of your time on prompt engineering.
LangChain is not worthless. Its LCEL rewrite improved the debugging story. Its integrations library (langchain-community) covers more third-party tools than any alternative. If you are already on LangChain and it is working, the migration cost rarely justifies the switch.
The engineers who are genuinely unhappy with LangChain share a common profile: they needed to debug deeply, hit version churn during a critical period, or built something complex enough that the abstraction layers became obstacles rather than aids. For a simple chatbot or basic RAG demo, LangChain is fine and faster to get started than building a RAG pipeline from primitives.
The broader lesson is that LLM application frameworks are still in their adolescence. The frameworks that exist in 2026 are not the ones that will dominate in 2028. Building on thin abstractions that you understand completely is more durable than betting on any specific framework’s survival.
What This Means For You
- Evaluate whether you actually need a framework before adopting one, because a RAG pipeline is three functions and a vector database, and framework overhead is only justified when the integration surface is large enough that the abstractions genuinely save time.
- Try DSPy before committing to hand-written prompts for any application where prompt quality is the primary determinant of output quality, because automated prompt optimization is consistently better than manual iteration and makes your system’s behavior measurable rather than intuited.
- Use LiteLLM as your only dependency if you need multi-provider LLM routing, because it gives you a single interface across OpenAI, Anthropic, Gemini, and local models without the overhead of a full application framework.
- Pin your LangChain version and treat minor version upgrades as potential breaking changes if you are maintaining an existing LangChain application, because the framework’s pace of change means unpinned dependencies will silently break your production pipelines.
Enjoyed this deep dive? Join my inner circle:
- Pithy Cyborg | AI News Made Simple → AI news made simple without hype.
