LangChain Basics in Python: Chains, Tools, and Memory
Learn LangChain by building real components in Python: prompt templates, chains, tool calling, and memory. Practical patterns you can ship today.
What you'll learn
- ✓Compose prompts and chains with LCEL
- ✓Wire up tool calling with a structured schema
- ✓Add conversational memory without leaking state
- ✓Stream tokens to a client cleanly
- ✓Pick the right abstraction instead of every abstraction
Prerequisites
- •Comfort with Python and pip
- •Basics from [What is an LLM](/blog/what-is-an-llm)
- •Function calling primer in [LLM Tool Use](/blog/llm-tool-use-and-function-calling)
LangChain has a reputation for being either magical or bloated. The truth is in the middle: the runnable primitives are useful, the rest is optional. This guide walks the parts you actually need.
Install and configure
pip install langchain langchain-openai langchain-community
export OPENAI_API_KEY=sk-...
We will use OpenAI for examples, but everything below works with Anthropic or local models by swapping the chat class.
Your first chain with LCEL
LangChain Expression Language pipes runnables together. The pipe operator is the API.
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
prompt = ChatPromptTemplate.from_messages([
("system", "You summarize text in one sentence."),
("human", "{text}"),
])
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
chain = prompt | llm | StrOutputParser()
print(chain.invoke({"text": "LangChain is a framework for building LLM apps."}))
Three primitives, one pipeline. No callbacks, no managers, no graph.
Streaming tokens
Swap invoke for stream and you get an iterator of partial outputs. This matters when latency to first token is your UX.
for chunk in chain.stream({"text": "Explain TCP in a sentence."}):
print(chunk, end="", flush=True)
If you ship this in FastAPI, yield each chunk through a StreamingResponse. See What is FastAPI for setup.
Tool calling
Modern chat models can request tool calls with a structured argument schema. LangChain wraps this so your Python function becomes a tool the model can invoke.
from langchain_core.tools import tool
@tool
def get_weather(city: str) -> str:
"Return current weather for a city."
return f"Sunny in {city}, 24C"
llm_with_tools = llm.bind_tools([get_weather])
response = llm_with_tools.invoke("What is the weather in Tokyo?")
print(response.tool_calls)
The model returns a tool_calls list with the arg dict. You decide whether to execute and feed the result back. Do not auto-execute untrusted tool calls; whitelist by name.
A minimal tool loop
from langchain_core.messages import HumanMessage, ToolMessage
messages = [HumanMessage("What is the weather in Tokyo?")]
ai = llm_with_tools.invoke(messages)
messages.append(ai)
for call in ai.tool_calls:
if call["name"] == "get_weather":
result = get_weather.invoke(call["args"])
messages.append(ToolMessage(content=result, tool_call_id=call["id"]))
final = llm_with_tools.invoke(messages)
print(final.content)
This is the agent loop in twelve lines. Frameworks add retries, parallel calls, and tracing, but the core is this.
Memory without footguns
LangChain has many memory classes. Most apps need one: a sliding window of recent turns plus a system summary.
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory
store = {}
def history(session_id: str):
if session_id not in store:
store[session_id] = InMemoryChatMessageHistory()
return store[session_id]
chat_prompt = ChatPromptTemplate.from_messages([
("system", "You are concise."),
("placeholder", "{history}"),
("human", "{input}"),
])
chat = chat_prompt | llm
with_memory = RunnableWithMessageHistory(
chat, history,
input_messages_key="input",
history_messages_key="history",
)
cfg = {"configurable": {"session_id": "u123"}}
print(with_memory.invoke({"input": "My name is Mia."}, cfg).content)
print(with_memory.invoke({"input": "What is my name?"}, cfg).content)
In production, swap InMemoryChatMessageHistory for Redis or Postgres so memory survives a restart.
Plugging in retrieval
For grounded answers, retrieve documents and stuff them into the prompt. LangChain has retrievers for most vector stores, but the contract is small: a function that takes a query string and returns documents.
def retrieve(query: str) -> list[str]:
return ["LangChain was released in October 2022."]
rag_prompt = ChatPromptTemplate.from_messages([
("system", "Answer using the context. Say I do not know if missing."),
("human", "Context:\n{context}\n\nQuestion: {q}"),
])
rag = (
{"context": lambda x: "\n".join(retrieve(x["q"])), "q": lambda x: x["q"]}
| rag_prompt | llm | StrOutputParser()
)
print(rag.invoke({"q": "When was LangChain released?"}))
For real retrieval, read RAG Embeddings Explained and RAG Vector Databases Overview.
When not to use LangChain
If your app is one prompt and one model call, use the provider SDK directly. LangChain shines when you have multiple steps, swap providers, or need observability via LangSmith. Pick abstractions that earn their keep.
Wrap up
LCEL gives you composition, bind_tools gives you agents, and RunnableWithMessageHistory gives you memory. Start with these three. Add the rest only when a concrete pain forces it.