LangChain Basics in Python: Chains, Tools, and Memory

Intermediate 9 min read

What you'll learn

✓ Compose prompts and chains with LCEL
✓ Wire up tool calling with a structured schema
✓ Add conversational memory without leaking state
✓ Stream tokens to a client cleanly
✓ Pick the right abstraction instead of every abstraction

Prerequisites

• Comfort with Python and pip
• Basics from [What is an LLM](/blog/what-is-an-llm)
• Function calling primer in [LLM Tool Use](/blog/llm-tool-use-and-function-calling)

LangChain has a reputation for being either magical or bloated. The truth is in the middle: the runnable primitives are useful, the rest is optional. This guide walks through the parts you actually need.

Install and configure

pip install langchain langchain-openai langchain-community
export OPENAI_API_KEY=sk-...

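We will use OpenAI for examples, but everything below works with Anthropic or local models by swapping the chat class (for example, ChatAnthropic from langchain-anthropic), as long as the model you pick supports tool calling.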

Your first chain with LCEL

LangChain Expression Language (LCEL) pipes runnables together: each step's output becomes the next step's input. The pipe operator is the API.

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

prompt = ChatPromptTemplate.from_messages([
    ("system", "You summarize text in one sentence."),
    ("human", "{text}"),
])

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
chain = prompt | llm | StrOutputParser()

print(chain.invoke({"text": "LangChain is a framework for building LLM apps."}))

Three primitives, one pipeline. No callbacks, no managers, no graph.
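
To see why a pipe operator is enough, here is a minimal sketch of the idea in plain Python. It is not LangChain's implementation: Step and the three toy steps are hypothetical names invented for illustration, and no model is called. The point is only that a | b can build a new object whose invoke runs a and then b.

```python
from typing import Any, Callable


class Step:
    """Hypothetical stand-in for a runnable: wraps a function and supports `|`."""

    def __init__(self, fn: Callable[[Any], Any]) -> None:
        self.fn = fn

    def invoke(self, value: Any) -> Any:
        return self.fn(value)

    def __or__(self, other: "Step") -> "Step":
        # a | b returns a new Step that feeds a's output into b.
        return Step(lambda value: other.invoke(self.invoke(value)))


# Toy stand-ins for prompt, model, and parser (no network calls).
fill_prompt = Step(lambda d: f"Summarize: {d['text']}")
fake_model = Step(lambda prompt: {"content": prompt.upper()})
parse = Step(lambda message: message["content"])

toy_chain = fill_prompt | fake_model | parse
print(toy_chain.invoke({"text": "hello"}))  # prints SUMMARIZE: HELLO
```

The real runnables add batching, streaming, and async on top, but the composition rule is the same shape.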

Streaming tokens

Swap invoke for stream and you get an iterator of partial outputs. Because the chain ends in StrOutputParser, each chunk is a plain string. This matters when time to first token shapes how fast the app feels.

for chunk in chain.stream({"text": "Explain TCP in a sentence."}):
    print(chunk, end="", flush=True)

If you ship this in FastAPI, yield each chunk through a StreamingResponse. See What is FastAPI for setup.
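
One way to shape chunks for a streaming HTTP response is to wrap each one as a server-sent event. The sketch below is plain Python with no FastAPI import: fake_stream stands in for chain.stream(...), and sse_frames is a hypothetical helper, not part of any library. Under those assumptions, a FastAPI handler would pass the generator to StreamingResponse with media_type="text/event-stream".

```python
import json
from typing import Iterable, Iterator


def fake_stream() -> Iterator[str]:
    """Stand-in for chain.stream(...): yields text chunks as they arrive."""
    yield from ["TCP ", "is a reliable, ", "ordered byte stream."]


def sse_frames(chunks: Iterable[str]) -> Iterator[str]:
    """Wrap each chunk as a server-sent event, then signal completion."""
    for chunk in chunks:
        # JSON-encode so newlines inside a chunk cannot break the frame.
        payload = json.dumps({"token": chunk})
        yield f"data: {payload}\n\n"
    yield "event: done\ndata: {}\n\n"


frames = list(sse_frames(fake_stream()))
print(frames[0], end="")
```

The explicit done event is a design choice, not a requirement: it lets the client tell a finished answer apart from a dropped connection.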

Tool calling

Modern chat models can request tool calls with a structured argument schema. LangChain wraps this so your Python function becomes a tool the model can invoke.

from langchain_core.tools import tool

@tool
def get_weather(city: str) -> str:
    """Return current weather for a city."""
    return f"Sunny in {city}, 24C"

llm_with_tools = llm.bind_tools([get_weather])
response = llm_with_tools.invoke("What is the weather in Tokyo?")
print(response.tool_calls)

The model returns a tool_calls list. Each entry carries the tool name, an args dict, and an id. You decide whether to execute the call and feed the result back. Do not auto-execute untrusted tool calls; allowlist them by name.
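
A minimal sketch of allowlisting by name, in plain Python. The call dicts mirror the name/args/id shape described above; ALLOWED_TOOLS, run_tool_call, and weather_stub are hypothetical names for illustration, and weather_stub is a plain function rather than a LangChain tool.

```python
from typing import Any, Callable


def weather_stub(city: str) -> str:
    """Hypothetical tool body; returns canned data instead of calling an API."""
    return f"Sunny in {city}, 24C"


# Only names listed here can ever be executed, whatever the model requests.
ALLOWED_TOOLS: dict[str, Callable[..., str]] = {"get_weather": weather_stub}


def run_tool_call(call: dict[str, Any]) -> str:
    """Execute one tool call if its name is allowlisted; refuse otherwise."""
    fn = ALLOWED_TOOLS.get(call["name"])
    if fn is None:
        return f"Error: tool {call['name']!r} is not allowed."
    try:
        return fn(**call["args"])
    except TypeError as exc:
        # Wrong or missing arguments from the model: report, do not crash.
        return f"Error: bad arguments for {call['name']}: {exc}"


print(run_tool_call({"name": "get_weather", "args": {"city": "Tokyo"}, "id": "call_1"}))
print(run_tool_call({"name": "delete_files", "args": {}, "id": "call_2"}))
```

Returning the refusal as a string, rather than raising, means it can be sent back to the model as the tool result so the model can recover.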

A minimal tool loop

from langchain_core.messages import HumanMessage, ToolMessage

messages = [HumanMessage("What is the weather in Tokyo?")]
ai = llm_with_tools.invoke(messages)
messages.append(ai)

for call in ai.tool_calls:
    if call["name"] == "get_weather":
        result = get_weather.invoke(call["args"])
        messages.append(ToolMessage(content=result, tool_call_id=call["id"]))

final = llm_with_tools.invoke(messages)
print(final.content)

This is one round of the agent loop in about a dozen lines: the model requests a tool, you run it, and the model answers with the result. Frameworks add retries, parallel calls, and tracing, but the core is this.
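
The snippet above handles a single round of tool calls. A fuller loop keeps calling the model until it stops asking for tools, with a step cap so it cannot spin forever. The sketch below shows only that control flow, in plain Python: scripted_model and run_tool are hypothetical stand-ins, and messages are plain dicts rather than LangChain message objects.

```python
from typing import Any

Message = dict[str, Any]


def scripted_model(messages: list[Message]) -> Message:
    """Fake model: asks for one tool call, then answers from the tool result."""
    tool_results = [m for m in messages if m["role"] == "tool"]
    if not tool_results:
        call = {"name": "get_weather", "args": {"city": "Tokyo"}, "id": "call_1"}
        return {"role": "ai", "content": "", "tool_calls": [call]}
    answer = f"It is {tool_results[-1]['content']}."
    return {"role": "ai", "content": answer, "tool_calls": []}


def run_tool(call: dict[str, Any]) -> str:
    """Fake tool executor; a real one would check an allowlist first."""
    return f"sunny in {call['args']['city']}"


def agent_loop(question: str, max_steps: int = 5) -> str:
    messages: list[Message] = [{"role": "human", "content": question}]
    for _ in range(max_steps):
        ai = scripted_model(messages)
        messages.append(ai)
        if not ai["tool_calls"]:
            return ai["content"]  # the model answered without requesting tools
        for call in ai["tool_calls"]:
            messages.append(
                {"role": "tool", "content": run_tool(call), "tool_call_id": call["id"]}
            )
    raise RuntimeError(f"No final answer after {max_steps} steps")


print(agent_loop("What is the weather in Tokyo?"))  # prints It is sunny in Tokyo.
```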

Memory without footguns

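LangChain has many memory classes. Most apps need one pattern: a sliding window of recent turns plus a running summary in the system message. The example below starts with the simpler building block, a message history keyed by session_id. It keeps every turn, so cap or summarize it before long conversations outgrow the context window.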

from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

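# One history object per session_id, so sessions never see each other's turns.
store: dict[str, InMemoryChatMessageHistory] = {}

def history(session_id: str) -> InMemoryChatMessageHistory:
    if session_id not in store:
        store[session_id] = InMemoryChatMessageHistory()
    return store[session_id]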

chat_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are concise."),
    ("placeholder", "{history}"),
    ("human", "{input}"),
])
chat = chat_prompt | llm

with_memory = RunnableWithMessageHistory(
    chat, history,
    input_messages_key="input",
    history_messages_key="history",
)

cfg = {"configurable": {"session_id": "u123"}}
print(with_memory.invoke({"input": "My name is Mia."}, cfg).content)
print(with_memory.invoke({"input": "What is my name?"}, cfg).content)

In production, swap InMemoryChatMessageHistory for a Redis- or Postgres-backed history so memory survives a restart and is shared across workers.
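
The sliding window mentioned at the top of this section can be sketched in a few lines. This is plain Python over (role, content) tuples, not a LangChain class; sliding_window and max_messages are hypothetical names, and counting messages instead of tokens is a simplifying assumption.

```python
Message = tuple[str, str]  # (role, content)


def sliding_window(messages: list[Message], max_messages: int) -> list[Message]:
    """Keep system messages plus at most the last `max_messages` dialogue messages."""
    system = [m for m in messages if m[0] == "system"]
    dialogue = [m for m in messages if m[0] != "system"]
    # Guard against max_messages == 0, since dialogue[-0:] would keep everything.
    recent = dialogue[-max_messages:] if max_messages > 0 else []
    # Never start the window on an ai reply whose question was cut off.
    while recent and recent[0][0] != "human":
        recent = recent[1:]
    return system + recent


turns = [
    ("system", "You are concise."),
    ("human", "My name is Mia."),
    ("ai", "Hi Mia."),
    ("human", "I like tea."),
    ("ai", "Noted."),
    ("human", "What is my name?"),
]
for role, content in sliding_window(turns, max_messages=4):
    print(f"{role}: {content}")
```

With max_messages=4 the window starts at "I like tea.", so the turn that stated the name is gone. That loss is the reason to pair the window with a summary of older turns.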

Plugging in retrieval

For grounded answers, retrieve documents and stuff them into the prompt. LangChain has retrievers for most vector stores, but the contract is small: a function that takes a query string and returns documents.

def retrieve(query: str) -> list[str]:
    return ["LangChain was released in October 2022."]

rag_prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer using only the context. If the answer is not there, say 'I do not know'."),
    ("human", "Context:\n{context}\n\nQuestion: {q}"),
])

rag = (
    {"context": lambda x: "\n".join(retrieve(x["q"])), "q": lambda x: x["q"]}
    | rag_prompt | llm | StrOutputParser()
)

print(rag.invoke({"q": "When was LangChain released?"}))

For real retrieval, read RAG Embeddings Explained and RAG Vector Databases Overview.
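
The contract described in this section, a function from a query string to a list of documents, can be met without any vector store. The toy retriever below ranks a hard-coded list by word overlap. DOCS, tokenize, and retrieve_by_overlap are hypothetical names, and keyword overlap is a stand-in for embedding similarity, not a recommendation.

```python
import re

DOCS = [
    "LangChain was released in October 2022.",
    "LCEL composes runnables with the pipe operator.",
    "FastAPI is a Python web framework.",
]


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve_by_overlap(query: str, k: int = 2) -> list[str]:
    """Return up to k documents that share the most words with the query."""
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(doc)), doc) for doc in DOCS]
    # Drop documents with no overlap, then sort by score, highest first.
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
    return [doc for _, doc in ranked[:k]]


print(retrieve_by_overlap("When was LangChain released?"))
```

Because it has the same signature as retrieve, it can replace that function in the rag chain above without touching the prompt or the model.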

When not to use LangChain

If your app is one prompt and one model call, use the provider SDK directly. LangChain shines when you have multiple steps, need to swap providers, or want observability via LangSmith. Pick abstractions that earn their keep.

Wrap up

LCEL gives you composition, bind_tools gives you tool calling (the core of any agent), and RunnableWithMessageHistory gives you memory. Start with these three. Add the rest only when a concrete pain forces it.