
Prompt Engineering: The ReAct Pattern

Learn the ReAct pattern, a prompting technique that combines reasoning and action to build effective tool-using LLM agents.

4 min read · By Codeloom
Intermediate 10 min read

What you'll learn

  • What ReAct is and why it works
  • The Thought-Action-Observation loop
  • How to write a ReAct prompt
  • When ReAct beats plain chain-of-thought
  • Common pitfalls and how to fix them

Prerequisites

  • Familiar with APIs
  • Basic prompting

What and Why

ReAct stands for “Reasoning and Acting.” It is a prompting pattern, introduced in a 2022 paper, that interleaves natural-language reasoning with concrete tool calls. Instead of asking the model either to think silently or to just call a tool, ReAct asks it to cycle through thinking out loud, acting on the world, observing the result, and thinking again.

The pattern works because it gives the model a structured way to plan, use external information, and self-correct. Plain chain-of-thought works well for self-contained problems but breaks down when the answer requires fresh data, computation, or a system query. ReAct lets the model reach outside its weights without losing its train of thought.

Mental Model

Think of a person solving a crossword with a dictionary on the table. They look at a clue, mutter a guess (thought), reach for the dictionary to check a word (action), read the entry (observation), and update their guess. Repeat until done. The thinking and the looking-up are interleaved, not separated.

ReAct codifies this into a transcript. The model writes Thought: ... and Action: tool(args), the surrounding code runs the tool and supplies Observation: ..., and the cycle repeats until the model writes a Final Answer.

Hands-on Example

Here is a minimal ReAct prompt for an agent with two tools, search and calc:

You have access to:
- search(query): web search
- calc(expression): math

Use this format:
Thought: reason about what to do
Action: tool_name(args)
Observation: tool result
... (repeat) ...
Thought: I now know the answer
Final Answer: ...

Question: What is the population of Tokyo divided by the population of Paris?
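If the tool list changes often, it can help to assemble this prompt from a table instead of hand-editing it. The following is a minimal sketch, not a prescribed implementation: build_react_prompt, TOOLS, and FORMAT_RULES are hypothetical names introduced here, and the tool descriptions simply mirror the prompt above.

```python
from typing import Dict

# Hypothetical tool table; signatures and descriptions mirror the prompt above.
TOOLS: Dict[str, str] = {
    "search(query)": "web search",
    "calc(expression)": "math",
}

# The format instructions, copied from the prompt above.
FORMAT_RULES = """Use this format:
Thought: reason about what to do
Action: tool_name(args)
Observation: tool result
... (repeat) ...
Thought: I now know the answer
Final Answer: ..."""


def build_react_prompt(question: str, tools: Dict[str, str] = TOOLS) -> str:
    """Assemble a ReAct prompt from a tool table and a user question."""
    tool_lines = "\n".join(f"- {sig}: {desc}" for sig, desc in tools.items())
    return (
        "You have access to:\n"
        f"{tool_lines}\n\n"
        f"{FORMAT_RULES}\n\n"
        f"Question: {question}\n"
    )
```

Calling build_react_prompt with the Tokyo and Paris question reproduces the prompt shown above, and adding a tool becomes a one-line change to the table.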

A typical model trace looks like:

Thought: I need both populations.
Action: search("population of Tokyo 2025")
Observation: ~13.96 million
Thought: Now Paris.
Action: search("population of Paris 2025")
Observation: ~2.10 million
Thought: Divide.
Action: calc("13.96 / 2.10")
Observation: 6.65
Thought: I now know the answer
Final Answer: About 6.65.
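The calc tool in this trace receives a string written by the model, so passing it to eval would be risky. One possible way to back it, sketched here under the assumption that only basic arithmetic is needed, is to walk Python's ast and allow a small whitelist of operators. The calc name matches the prompt; everything else is illustrative.

```python
import ast
import operator

# Whitelisted operators; any other syntax is rejected.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def _eval(node: ast.AST) -> float:
    """Recursively evaluate a parsed arithmetic expression."""
    if isinstance(node, ast.Constant):
        value = node.value
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            return value
    elif isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
        return _BIN_OPS[type(node.op)](_eval(node.left), _eval(node.right))
    elif isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_eval(node.operand))
    raise ValueError("unsupported expression")


def calc(expression: str) -> float:
    """Evaluate +, -, *, / arithmetic without calling eval()."""
    tree = ast.parse(expression, mode="eval")
    return _eval(tree.body)


print(round(calc("13.96 / 2.10"), 2))  # prints 6.65
```

Names, function calls, and attribute access all raise ValueError, so a model that emits something like calc("__import__('os')") gets an error instead of code execution.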

+--------+
| Thought|<-----+
+--------+      |
     |          |
     v          |
+--------+      |
| Action |      |
+--------+      |
     |          |
     v          |
+--------+      |
| Observ.|------+
+--------+
     |
     v
Final Answer
The ReAct loop: thought, action, observation, repeat

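In code, you stop generation once the model has emitted an Action: line (otherwise it tends to invent its own Observation:), parse the action, execute the tool, append the result as Observation:, and call the model again with the extended transcript. Most agent frameworks (LangChain agents, LlamaIndex) implement this loop for you.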
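To make the loop concrete, here is a minimal sketch of it in Python. It rests on several assumptions: llm is any callable that takes the transcript so far and returns the model's next chunk of text, generation is assumed to stop before the model writes its own Observation:, and the scripted model and canned tools below are stand-ins so the example runs without a real API. The names run_react, scripted_llm, and demo_tools are hypothetical.

```python
import re
from typing import Callable, Dict, Optional

# Matches lines such as: Action: search("population of Tokyo 2025")
ACTION_RE = re.compile(r"^Action:\s*(\w+)\((.*)\)\s*$", re.MULTILINE)
FINAL_RE = re.compile(r"^Final Answer:\s*(.+)$", re.MULTILINE)


def run_react(
    llm: Callable[[str], str],
    tools: Dict[str, Callable[[str], str]],
    prompt: str,
    max_steps: int = 6,
) -> Optional[str]:
    """Run the Thought-Action-Observation loop; None means no answer in time."""
    transcript = prompt
    for _ in range(max_steps):
        output = llm(transcript)  # assumed to stop before "Observation:"
        transcript += output
        final = FINAL_RE.search(output)
        if final:
            return final.group(1).strip()
        action = ACTION_RE.search(output)
        if action is None:
            transcript += "\nObservation: error: no Action line found\n"
            continue
        name = action.group(1)
        arg = action.group(2).strip().strip("\"'")  # drop surrounding quotes
        tool = tools.get(name)
        result = tool(arg) if tool else f"error: unknown tool {name}"
        transcript += f"\nObservation: {result}\n"
    return None


# Stand-in model: replays the trace from the article, one step per call.
_replies = iter([
    'Thought: I need both populations.\nAction: search("population of Tokyo 2025")',
    'Thought: Now Paris.\nAction: search("population of Paris 2025")',
    'Thought: Divide.\nAction: calc("13.96 / 2.10")',
    "Thought: I now know the answer\nFinal Answer: About 6.65.",
])


def scripted_llm(transcript: str) -> str:
    return next(_replies)


# Canned tools so the demo needs no network access or real calculator.
_search_results = {
    "population of Tokyo 2025": "~13.96 million",
    "population of Paris 2025": "~2.10 million",
}
demo_tools: Dict[str, Callable[[str], str]] = {
    "search": lambda q: _search_results.get(q, "no results"),
    "calc": lambda e: "6.65",
}

answer = run_react(scripted_llm, demo_tools, "Question: Tokyo / Paris population?\n")
print(answer)  # prints: About 6.65.
```

With a real model you would replace scripted_llm with a function that calls your provider and passes a stop sequence; how to do that depends on the API and is not shown here.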

Trade-offs

ReAct is more expensive than plain prompting. Each iteration is a separate model call that resends the growing transcript, and tool calls add their own latency. If each round trip takes a couple of seconds, five iterations already mean a 10-second response. The loop also produces a lot of tokens you never show to the user, so cost climbs.

It can also loop forever if the model gets confused or a tool returns garbage. You must enforce max steps, a stop condition, and a fallback.
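One way to express those three safeguards is a small guard object that the loop consults before running each action. This is a sketch under assumptions: LoopGuard, its thresholds, and the FALLBACK message are hypothetical, and "the same action several times in a row" is used here as one possible stop condition among many.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical message returned to the user when the loop is cut short.
FALLBACK = "I could not finish this with the tools available."


@dataclass
class LoopGuard:
    max_steps: int = 6    # hard cap on actions per question
    max_repeats: int = 2  # identical consecutive actions tolerated
    actions: List[str] = field(default_factory=list)

    def check(self, action: str) -> Optional[str]:
        """Record an action; return a stop reason, or None to keep going."""
        self.actions.append(action)
        if len(self.actions) > self.max_steps:
            return "max_steps"
        window = self.max_repeats + 1
        tail = self.actions[-window:]
        if len(tail) == window and len(set(tail)) == 1:
            return "repeated_action"
        return None


guard = LoopGuard(max_steps=6, max_repeats=2)
for action in ['search("x")', 'search("x")', 'search("x")']:
    reason = guard.check(action)
    if reason:
        print(f"stopped ({reason}): {FALLBACK}")
        break
# prints: stopped (repeated_action): I could not finish this with the tools available.
```

Logging the stop reason alongside the trace makes it easier to tell a confused model apart from a tool that keeps returning garbage.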

Compared to native function calling (built into modern model APIs), ReAct is more transparent but less reliable in format. Function calling has the model emit its arguments as JSON against a schema you declare; ReAct relies on the model to follow text conventions. A common production setup today is to use function calling for execution and keep ReAct-style reasoning steps as visible thoughts.

ReAct shines when the problem genuinely requires multiple tool calls in sequence, with each call depending on the previous result. For simple one-shot tool use, plain function calling is enough.

Practical Tips

  • Cap the number of iterations. Six is a reasonable default.
  • Parse strictly. If the model emits malformed Action: lines, return an error observation and let it retry.
  • Include a few-shot example in the prompt showing exactly the format you want.
  • Truncate noisy observations. A 10,000-token search dump kills the next call.
  • Log every Thought-Action-Observation trace. It is the most useful debugging artifact you will ever get.
  • For production, prefer model-native tool calling but keep an internal thought field for visible reasoning.
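Two of these tips, strict parsing and truncation, are easy to show in code. The sketch below is illustrative only: parse_action, truncate_observation, and KNOWN_TOOLS are hypothetical names, and the character limit is a rough stand-in for a token budget, since counting tokens depends on the model's tokenizer.

```python
import re
from typing import Optional, Tuple

ACTION_RE = re.compile(r"Action:\s*([A-Za-z_]\w*)\((.*)\)")
KNOWN_TOOLS = {"search", "calc"}  # assumption: the two tools from the prompt


def parse_action(line: str) -> Tuple[Optional[Tuple[str, str]], Optional[str]]:
    """Return ((tool, args), None) on success or (None, error observation)."""
    match = ACTION_RE.fullmatch(line.strip())
    if match is None:
        return None, "error: expected a line like Action: tool_name(args)"
    name, args = match.group(1), match.group(2)
    if name not in KNOWN_TOOLS:
        return None, f"error: unknown tool '{name}'; available: {sorted(KNOWN_TOOLS)}"
    return (name, args), None


def truncate_observation(text: str, max_chars: int = 2000) -> str:
    """Cut a long tool result and say how much was dropped."""
    if len(text) <= max_chars:
        return text
    omitted = len(text) - max_chars
    return f"{text[:max_chars]} ... [truncated {omitted} chars]"


print(parse_action("Action calc 1+1")[1])
# prints: error: expected a line like Action: tool_name(args)
print(truncate_observation("a" * 10, max_chars=4))
# prints: aaaa ... [truncated 6 chars]
```

Feeding the error string back as the next Observation: gives the model a chance to correct its own formatting, and the truncation marker tells it that more text existed.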

Wrap-up

ReAct gives an LLM a clean structure for combining reasoning with tool use. The loop of thinking, acting, observing, and revising lets models tackle multi-step problems that go beyond their training data. It is more expensive and more brittle than a single call, but for any task that genuinely needs external information or computation, ReAct and its native function-calling descendants are the foundation of modern LLM agents.