LLM Tool Calling and Agents Overview
Understand how tool calling lets LLMs invoke functions, why agents loop over tools, and how to design reliable tool schemas.
What you'll learn
- ✓ What tool calling actually is under the hood
- ✓ How an agent loop reads, runs, and feeds tool results back
- ✓ How to design tool schemas that the model can use reliably
- ✓ Common failure modes and how to mitigate them
- ✓ When you do not need an agent at all
Prerequisites
- Familiarity with how APIs work
What and Why
A bare LLM can only output text. Tool calling is the mechanism that lets it ask your application to run a function and return the result. With tools, the same model can fetch live data, query a database, call another API, or perform a calculation it would otherwise hallucinate.
An agent is a loop on top of tool calling: the model proposes a tool call, your code executes it, you append the result to the conversation, and the model decides whether to call another tool or produce a final answer.
Mental Model
Tool calling is structured output, not magic. You give the model a JSON schema for each tool. The model decides whether to respond with plain text or with a tool_calls field containing the tool name and arguments. Your code is responsible for actually running the tool.
user message
|
v
LLM -> "tool_call: search(query='weather Tokyo')"
|
v
your code runs search() -> "26C, clear"
|
v
append tool result to messages
|
v
LLM -> "tool_call: convert_units(...)" OR "final: It is 26 degrees in Tokyo."
|
v
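(loop until final answer or max steps)
The model never executes anything itself. It only describes which tool it wants and with what arguments. This separation keeps execution under your control: your code decides whether and how each requested call actually runs, which is what makes tool calling practical to wire into production systems.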
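To make the "structured output" point concrete, here is a sketch of what an assistant turn carrying a tool call looks like as plain data, assuming the OpenAI-style chat format used in the hands-on example. The id `call_abc123` and the helper `describe_turn` are invented for illustration. Note that `arguments` arrives as a JSON string that your code must parse.

```python
import json

# Hypothetical assistant turn in the OpenAI-style chat format.
# The id and argument values are invented for illustration.
assistant_turn = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_abc123",
        "type": "function",
        "function": {
            "name": "search",
            "arguments": '{"query": "weather Tokyo"}',  # JSON string, not a dict
        },
    }],
}


def describe_turn(turn):
    """Return ("final", text) or ("tool_calls", [(name, args), ...])."""
    calls = turn.get("tool_calls") or []
    if not calls:
        return ("final", turn.get("content"))
    parsed = [
        (c["function"]["name"], json.loads(c["function"]["arguments"]))
        for c in calls
    ]
    return ("tool_calls", parsed)


print(describe_turn(assistant_turn))
print(describe_turn({"role": "assistant", "content": "It is 26 degrees in Tokyo."}))
```

Nothing in this data runs a search. It is only a request, and the application decides what to do with it.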
Hands-on Example
Here is a minimal agent loop using the OpenAI Python SDK's Chat Completions API.
from openai import OpenAI
import json

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# JSON schema describing the single tool the model may request.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]


def get_weather(city):
    # Pretend this hits a real API.
    return {"city": city, "temp_c": 26, "conditions": "clear"}


def run_agent(user_msg, max_steps=5):
    messages = [{"role": "user", "content": user_msg}]
    for _ in range(max_steps):
        resp = client.chat.completions.create(
            model="gpt-4o-mini", messages=messages, tools=tools,
        )
        msg = resp.choices[0].message
        messages.append(msg)  # keep the assistant turn in the history
        if not msg.tool_calls:
            # No tool requested: this is the final answer.
            return msg.content
        for call in msg.tool_calls:
            # Arguments arrive as a JSON string.
            args = json.loads(call.function.arguments)
            # Only one tool is defined, so it is called directly.
            result = get_weather(**args)
            # Each tool result must reference the call it answers.
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": json.dumps(result),
            })
    return "Max steps reached."


print(run_agent("What's the weather in Tokyo?"))
The loop is the entire idea of an agent. Everything else (planners, memory, multi-agent setups) is an elaboration on this skeleton.
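The loop above calls `get_weather` directly because only one tool is defined. With more than one tool, the executor has to look up the function by the name the model supplied. The following is a minimal sketch of that dispatch step. `TOOL_REGISTRY` and `execute_tool_call` are hypothetical names, not part of any SDK.

```python
import json


def get_weather(city):
    # Stub standing in for a real weather API call.
    return {"city": city, "temp_c": 26, "conditions": "clear"}


# Hypothetical registry mapping tool names to Python callables.
TOOL_REGISTRY = {"get_weather": get_weather}


def execute_tool_call(name, arguments_json):
    """Run one tool call and always return a JSON string for the tool message."""
    func = TOOL_REGISTRY.get(name)
    if func is None:
        # The model asked for a tool that does not exist.
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        args = json.loads(arguments_json)
        return json.dumps(func(**args))
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON, or wrong or missing argument names.
        return json.dumps({"error": f"bad arguments: {exc}"})


print(execute_tool_call("get_weather", '{"city": "Tokyo"}'))
print(execute_tool_call("get_forecast", "{}"))
```

Inside the loop, the body of the `for call in msg.tool_calls` block would then pass `call.function.name` and `call.function.arguments` to this helper and append the returned string as the tool message content.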
Trade-offs
Agents look powerful but introduce real costs.
- Latency multiplies. Each tool call is another round trip to the model. A five-step agent is at minimum five LLM calls.
- Cost multiplies. Each call resends the entire conversation including prior tool results. Long chains can be very expensive.
- Failure surface grows. The model can hallucinate tool names, pass wrong types, or get stuck in loops calling the same tool.
- Debugging is harder. You now have to inspect a trace of calls, not a single request.
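The cost point can be made concrete with a little arithmetic. This sketch assumes a simplified model in which every step appends a fixed number of tokens and every call is billed for the full history. The numbers are illustrative only, and real providers may discount repeated prefixes through prompt caching.

```python
def cumulative_input_tokens(base_tokens, tokens_added_per_step, steps):
    """Total input tokens billed across `steps` model calls.

    Call k (0-indexed) resends the base prompt plus everything
    appended by the k earlier steps.
    """
    return sum(base_tokens + k * tokens_added_per_step for k in range(steps))


# Illustrative numbers only: 500-token prompt, 300 tokens added per step.
for steps in (1, 5, 10):
    print(steps, cumulative_input_tokens(500, 300, steps))
```

Under these assumptions a single call costs 500 input tokens, five steps cost 5,500, and ten steps cost 18,500. Total input grows roughly with the square of the step count, not linearly.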
In practice, many “agent” use cases are better served by a fixed workflow that calls tools in a predetermined order. Reach for an open-ended agent only when the branching is genuinely data-dependent.
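For contrast, a fixed workflow might look like the sketch below: the tool always runs first and the model is called exactly once, so there is no loop to cap. The `llm` parameter is a hypothetical stand-in for a single model call, and `fake_llm` exists only so the example runs without network access.

```python
import json


def get_weather(city):
    # Stub standing in for a real weather API call.
    return {"city": city, "temp_c": 26, "conditions": "clear"}


def weather_report(city, llm):
    """Fixed two-step workflow: fetch, then summarize. No loop, no branching.

    `llm` is any callable that takes a prompt string and returns text.
    """
    data = get_weather(city)  # step 1 always runs
    prompt = "Describe this weather in one sentence: " + json.dumps(data)
    return llm(prompt)  # step 2: exactly one model call


def fake_llm(prompt):
    # Placeholder reply so the sketch is runnable offline.
    return f"(model reply to {len(prompt)} chars of prompt)"


print(weather_report("Tokyo", fake_llm))
```

Latency and cost are known in advance here, and a failure can only come from one of two places.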
Practical Tips
A few habits keep tool calling reliable.
- Write tool descriptions for the model, not for humans. State exactly when to call the tool and what the inputs mean. Bad descriptions cause silent misuse.
- Use strict JSON schema. Enable strict mode if your provider supports it so arguments are guaranteed to match the schema. This eliminates a whole class of parse errors.
- Return structured, compact results. Tool outputs become input tokens on the next turn. Trim noise. Return JSON, not paragraphs.
- Cap the loop. Always set a max_steps limit. Always log every step. A runaway agent can burn dollars in seconds.
- Validate before executing. Even with strict schemas, validate ranges and business rules before you let a tool delete data or send money.
- Prefer many small tools over one giant one. A search_orders(filters) tool is easier for the model to use correctly than a do_anything(query) tool.
- Surface errors back as tool results. If a tool fails, return a JSON error object. The model can often recover by trying different arguments.
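Two of these habits, validating before executing and surfacing errors as tool results, combine naturally. The sketch below assumes a hypothetical refund tool. `MAX_REFUND_CENTS`, `validate_refund_args`, and `run_refund_tool` are invented for illustration, and `issue_refund` is whatever function performs the real side effect.

```python
import json

MAX_REFUND_CENTS = 50_000  # hypothetical business limit


def validate_refund_args(args):
    """Return a list of problems; an empty list means the call may proceed."""
    problems = []
    amount = args.get("amount_cents")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(amount, int) or isinstance(amount, bool):
        problems.append("amount_cents must be an integer")
    elif not 0 < amount <= MAX_REFUND_CENTS:
        problems.append(f"amount_cents must be between 1 and {MAX_REFUND_CENTS}")
    order_id = args.get("order_id")
    if not isinstance(order_id, str) or not order_id:
        problems.append("order_id must be a non-empty string")
    return problems


def run_refund_tool(args, issue_refund):
    """Validate, then execute; always return a compact JSON string."""
    problems = validate_refund_args(args)
    if problems:
        # Nothing is executed; the model sees why and can retry.
        return json.dumps({"ok": False, "errors": problems})
    try:
        return json.dumps({"ok": True, "result": issue_refund(**args)})
    except Exception as exc:  # surface tool failures to the model
        return json.dumps({"ok": False, "errors": [str(exc)]})
```

The returned string goes into the tool message as-is, so the model receives the same compact JSON shape whether the call succeeded, was rejected, or failed.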
Wrap-up
Tool calling is the bridge from text generation to action. Once you can describe a function as a schema, you can extend the model’s reach to anything in your stack. Start with a single tool and a single loop. Add structure as you find real branching in your workflow. Resist the urge to build a generalized agent before you have a concrete problem that demands it.