Prompt Engineering with Tools and Functions
How to write prompts and tool definitions that make function calling reliable. Covers schemas, descriptions, examples, error handling, and patterns for multi-tool agents.
What you'll learn
- ✓How tools and prompts interact in function calling
- ✓Writing tool descriptions the model can actually use
- ✓Designing schemas for clarity, not just validity
- ✓Handling tool errors without confusing the model
- ✓Scaling to many tools without quality collapse
Prerequisites
- •Basic LLM API usage
- •Familiarity with JSON schema
Tool use is mostly a prompting problem. The provider gives you a mechanism — the model emits a structured call instead of plain text — but whether that mechanism produces useful behavior depends almost entirely on the prompt and the tool definitions. Get those right and tool use feels magical. Get them wrong and the model keeps grabbing the wrong tool, fabricating arguments, or refusing to call anything.
What and why
A tool definition is part prompt. The name, description, and schema are all read by the model on every turn. They occupy context, they shape behavior, and they compete for attention. Writing them is prompt engineering.
The reason this matters is that small wording changes in a tool description can flip the model from calling the right tool every time to picking the wrong one half the time. The same care you put into the system prompt belongs in tool definitions.
Mental model
Imagine the model is staring at a wall of buttons. Each button has a label, a tooltip, and a short list of fields. To pick the right button it reads the labels in order, eliminates the obvious wrong ones, and then looks more carefully at the rest. Your job is to make the right button easy to find.
That means short, distinct names; descriptions that explain when to use the tool (not just what it does); and schemas with field-level descriptions that hint at expected formats. Treat tool definitions like API docs written for a slightly distractible reader.
Hands-on example
Compare two definitions for the same function.
# Vague — model will misuse this
{"name": "search", "description": "Searches things",
"input_schema": {"properties": {"q": {"type": "string"}}}}
# Specific — model can place it correctly
{"name": "search_product_catalog",
"description": "Search the product catalog by free-text query. "
"Use when the user mentions a product, SKU, or category. "
"Do NOT use for order or shipping lookups.",
"input_schema": {
"properties": {
"query": {"type": "string",
"description": "Natural-language search terms. Keep under 10 words."}
},
"required": ["query"]
}}
The control flow around tools looks like this.
user message
|
v
[LLM with tool list]
|
v
emits tool_call?
| \
yes no -> final answer
|
v
[execute tool]
|
v
success? -- yes --> result -> back to LLM
|
no
v
error message -> back to LLM -> retry or different tool When a tool fails, return a structured error to the model — what went wrong, why, and what it could try instead. A bare exception string leaves the model guessing.
Trade-offs
More tools mean more flexibility and lower per-tool quality. Past roughly fifteen tools, models start mixing them up, even strong ones. If you need dozens, group them behind a router tool that picks a sub-toolset based on intent.
Strict schemas reduce hallucinated arguments but constrain what the model can express. If the same call has multiple shapes, consider splitting it into multiple tools with clear names.
Optional fields are useful but invite misuse. Models often fill them with plausible but wrong defaults. Either require fields explicitly or warn against them in the description.
Practical tips
Name tools as verb_object: get_order, update_user_profile, cancel_subscription. Snake_case scans well and avoids the model treating tool names as English.
Put the “when to use” guidance in the description, not the system prompt. The model reads tool descriptions when deciding what to call; system prompt guidance gets diluted across the conversation.
Add an example or two inside the description for fields with non-obvious format (“Use ISO-8601 date, e.g. 2026-06-28”). Schema alone is not enough.
Log every tool call with its arguments and result. The patterns will tell you which definitions need rewording.
Cap tool call loops with a maximum-step counter. Models occasionally enter loops where they call the same tool with slight variations. A counter is cheaper than a smarter prompt.
Wrap-up
Tools extend what a model can do, but the prompt is still the interface. Treat tool definitions as first-class prompt content, write them with the same care, and the function-calling layer of your system becomes one of the most reliable parts of your stack.
Related articles
- AI Function Calling with LLMs: Production Patterns
How function calling really works under the hood, the schema design that survives contact with users, and the failure modes to plan for.
- LLMs LLM Function Schema Best Practices
How to design tool schemas that LLMs actually call correctly, with naming, description, and parameter patterns that survive real users and adversarial inputs.
- Prompt Engineering Prompt Engineering: The ReAct Pattern
Learn the ReAct pattern, a prompting technique that combines reasoning and action to build effective tool-using LLM agents.
- LLMs LLM Tool Calling and Agents Overview
Understand how tool calling lets LLMs invoke functions, why agents loop over tools, and how to design reliable tool schemas.