Prompt Engineering with Tools and Functions

Intermediate 10 min read

What you'll learn

✓How tools and prompts interact in function calling
✓Writing tool descriptions the model can actually use
✓Designing schemas for clarity, not just validity
✓Handling tool errors without confusing the model
✓Scaling to many tools without quality collapse

Prerequisites

•Basic LLM API usage
•Familiarity with JSON schema

Tool use is mostly a prompting problem. The provider gives you a mechanism — the model emits a structured call instead of plain text — but whether that mechanism produces useful behavior depends almost entirely on the prompt and the tool definitions. Get those right and tool use feels magical. Get them wrong and the model keeps grabbing the wrong tool, fabricating arguments, or refusing to call anything.

What and why

A tool definition is part prompt. The name, description, and schema are all read by the model on every turn. They occupy context, they shape behavior, and they compete for attention. Writing them is prompt engineering.

The reason this matters is that small wording changes in a tool description can flip the model from calling the right tool every time to picking the wrong one half the time. The same care you put into the system prompt belongs in tool definitions.

Mental model

Imagine the model is staring at a wall of buttons. Each button has a label, a tooltip, and a short list of fields. To pick the right button it reads the labels in order, eliminates the obvious wrong ones, and then looks more carefully at the rest. Your job is to make the right button easy to find.

That means short, distinct names; descriptions that explain when to use the tool (not just what it does); and schemas with field-level descriptions that hint at expected formats. Treat tool definitions like API docs written for a slightly distractible reader.

Hands-on example

Compare two definitions for the same function.

# Vague — model will misuse this
{"name": "search", "description": "Searches things",
 "input_schema": {"properties": {"q": {"type": "string"}}}}

# Specific — model can place it correctly
{"name": "search_product_catalog",
 "description": "Search the product catalog by free-text query. "
                "Use when the user mentions a product, SKU, or category. "
                "Do NOT use for order or shipping lookups.",
 "input_schema": {
     "properties": {
         "query": {"type": "string",
                   "description": "Natural-language search terms. Keep under 10 words."}
     },
     "required": ["query"]
 }}

The control flow around tools looks like this.

user message
 |
 v
[LLM with tool list]
 |
 v
emits tool_call?
 |          \
 yes         no -> final answer
 |
 v
[execute tool]
 |
 v
success? -- yes --> result -> back to LLM
 |
 no
 v
error message -> back to LLM -> retry or different tool

Tool calling loop with error feedback

When a tool fails, return a structured error to the model — what went wrong, why, and what it could try instead. A bare exception string leaves the model guessing.

Trade-offs

More tools mean more flexibility and lower per-tool quality. Past roughly fifteen tools, models start mixing them up, even strong ones. If you need dozens, group them behind a router tool that picks a sub-toolset based on intent.

Strict schemas reduce hallucinated arguments but constrain what the model can express. If the same call has multiple shapes, consider splitting it into multiple tools with clear names.

Optional fields are useful but invite misuse. Models often fill them with plausible but wrong defaults. Either require fields explicitly or warn against them in the description.

Practical tips

Name tools as verb_object: get_order, update_user_profile, cancel_subscription. Snake_case scans well and avoids the model treating tool names as English.

Put the “when to use” guidance in the description, not the system prompt. The model reads tool descriptions when deciding what to call; system prompt guidance gets diluted across the conversation.

Add an example or two inside the description for fields with non-obvious format (“Use ISO-8601 date, e.g. 2026-06-28”). Schema alone is not enough.

Log every tool call with its arguments and result. The patterns will tell you which definitions need rewording.

Cap tool call loops with a maximum-step counter. Models occasionally enter loops where they call the same tool with slight variations. A counter is cheaper than a smarter prompt.

Wrap-up

Tools extend what a model can do, but the prompt is still the interface. Treat tool definitions as first-class prompt content, write them with the same care, and the function-calling layer of your system becomes one of the most reliable parts of your stack.