AI Agents and Tool Use Patterns

Beginner 11 min read

What you'll learn

✓ How tool use actually works under the hood
✓ The basic agent loop and its failure modes
✓ Patterns for planning and parallel tool calls
✓ How to handle errors and retries safely
✓ How to keep agents bounded and observable

Prerequisites

• Basic Python familiarity

An AI agent is a loop: the model reads context, decides whether to call a tool, gets the result, and continues until it has an answer. That description fits everything from a weekend hobby project to a production system handling thousands of tasks per minute. The difference is in the patterns. This post covers the ones that hold up in practice.

What tool use really is

Tool use does not give the model magic powers. It is an API feature: instead of a normal text response, the model can emit a structured call (a tool name plus arguments). Your code sees that call, executes the underlying function, and feeds the result back in the next turn. The model then either makes more calls or produces a final answer.

The model never executes anything itself. You define the tools, you decide which ones it can use, you run them, and you control what happens when they fail. Treat each tool definition as an API contract: clear name, clear description, strict argument schema.

from anthropic import Anthropic

client = Anthropic()

tools = [{
    "name": "get_order",
    "description": "Look up an order by ID. Returns status, items, and total.",
    "input_schema": {
        "type": "object",
        "properties": {"order_id": {"type": "string"}},
        "required": ["order_id"],
    },
}]

resp = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "What is the status of order A-1029?"}],
)
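# If the model decides to call get_order, stop_reason is "tool_use" and
# resp.content includes a tool_use block carrying the tool name and input.
print(resp.stop_reason, resp.content)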
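The loop in the next section expects a `run_tool(name, args)` function that you supply. Below is a minimal sketch of one. It assumes a hypothetical in-memory `ORDERS` table and a `TOOL_REGISTRY` dict that stand in for a real order service. It maps each declared tool name to a Python callable and returns JSON. Errors from the tool itself are left unhandled here; the error-handling section covers that.

```python
import json

# Hypothetical stand-in for a real order backend.
ORDERS = {
    "A-1029": {"status": "shipped", "items": ["keyboard"], "total": 89.00},
    "A-1030": {"status": "processing", "items": ["mouse", "pad"], "total": 42.50},
}

def get_order(order_id: str) -> dict:
    """Return the order record; raises KeyError if the order does not exist."""
    return ORDERS[order_id]

# Map tool names (as declared in the `tools` list) to Python callables.
TOOL_REGISTRY = {"get_order": get_order}

def run_tool(name: str, args: dict) -> str:
    """Execute the named tool with the model-supplied arguments and return JSON."""
    if name not in TOOL_REGISTRY:
        return json.dumps({"error": f"unknown tool: {name}"})
    return json.dumps(TOOL_REGISTRY[name](**args))

print(run_tool("get_order", {"order_id": "A-1029"}))
```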

The basic agent loop

The loop is short: send messages with tools, check the stop reason, run any tool calls, append results, repeat until the model returns a normal text response or you hit a step cap.

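def run_agent(messages, tools, run_tool, max_steps=10):
    # Uses the `client` created above; run_tool(name, args) is your own dispatcher.
    for _ in range(max_steps):
        resp = client.messages.create(
            model="claude-opus-4-7",
            max_tokens=1024,
            tools=tools,
            messages=messages,
        )
        # Keep the assistant turn (including any tool_use blocks) in the history.
        messages.append({"role": "assistant", "content": resp.content})
        if resp.stop_reason == "max_tokens":
            # A truncated reply is not a final answer and may cut off a tool call.
            raise RuntimeError("response truncated; raise max_tokens")
        if resp.stop_reason != "tool_use":
            return resp  # final answer: the model requested no more tools
        tool_results = []
        for block in resp.content:
            if block.type == "tool_use":
                output = run_tool(block.name, block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,  # ties this result to its call
                    "content": str(output),
                })
        # All results for this turn go back together in a single user message.
        messages.append({"role": "user", "content": tool_results})
    raise RuntimeError("step cap exceeded")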

The two things to always include are a step cap and a way to surface the final answer. Without a cap, a confused agent loops until your budget runs out. Without a clear final answer path, you do not know when to return.
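Surfacing the final answer usually means pulling the text blocks out of the last response, for example `final_text(run_agent(...))`. The helper below does that. So that it runs offline, it is exercised against `SimpleNamespace` stand-ins rather than real SDK objects; the SDK's text blocks expose the same `type` and `text` attributes.

```python
from types import SimpleNamespace

def final_text(resp) -> str:
    """Join every text block in a response; tool_use blocks are skipped."""
    return "".join(block.text for block in resp.content if block.type == "text")

# Offline stand-in for a response returned by the agent loop.
fake_resp = SimpleNamespace(
    stop_reason="end_turn",
    content=[
        SimpleNamespace(type="text", text="Order A-1029 "),
        SimpleNamespace(type="text", text="has shipped."),
    ],
)
print(final_text(fake_resp))  # Order A-1029 has shipped.
```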

Tool design that helps the model

Tools should be small, focused, and well-named. A vague tool called do_thing invites misuse; a precise search_orders_by_customer_email makes the model’s life easy. Descriptions should explain what the tool does, what it returns, and when to use it (and when not to).
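For contrast with a vague `do_thing`, here is what a focused definition might look like. The tool name comes from the text above. The description wording, the `limit` parameter, and the hypothetical `lint_tool` checker are illustrative assumptions, not a provider requirement. The linter catches a few cheap mistakes before a definition ships.

```python
import re

search_orders_tool = {
    "name": "search_orders_by_customer_email",
    "description": (
        "Find a customer's orders by their email address. Returns a list of "
        "order IDs with status and date, newest first. Use this when the user "
        "gives an email but no order ID. Do not use it to look up a single "
        "known order; call get_order instead."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "email": {"type": "string", "description": "Customer email address."},
            "limit": {"type": "integer", "minimum": 1, "maximum": 50},
        },
        "required": ["email"],
    },
}

def lint_tool(tool: dict) -> list:
    """Return a list of problems with a tool definition (empty means OK)."""
    problems = []
    if not re.fullmatch(r"[a-z][a-z0-9_]*", tool["name"]):
        problems.append("name should be snake_case")
    if len(tool.get("description", "")) < 40:
        problems.append("description is too short to explain when to use the tool")
    schema = tool["input_schema"]
    missing = set(schema.get("required", [])) - set(schema.get("properties", {}))
    if missing:
        problems.append(f"required fields not in properties: {sorted(missing)}")
    return problems

print(lint_tool(search_orders_tool))  # []
print(lint_tool({"name": "do_thing", "description": "Does a thing.",
                 "input_schema": {"type": "object", "properties": {}}}))
```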

Schema strictness matters. Make required fields required. Use enums for known categories. Constrain string formats when you can. The fewer ways the model has to call a tool wrong, the fewer retries you will need.
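The model usually follows the schema, but its arguments are still generated text, so checking them before execution is cheap insurance. The hand-rolled check below covers only a small subset of JSON Schema: required fields, unknown fields, enums, and regex patterns. It is written against a hypothetical `list_orders` tool whose field names and status values are made up for illustration. In a real project, a library such as `jsonschema` covers the full specification.

```python
import re

# Hypothetical schema for a list_orders tool.
LIST_ORDERS_SCHEMA = {
    "type": "object",
    "properties": {
        "status": {"type": "string", "enum": ["pending", "shipped", "delivered", "cancelled"]},
        "since_order_id": {"type": "string", "pattern": r"^[A-Z]-\d{4}$"},
    },
    "required": ["status"],
    "additionalProperties": False,
}

def check_args(schema: dict, args: dict) -> list:
    """Check required fields, unknown fields, string type, enums, and patterns."""
    errors = []
    props = schema["properties"]
    for field in schema.get("required", []):
        if field not in args:
            errors.append(f"missing required field: {field}")
    for field, value in args.items():
        spec = props.get(field)
        if spec is None:
            if schema.get("additionalProperties") is False:
                errors.append(f"unknown field: {field}")
            continue
        if spec.get("type") == "string" and not isinstance(value, str):
            errors.append(f"{field} must be a string")
            continue
        if "enum" in spec and value not in spec["enum"]:
            errors.append(f"{field} must be one of {spec['enum']}")
        # JSON Schema "pattern" is a search, not a full match, so anchor it in the regex.
        if "pattern" in spec and not re.search(spec["pattern"], value):
            errors.append(f"{field} does not match {spec['pattern']}")
    return errors

print(check_args(LIST_ORDERS_SCHEMA, {"status": "shipped", "since_order_id": "A-1029"}))  # []
print(check_args(LIST_ORDERS_SCHEMA, {"status": "lost"}))
```

Returning the error list to the model as a tool result, rather than executing with bad input, gives it a concrete reason to retry.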

Return structured results, not free-form prose. JSON strings are unambiguous for the model to read in follow-up turns and easy for your own code to log and test. Include an error field when something fails so the model can decide whether to retry or give up.
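One way to keep results uniform is to wrap every tool's output in the same small JSON envelope. The `ok`, `data`, and `error` field names are an arbitrary convention chosen for this sketch. What matters is that success and failure always share one predictable shape.

```python
import json
from typing import Any, Optional

def envelope(data: Any = None, error: Optional[str] = None) -> str:
    """Serialize a tool outcome into one predictable JSON shape."""
    if error is not None:
        return json.dumps({"ok": False, "error": error})
    return json.dumps({"ok": True, "data": data})

print(envelope({"status": "shipped", "total": 89.0}))
# {"ok": true, "data": {"status": "shipped", "total": 89.0}}
print(envelope(error="order A-9999 not found"))
# {"ok": false, "error": "order A-9999 not found"}
```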

Parallel tool calls

Modern models can emit multiple tool calls in a single turn when the calls are independent. If the user asks “compare orders A-1029 and A-1030”, a good agent issues both get_order calls in parallel rather than serially. Your loop should detect this, run them concurrently (with a thread pool, asyncio, or your background queue), and pass back all results in one user turn.
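The concurrent step can be sketched with the standard library's `ThreadPoolExecutor`. The sketch assumes blocks shaped like the SDK's `tool_use` blocks (`type`, `id`, `name`, `input`) and a `run_tool(name, args)` callable. The `SimpleNamespace` blocks and `slow_run_tool` are offline stand-ins that simulate two 200 ms backend calls.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from types import SimpleNamespace

def run_tool_calls_parallel(content, run_tool, max_workers=8):
    """Run every tool_use block concurrently; return tool_result blocks in call order."""
    calls = [b for b in content if b.type == "tool_use"]
    if not calls:
        return []
    with ThreadPoolExecutor(max_workers=min(max_workers, len(calls))) as pool:
        # pool.map preserves input order, so outputs line up with calls.
        outputs = list(pool.map(lambda b: run_tool(b.name, b.input), calls))
    return [
        {"type": "tool_result", "tool_use_id": b.id, "content": str(out)}
        for b, out in zip(calls, outputs)
    ]

def slow_run_tool(name, args):
    time.sleep(0.2)  # simulate a backend call
    return f"{name}({args['order_id']}) -> ok"

blocks = [
    SimpleNamespace(type="tool_use", id="t1", name="get_order", input={"order_id": "A-1029"}),
    SimpleNamespace(type="tool_use", id="t2", name="get_order", input={"order_id": "A-1030"}),
]
start = time.perf_counter()
results = run_tool_calls_parallel(blocks, slow_run_tool)
elapsed = time.perf_counter() - start
print(f"{elapsed:.2f}s", [r["tool_use_id"] for r in results])
```

In the agent loop, the returned list is what gets appended as a single `{"role": "user", "content": ...}` message. Threads suit I/O-bound tools; in an async codebase, `asyncio.gather` plays the same role.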

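The latency win is real, especially when each tool has a non-trivial backend call. If two independent calls each take about a second, running them concurrently cuts that turn's tool time from roughly two seconds to roughly one. The overall task speeds up less, because model inference time per turn does not change.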

Plan-then-act for complex tasks

For tasks that need many steps, ask the model to plan first. A planning step produces a short list of subtasks; an execution step calls tools to do each one. This pattern is cleaner to debug than a single tangled loop, and it gives you a natural place to checkpoint progress and recover from failures.

The plan does not need to be rigid. A good agent revises its plan based on intermediate results. The point is to give the model a chance to think structurally before diving into tool calls.
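Here is a skeleton of plan-then-act with revision. The model calls are abstracted behind hypothetical `plan_fn`, `execute_fn`, and `revise_fn` callables so the control flow stays visible. Progress is checkpointed to a JSON file after every subtask, so a crashed run resumes where it stopped. The file location is an assumption. A subtask interrupted mid-execution is simply re-run on resume, since the last checkpoint still lists it as remaining.

```python
import json
import os
import tempfile

def plan_and_act(task, plan_fn, execute_fn, revise_fn, checkpoint_path):
    """Plan once, execute subtasks in order, allow revisions, checkpoint each step."""
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)  # resume a previous run
    else:
        state = {"remaining": plan_fn(task), "done": []}

    while state["remaining"]:
        subtask = state["remaining"].pop(0)
        result = execute_fn(subtask)
        state["done"].append({"subtask": subtask, "result": result})
        # Let the planner adjust the rest of the plan given what just happened.
        state["remaining"] = revise_fn(state["remaining"], subtask, result)
        with open(checkpoint_path, "w") as f:
            json.dump(state, f)
    return state["done"]

# Toy stand-ins for model-backed planning, execution, and revision.
def plan_fn(task):
    return ["look up A-1029", "look up A-1030", "compare totals"]

def execute_fn(subtask):
    return "not found" if subtask == "look up A-1030" else "ok"

def revise_fn(remaining, subtask, result):
    if result == "not found":
        return ["search orders by customer email"] + remaining
    return remaining

path = os.path.join(tempfile.mkdtemp(), "plan_checkpoint.json")
done = plan_and_act("compare A-1029 and A-1030", plan_fn, execute_fn, revise_fn, path)
print([d["subtask"] for d in done])
```

Here the failed lookup causes the planner to insert a fallback step before continuing, which is the kind of revision the text describes.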

Error handling without panic

Tools fail. Networks blip, services return 500s, inputs are malformed. The agent should see the error, decide whether to retry with different arguments, fall back to a different tool, or report failure to the user.

Wrap tool execution in your own error handling. Catch exceptions and return a structured error object as the tool result instead of crashing the loop. The model can then read “error: invalid order_id format” and try again with a different value, which is exactly what you want.
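A minimal wrapper along those lines catches any exception from a tool and returns a `tool_result` block with `is_error` set. The Anthropic Messages API accepts that flag as a signal that the tool failed. The error-message format and the `flaky_get_order` stand-in are assumptions of this sketch.

```python
import json

def safe_tool_result(tool_use_id, name, args, run_tool):
    """Run a tool and always return a tool_result block, never an exception."""
    try:
        output = run_tool(name, args)
        return {"type": "tool_result", "tool_use_id": tool_use_id, "content": str(output)}
    except Exception as exc:  # report any tool failure back to the model
        return {
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "content": json.dumps({"error": f"{type(exc).__name__}: {exc}"}),
            "is_error": True,
        }

def flaky_get_order(name, args):
    if not args["order_id"].startswith("A-"):
        raise ValueError("invalid order_id format")
    return {"status": "shipped"}

print(safe_tool_result("t1", "get_order", {"order_id": "1029"}, flaky_get_order))
```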

Set per-tool timeouts and a per-task budget. For an interactive task, an agent that takes 90 seconds is usually stuck (looping, retrying, or waiting on a hung tool), even if it eventually returns the right answer.
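Both limits can be sketched with only the standard library. `call_with_timeout` uses `Future.result(timeout=...)`. This stops *waiting* but cannot kill the worker thread, so the underlying client should also set its own network timeout. The `TaskBudget` class and its limits are hypothetical. After each API call you would charge it with `resp.usage.input_tokens` and `resp.usage.output_tokens`, then call `check()`.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

_pool = ThreadPoolExecutor(max_workers=4)

def call_with_timeout(fn, args, timeout_s):
    """Run fn(**args) in a worker; raise TimeoutError if it runs past timeout_s."""
    future = _pool.submit(fn, **args)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        raise TimeoutError(f"{fn.__name__} exceeded {timeout_s}s") from None

class TaskBudget:
    """Tracks wall-clock time and tokens for one agent task (hypothetical limits)."""

    def __init__(self, max_seconds, max_tokens):
        self.deadline = time.monotonic() + max_seconds
        self.max_tokens = max_tokens
        self.tokens_used = 0

    def charge(self, input_tokens, output_tokens):
        self.tokens_used += input_tokens + output_tokens

    def check(self):
        if time.monotonic() > self.deadline:
            raise RuntimeError("task time budget exceeded")
        if self.tokens_used > self.max_tokens:
            raise RuntimeError("task token budget exceeded")

def slow_lookup(order_id):
    time.sleep(0.5)  # simulate a hung backend
    return {"order_id": order_id}

try:
    call_with_timeout(slow_lookup, {"order_id": "A-1029"}, timeout_s=0.1)
except TimeoutError as exc:
    print(exc)  # slow_lookup exceeded 0.1s

budget = TaskBudget(max_seconds=30, max_tokens=2_000)
budget.charge(input_tokens=1_500, output_tokens=700)
try:
    budget.check()
except RuntimeError as exc:
    print(exc)  # task token budget exceeded
```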

Observability is non-optional

Every tool call, every intermediate message, every plan revision should be logged. Tracing tools like LangSmith, Phoenix, and Langfuse make this visible as a tree you can inspect. Without traces, debugging a flaky agent is guesswork.

Capture the inputs, the outputs, the latencies, the token counts. When something goes wrong in production, you want to replay the exact sequence the agent saw, not reconstruct it from memory.
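Even before you adopt a tracing platform, a thin wrapper can record these fields as one JSON line per tool call using the standard `logging` module. The `trace_id` parameter, the field names, and the 2,000-character truncation are assumptions. Token counts come from the model response rather than the tool, so they would be logged alongside each API call instead.

```python
import json
import logging
import time
import uuid

logger = logging.getLogger("agent.trace")
logger.setLevel(logging.INFO)

def traced_tool_call(run_tool, name, args, trace_id):
    """Run a tool and log its input, output, latency, and outcome as one JSON record."""
    start = time.perf_counter()
    record = {"trace_id": trace_id, "span_id": uuid.uuid4().hex[:8], "tool": name, "input": args}
    try:
        output = run_tool(name, args)
        record.update(status="ok", output=str(output)[:2000])  # truncate huge payloads
        return output
    except Exception as exc:
        record.update(status="error", error=repr(exc))
        raise
    finally:
        # Runs on success and failure, so every call leaves exactly one record.
        record["latency_ms"] = round((time.perf_counter() - start) * 1000, 1)
        logger.info(json.dumps(record))

# Demo: print records to stderr; in production, ship them to your tracing backend.
logging.basicConfig(format="%(message)s")
traced_tool_call(lambda name, args: {"status": "shipped"}, "get_order",
                 {"order_id": "A-1029"}, trace_id="task-42")
```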

Keeping agents bounded

Limit what tools the agent can access. Limit how many steps it can take. Limit how much it can spend on tokens. Limit which users can invoke high-impact tools. The most expensive agent failures usually come from too few limits, not too many.
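One way to enforce the tool and user limits is to filter the toolbox per role before the request is built, so the model never sees tools the caller may not use. The filter is then re-checked at execution time. The role names, the `issue_refund` tool, and the `ROLE_TOOLS` mapping are hypothetical.

```python
# Hypothetical tool definitions; only the names matter for this sketch.
ALL_TOOLS = {
    "get_order": {"name": "get_order"},
    "search_orders_by_customer_email": {"name": "search_orders_by_customer_email"},
    "issue_refund": {"name": "issue_refund"},  # high-impact: moves money
}

ROLE_TOOLS = {
    "viewer": {"get_order", "search_orders_by_customer_email"},
    "support_lead": {"get_order", "search_orders_by_customer_email", "issue_refund"},
}

def tools_for(role):
    """Return the tool definitions a given role may expose to the model."""
    allowed = ROLE_TOOLS.get(role, set())  # unknown roles get no tools
    return [ALL_TOOLS[name] for name in sorted(allowed)]

def authorize(role, tool_name):
    """Re-check at execution time in case the model names a tool it was not given."""
    if tool_name not in ROLE_TOOLS.get(role, set()):
        raise PermissionError(f"role {role!r} may not call {tool_name}")

print([t["name"] for t in tools_for("viewer")])
```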

A pragmatic agent has a small, clear toolbox, a tight step cap, a per-task budget, complete tracing, and a written plan it can revise as it goes. That set of patterns turns an agent from a fragile demo into something you can trust in production.