Structured Outputs with LLMs
How to get reliable JSON out of LLMs using tool use, JSON mode, and grammar-constrained decoding, with patterns that work in production.
What you'll learn
- Why free-text LLM outputs are unreliable for code
- How JSON mode and tool use enforce structure
- How to validate with schemas
- Patterns for retries and partial parses
- Pitfalls and cost considerations
Prerequisites
- You have called an LLM API at least once
If you have ever asked an LLM to “respond in JSON” and watched it return JSON wrapped in “Here is the response:” plus a stray trailing comma, you already know why structured outputs exist. Free text is not a data format. The fix is to constrain the model to a schema, route the response through tool use, or both. This post is about the patterns that actually hold up.
The problem in one paragraph
LLMs sample tokens from a probability distribution. Even when they “know” the format, a small amount of probability mass can drift into prose, code fences, or near-JSON. Your downstream parser does not care that 99 percent of the response was valid; one bad token breaks the run. Structured outputs are about removing that one percent.
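To see the failure concretely, here is a small sketch that feeds Python's standard `json` parser a reply of the kind described above. The reply string and the `try_parse` helper are made up for illustration.

```python
import json

# A made-up reply of the kind a free-text prompt can produce:
# a prose preamble followed by near-JSON with a trailing comma.
reply = 'Here is the response: {"name": "Ada", "age": 36,}'


def try_parse(text):
    """Return (parsed, None) on success or (None, error message) on failure."""
    try:
        return json.loads(text), None
    except json.JSONDecodeError as exc:
        return None, str(exc)


# The preamble alone is enough to break the parser.
parsed, error = try_parse(reply)
print(parsed, "|", error)

# Even with the preamble stripped, the trailing comma is still fatal.
parsed_stripped, error_stripped = try_parse(reply.split(": ", 1)[1])
print(parsed_stripped, "|", error_stripped)
```

Both attempts fail even though almost every character of the reply is valid JSON, which is the whole argument for constraining the output rather than cleaning it up afterwards.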
Mental model
Free text:
  prompt --> model --> "Sure! { name: 'Ada', age: 36 }" --> parser fails

Tool use / JSON schema:
  prompt + schema --> model constrained at decode time
                  --> {"name": "Ada", "age": 36} --> parser ok

The provider either constrains decoding (impossible tokens get zero probability) or validates the output before returning, depending on the API.
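The "impossible tokens get zero probability" idea can be illustrated with a toy example. This is a minimal sketch, not how any particular provider implements it: the three-token vocabulary, the scores, and the `constrained_probs` helper are all invented for illustration.

```python
import math


def constrained_probs(logits, allowed):
    """Softmax over logits with every token outside `allowed` forced to zero."""
    if not any(tok in allowed for tok in logits):
        raise ValueError("grammar allows no token from this vocabulary")
    # Masked tokens get a score of -inf, so exp() turns them into exactly 0.
    masked = {tok: (s if tok in allowed else -math.inf) for tok, s in logits.items()}
    peak = max(masked.values())
    exps = {tok: math.exp(s - peak) for tok, s in masked.items()}
    total = sum(exps.values())
    return {tok: value / total for tok, value in exps.items()}


# Toy next-token scores at the very start of a response.
logits = {"Sure": 2.0, "{": 1.5, "Here": 0.5}

# Unconstrained, the prose opener "Sure" is the most likely token.
print(constrained_probs(logits, allowed=set(logits)))

# A JSON-object grammar permits only an opening brace at this position.
probs = constrained_probs(logits, allowed={"{"})
print(probs)
```

A real implementation repeats this masking at every decoding step, with the allowed set derived from the schema and the tokens emitted so far.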
Hands-on: tool use as a typed return
The most reliable pattern across providers is tool use. You define a “tool” whose only job is to receive the answer, and the model invokes it instead of writing prose.
from anthropic import Anthropic

client = Anthropic()

extract = {
    "name": "save_person",
    "description": "Save the extracted person to the database.",
    "input_schema": {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "age": {"type": "integer", "minimum": 0},
            "email": {"type": "string", "format": "email"},
        },
        "required": ["name", "email"],
    },
}

resp = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=512,
    tools=[extract],
    tool_choice={"type": "tool", "name": "save_person"},
    messages=[{"role": "user", "content": "Extract: Ada Lovelace, ada@example.com, 36"}],
)

person = next(b for b in resp.content if b.type == "tool_use").input
print(person)  # {'name': 'Ada Lovelace', 'age': 36, 'email': 'ada@example.com'}
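tool_choice forces the model to call this specific tool. There is no prose to parse and no markdown to strip, and the tool input is shaped by the schema. Unless the provider documents strict schema enforcement for tool inputs, treat that as a strong steer rather than a guarantee.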
Hands-on: JSON mode and JSON schema
Many providers also offer a “respond as JSON” mode, optionally with a JSON schema. With OpenAI:
from openai import OpenAI

client = OpenAI()

schema = {
    "type": "object",
    "properties": {
        "summary": {"type": "string"},
        "tags": {"type": "array", "items": {"type": "string"}, "maxItems": 5},
    },
    "required": ["summary", "tags"],
    "additionalProperties": False,
}

resp = client.responses.create(
    model="gpt-x",
    input="Summarize: ...",
    # The Responses API takes the schema under text.format;
    # response_format is the Chat Completions parameter.
    text={"format": {"type": "json_schema", "name": "Summary", "schema": schema, "strict": True}},
)
print(resp.output_text)  # a JSON string matching the schema
In strict mode the decoder is constrained to produce valid JSON matching the schema. The output is a string you can pass to json.loads with confidence.
Validate even when the API says it is valid
Provider-side schemas check JSON shape but not your business rules. Always run the result through your own validator (Pydantic, Zod, JSON Schema) before using it.
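from pydantic import BaseModel, EmailStr, Field

class Person(BaseModel):
    name: str
    age: int | None = Field(default=None, ge=0, le=150)
    email: EmailStr  # needs the optional email-validator package

# `person` is the dict taken from the tool call earlier
person = Person.model_validate(person)  # raises ValidationError on invalid business rules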
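The Pydantic model covers field-level rules. Rules that depend on state outside the record need a separate pass, since no schema can express them. Here is a stdlib-only sketch of such a check. The `check_business_rules` helper, the `known_emails` set (standing in for a database lookup), and the blocked-domain list are all hypothetical.

```python
# Hypothetical block list; in practice this would come from configuration.
BLOCKED_DOMAINS = {"mailinator.example"}


def check_business_rules(person, known_emails):
    """Return a list of rule violations; an empty list means the record is usable."""
    problems = []
    email = person["email"].lower()
    if not person["name"].strip():
        problems.append("name is blank")
    if email in known_emails:
        # Stand-in for a uniqueness check against real storage.
        problems.append("email already registered")
    if email.rsplit("@", 1)[-1] in BLOCKED_DOMAINS:
        problems.append("email domain is blocked")
    return problems


record = {"name": "Ada Lovelace", "age": 36, "email": "ada@example.com"}
print(check_business_rules(record, known_emails=set()))
print(check_business_rules(record, known_emails={"ada@example.com"}))
```

Returning a list of messages rather than raising on the first problem makes it easy to send every violation back to the model in a single retry turn.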
Retries and partial parses
When the model's output still fails validation, send the validation error back as a new turn and ask for a fix. This loop succeeds far more often than re-prompting blindly.
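from pydantic import ValidationError

def extract_person(messages, max_attempts=3):
    for _ in range(max_attempts):
        try:
            result = call_model(messages)  # your own wrapper around the API call
            return Person.model_validate(result)
        except ValidationError as e:
            # Feed the exact error back so the model can repair its own output.
            messages.append({"role": "user", "content": f"Validation failed: {e}. Return only valid JSON."})
    raise RuntimeError("model could not produce valid output")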
Streaming partial JSON is also tractable: parse incrementally with a streaming JSON parser to render parts of the result as they arrive. Useful for UX, not for correctness.
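As a rough illustration of incremental parsing, the sketch below closes whatever strings and brackets are still open in the text received so far, and backs off one character at a time until the result parses. It uses only the standard library. `parse_partial` and `_closers` are hypothetical helpers written for this post, not part of any streaming parser package, and the repeated re-parsing is quadratic, so a dedicated library is the better choice for large payloads.

```python
import json


def _closers(text):
    """Return the suffix needed to close any open string and brackets in text."""
    stack, in_string, escaped = [], False, False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            stack.append("}" if ch == "{" else "]")
        elif ch in "}]" and stack:
            stack.pop()
    return ('"' if in_string else "") + "".join(reversed(stack))


def parse_partial(prefix):
    """Best-effort parse of a JSON prefix; returns None if nothing parses yet."""
    for end in range(len(prefix), 0, -1):
        candidate = prefix[:end]
        try:
            return json.loads(candidate + _closers(candidate))
        except json.JSONDecodeError:
            continue  # drop one more trailing character and try again
    return None


# Simulated stream chunks, invented for the example.
chunks = ['{"summary": "Struct', 'ured outputs", "tags": ["js', 'on", "llm"]}']
buffer = ""
for chunk in chunks:
    buffer += chunk
    print(parse_partial(buffer))
```

Intermediate results can contain truncated values (a half-finished string or number), which is why this is suitable for rendering progress and not for acting on the data.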
Common pitfalls
- Asking for JSON in the prompt and not the API. Prompts shift behavior, but only API-level constraints guarantee it.
- Schemas with anyOf or open enums. Models drift on ambiguous schemas; keep types narrow and required fields explicit.
- Letting the model choose the schema. If you offer ten optional fields, expect inconsistent results across calls. Fewer fields, more required.
- Using JSON mode for nested rich text. JSON is a transport; markdown belongs inside string fields, not as nested structures.
- Skipping validation because “the API says strict.” Strict checks JSON Schema only. Email format, foreign keys, business rules: still your job.
- Forgetting cost. Tool use and JSON-mode responses count tokens like anything else; verbose schemas inflate prompt size.
Practical tips
- Prefer tool use when you have a single target shape and want strong constraints. Prefer JSON schema mode when you want a typed response with no tool call ceremony.
- Keep schemas under ~30 fields per call. Big schemas degrade quality; split into stages.
- Add additionalProperties: false. Otherwise models invent fields and you silently swallow them.
- Log the raw response alongside the parsed one. When parsing fails you will want to see exactly what the model produced.
- Use temperature=0 for extraction tasks. Determinism matters more than creativity.
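Two of the tips above, logging the raw response and catching invented fields, combine naturally into one small wrapper. This is a sketch using only the standard library. The `parse_and_log` helper and the `llm_outputs` logger name are hypothetical.

```python
import json
import logging

logger = logging.getLogger("llm_outputs")


def parse_and_log(raw, allowed_fields):
    """Parse a raw model reply, logging the raw text next to the outcome."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Keep the exact text the model produced; it is the only real evidence.
        logger.error("parse failed: %s | raw=%r", exc, raw)
        return None
    if isinstance(parsed, dict):
        extra = sorted(set(parsed) - set(allowed_fields))
        if extra:
            # Surface invented fields instead of silently swallowing them.
            logger.warning("unexpected fields %s | raw=%r", extra, raw)
    logger.info("parsed=%r | raw=%r", parsed, raw)
    return parsed


logging.basicConfig(level=logging.INFO)
good = parse_and_log('{"summary": "ok", "tags": []}', {"summary", "tags"})
noisy = parse_and_log('{"summary": "ok", "mood": "upbeat"}', {"summary", "tags"})
broken = parse_and_log("Sure! Here you go", {"summary", "tags"})
```

In production the log records would typically also carry a request ID so a failed parse can be traced back to the prompt that caused it.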
Wrap-up
Treat the LLM as a function with a typed return: declare the type, constrain at decode time, validate after, and retry with the error message when validation fails. Tool use and JSON schemas turn LLM calls from prose generators into actual functions, which is the only way most production pipelines should be calling them.