LLM Output Parsing and Validation
Practical techniques for parsing and validating LLM outputs reliably, covering JSON mode, schema enforcement, retries, and repair strategies for production use.
What you'll learn
- Why parsing LLM outputs is hard
- JSON mode and structured outputs
- Schema validation patterns
- Repair and retry loops
- How to detect partial successes
Prerequisites
- Familiar with APIs
Models that produce free text are wonderful for chat and terrible for downstream code. The moment you need to extract a field, store a row, or branch on a value, you need structured output you can trust. This post lays out the techniques that make that reliable.
What output parsing really is
Output parsing turns a model’s text response into a data structure your code can use. The naive approach is to ask for JSON in the prompt and call json.loads. That works most of the time and fails in unpleasant ways when it does not: trailing commentary, missing keys, wrong types, unescaped quotes.
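Several of those failures are wrapping noise rather than bad data: a chatty preamble, a markdown fence, a closing remark. A cheap local repair is to locate the JSON object inside the response instead of assuming the whole response is JSON. The sketch below uses only the standard library; `extract_json_object` is a hypothetical helper name, and it assumes the payload is a single JSON object.

```python
import json


def extract_json_object(text: str) -> dict:
    """Parse the first JSON object embedded in a model response.

    Tolerates leading prose, markdown fences, and trailing commentary,
    all of which make a plain json.loads call fail.
    """
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and
            # ignores whatever follows it.
            obj, _end = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            # Not a valid object here; try the next opening brace.
            start = text.find("{", start + 1)
    raise ValueError("no JSON object found in model output")
```

This only strips the wrapper. Missing keys, wrong types, and unescaped quotes still have to be caught by validation and, if needed, a retry.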
Reliable parsing has three layers. Generation control nudges the model toward valid output. Validation checks that what came back matches the schema. Repair and retry handle the remaining errors.
Mental model
Think of each LLM call as a request to an external API that occasionally returns malformed responses. Your job is to put a thin parsing layer between that API and the rest of your code, catching the rough edges before the rest of the system sees them.
```
prompt + schema
      |
      v
LLM call -> raw text
      |
      v
parse + validate
      |  fails
      v
repair (retry with error)
      |
      v
final typed object
```
Hands-on example
A robust pattern with Pydantic and Anthropic.
```python
import json

from anthropic import Anthropic
from pydantic import BaseModel, ValidationError


class Ticket(BaseModel):
    title: str
    priority: int
    tags: list[str]


client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment
# Serialize the schema as real JSON rather than a Python dict repr
schema = json.dumps(Ticket.model_json_schema())


def extract(user_text: str, attempts: int = 2) -> Ticket:
    err = None
    for _ in range(attempts):
        prompt = f"Return only JSON matching this schema:\n{schema}\nText: {user_text}"
        if err:
            # Feed the previous error back so the model can correct itself
            prompt += f"\nYour previous answer was invalid. Fix this error: {err}"
        resp = client.messages.create(
            model="claude-opus-4-7",
            max_tokens=512,
            messages=[{"role": "user", "content": prompt}],
        )
        try:
            # Parses and validates in one step: malformed JSON, a non-object
            # payload, and schema mismatches all raise ValidationError
            return Ticket.model_validate_json(resp.content[0].text)
        except ValidationError as e:
            err = str(e)
    raise RuntimeError(f"failed after {attempts} attempts: {err}")
```
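Better yet, use the provider’s structured outputs feature. OpenAI’s structured outputs (response_format with type json_schema and strict set to true) constrain decoding so the response conforms to your schema. Anthropic’s tool-use trick (define a tool whose input schema is your data shape and force the model to call it) gets the model to return arguments in that shape and is highly reliable in practice. Neither makes bad output impossible: a response can still be cut off by max_tokens or come back as a refusal, which is why you still validate.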
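The tool-use trick comes down to two small pieces: a request that declares one tool and forces the model to call it, and a step that pulls the tool arguments back out. A minimal sketch, assuming the Anthropic Messages API's `tools` (name, description, `input_schema`) and `tool_choice={"type": "tool", "name": ...}` parameters. The helper names and the `record_ticket` tool name are hypothetical, and the response content is assumed to be plain dicts (for example via `model_dump()` on the SDK's block objects).

```python
def build_tool_request(schema: dict, tool_name: str = "record_ticket") -> dict:
    """Extra keyword arguments for client.messages.create that force one tool call."""
    return {
        "tools": [{
            "name": tool_name,
            "description": "Record the ticket extracted from the user's text.",
            "input_schema": schema,  # your data shape becomes the tool's input schema
        }],
        # tool_choice of type "tool" makes the model call this specific tool
        "tool_choice": {"type": "tool", "name": tool_name},
    }


def extract_tool_input(content: list[dict], tool_name: str = "record_ticket") -> dict:
    """Return the arguments of the first matching tool_use block.

    The input arrives already parsed, so there is no json.loads step,
    but it should still go through schema validation.
    """
    for block in content:
        if block.get("type") == "tool_use" and block.get("name") == tool_name:
            return block["input"]
    raise ValueError(f"model did not call {tool_name}")
```

In use, you would spread `**build_tool_request(Ticket.model_json_schema())` into the `messages.create` call and pass the extracted dict to `Ticket.model_validate`.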
Libraries like Instructor wrap this pattern for the major providers: you declare a Pydantic model and get typed results back, with automatic retries on validation errors.
Trade-offs
Asking for JSON in the prompt is the easiest approach but the least reliable. Provider-supplied structured output modes are more reliable but tie you to that vendor’s feature set.
Strict schemas catch errors early but reject borderline-valid responses. A field declared as int that gets the string “3” might be salvageable; deciding whether to fail or coerce is a design choice.
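To make the fail-or-coerce choice concrete, here is a stdlib-only sketch of both policies for the priority field; `coerce_priority` is a hypothetical helper. Pydantic v2 exposes the same choice as lax versus strict mode: by default an `int` field accepts `"3"` and converts it (so the `Ticket` model in the hands-on example coerces), while strict mode, for example `ConfigDict(strict=True)`, rejects it.

```python
def coerce_priority(value: object, *, strict: bool) -> int:
    """Validate a priority, optionally salvaging numeric strings like "3"."""
    # bool is a subclass of int in Python, so rule it out explicitly
    if isinstance(value, int) and not isinstance(value, bool):
        return value
    if not strict and isinstance(value, str):
        try:
            # Lenient path: accept numeric strings such as "3" or " 3 "
            return int(value.strip())
        except ValueError:
            pass
    raise ValueError(f"priority must be an integer, got {value!r}")
```

Raising `ValueError` in both branches keeps the helper compatible with a retry loop that treats any `ValueError` as "ask the model again".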
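Repair loops cost extra calls. Each retry is another full request, so a call that fails once costs at least twice as much, and the retry prompt grows as you append error messages. Set a tight retry budget and log every failure for offline analysis.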
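A retry budget is easiest to enforce when the loop is independent of any provider. In the sketch below, `call_with_retry_budget` is a hypothetical helper: `generate` stands in for the model call and receives the previous error, and `parse` is any function that raises `ValueError` on bad output. With `max_attempts=n`, one request costs between one and n model calls, so the budget caps the worst case directly.

```python
import logging
from typing import Callable, Optional, TypeVar

T = TypeVar("T")
log = logging.getLogger("llm_parsing")


def call_with_retry_budget(
    generate: Callable[[Optional[str]], str],
    parse: Callable[[str], T],
    max_attempts: int = 2,
) -> T:
    """Run generate -> parse at most `max_attempts` times.

    `generate` receives the previous error message (None on the first
    attempt) and returns raw model text. `parse` must raise ValueError on
    bad output; json.JSONDecodeError and Pydantic's ValidationError both
    subclass it.
    """
    last_error: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        raw = generate(last_error)
        try:
            return parse(raw)
        except ValueError as exc:
            last_error = str(exc)
            # Log every failure together with the raw text for offline analysis
            log.warning("parse failed (attempt %d/%d): %s | raw=%r",
                        attempt, max_attempts, last_error, raw)
    raise RuntimeError(f"gave up after {max_attempts} attempts: {last_error}")
```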
Using tool use as a parsing trick gives you schema enforcement for free on providers that support it. The downside is streaming: the tool input arrives as fragments of partial JSON, and you cannot validate the object until the last fragment has arrived.
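The streaming limitation is easy to see with a buffer. This sketch assumes the fragments have already been pulled out of the stream events (with Anthropic, these are the `partial_json` strings carried by `input_json_delta` deltas); `collect_streamed_json` is a hypothetical name.

```python
import json
from typing import Iterable


def collect_streamed_json(fragments: Iterable[str]) -> dict:
    """Buffer partial-JSON fragments and parse only once the stream has ended."""
    buffer = []
    for fragment in fragments:
        # Each fragment is an arbitrary slice of the final JSON text,
        # e.g. '{"tit', so nothing can be validated yet; just accumulate.
        buffer.append(fragment)
    return json.loads("".join(buffer))
```

You can show progress while fragments arrive, but validation, and anything downstream that depends on it, has to wait for the end of the stream.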
Practical tips
Always validate, never trust raw output. Even with structured output mode, build a Pydantic or Zod model and run it. Future model changes can shift behavior subtly.
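Valid JSON is not the same as valid data. The Pydantic model from the hands-on example is the real tool here; the stdlib-only `validate_ticket` below is a hypothetical stand-in that just makes the checks visible. It reports problems and does not coerce anything.

```python
def validate_ticket(data: object) -> list[str]:
    """Return schema violations for a Ticket-shaped dict (empty list if valid)."""
    if not isinstance(data, dict):
        return [f"expected an object, got {type(data).__name__}"]
    errors = []
    expected = {"title": str, "priority": int, "tags": list}
    for key, typ in expected.items():
        if key not in data:
            errors.append(f"missing field: {key}")
        # bool subclasses int, so True must not pass as a priority
        elif not isinstance(data[key], typ) or isinstance(data[key], bool):
            errors.append(f"{key}: expected {typ.__name__}, got {type(data[key]).__name__}")
    tags = data.get("tags")
    if isinstance(tags, list) and not all(isinstance(t, str) for t in tags):
        errors.append("tags: every item must be a string")
    return errors
```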
Log the raw response alongside the parsed object. When something looks off downstream, you want the original string available without having to reproduce the call.
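One way to keep the raw string within reach is to never separate it from the parse result. A minimal sketch: `ParseRecord` and `parse_and_record` are hypothetical names, and the logger name is an assumption.

```python
import json
import logging
from dataclasses import dataclass
from typing import Any, Optional

log = logging.getLogger("llm_parsing")


@dataclass
class ParseRecord:
    """Raw model text kept side by side with the parse outcome."""
    raw: str
    parsed: Optional[Any] = None
    error: Optional[str] = None


def parse_and_record(raw: str) -> ParseRecord:
    record = ParseRecord(raw=raw)
    try:
        record.parsed = json.loads(raw)
    except json.JSONDecodeError as exc:
        record.error = str(exc)
    # One structured log line per call, raw text included, so a suspicious
    # downstream value can be traced back without re-running the model.
    log.info("llm_parse %s", json.dumps(
        {"raw": record.raw, "parsed": record.parsed, "error": record.error}))
    return record
```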
Surface specific error messages on retry. Telling the model “your JSON had a trailing comma after the tags array” is much more effective than “invalid JSON, try again.”
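Python's `json.JSONDecodeError` already carries what you need for specific feedback: `msg`, `pos`, `lineno`, and `colno`. The hypothetical helper below turns those into a retry message with a snippet around the failure point. For schema errors, Pydantic's `ValidationError.errors()` gives a comparable list with a `loc` and `msg` per problem.

```python
import json


def describe_json_error(raw: str, exc: json.JSONDecodeError, context: int = 20) -> str:
    """Turn a JSONDecodeError into retry feedback that points at the exact spot."""
    start = max(exc.pos - context, 0)
    snippet = raw[start:exc.pos + context]
    return (f"Your JSON was invalid: {exc.msg} at line {exc.lineno}, "
            f"column {exc.colno}, near: {snippet!r}. Return corrected JSON only.")
```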
Prefer enums over free text for categorical fields. If priority must be one of low, medium, high, encode that in the schema. The model gets fewer chances to invent values.
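Encoding the categories once gives you both the schema fragment to send to the model and a strict check on what comes back. A stdlib sketch; `Priority` and the helper names are hypothetical. With Pydantic, typing the field as `Literal["low", "medium", "high"]` or an `Enum` subclass makes `model_json_schema()` emit the allowed values as an `enum` list.

```python
from enum import Enum


class Priority(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


def priority_schema() -> dict:
    """JSON Schema fragment listing the only allowed values."""
    return {"type": "string", "enum": [p.value for p in Priority]}


def parse_priority_level(value: str) -> Priority:
    try:
        # Normalize harmless variations like " High " before the lookup
        return Priority(value.strip().lower())
    except ValueError:
        allowed = ", ".join(p.value for p in Priority)
        raise ValueError(f"priority must be one of: {allowed}; got {value!r}") from None
```

The error message lists the allowed values, which also makes it good retry feedback.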
Separate parsing from interpretation. The parser turns text into a typed object. The next layer decides what to do with that object. Mixing them makes both harder to test.
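A sketch of that split, using a plain dataclass in place of the Pydantic model so it runs with the standard library alone. `parse_ticket` and `route_ticket` are hypothetical, and the routing rule and queue names are made up for illustration.

```python
import json
from dataclasses import dataclass


@dataclass
class Ticket:
    title: str
    priority: int
    tags: list[str]


def parse_ticket(raw: str) -> Ticket:
    """Parsing layer: text in, typed object out, no business rules."""
    data = json.loads(raw)
    if not (
        isinstance(data, dict)
        and isinstance(data.get("title"), str)
        and isinstance(data.get("priority"), int)
        and isinstance(data.get("tags"), list)
    ):
        raise ValueError("response is not a valid Ticket")
    return Ticket(data["title"], data["priority"], list(data["tags"]))


def route_ticket(ticket: Ticket) -> str:
    """Interpretation layer: decides what to do with an already-valid Ticket.

    The threshold and queue names are hypothetical.
    """
    if ticket.priority >= 3 or "security" in ticket.tags:
        return "oncall"
    return "backlog"
```

Because `route_ticket` only ever sees a `Ticket`, its tests can use hand-built objects with no JSON and no model call involved.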
Wrap-up
Reliable LLM output parsing is a layered job: constrain generation, validate hard, repair softly. Use provider-supplied structured output features where you can, validate with a real schema library, and keep retries cheap and bounded. Once this layer is solid, downstream code can pretend it is calling a normal API.