Prompt Engineering: Output Formatters
How to coax LLMs into producing predictable, parseable output using output formatters, JSON schemas, examples, and validation loops that actually hold up in production code paths.
What you'll learn
- What an output formatter actually is
- When to use JSON, XML, or delimited formats
- How to combine schemas with few-shot examples
- How to validate and retry on bad output
- Trade-offs between strict and loose formats
Prerequisites
- Basic LLM API usage
The single most common reason LLM-powered features break in production is not hallucination; it is shape drift. The model returns something close to what you asked for, but a key is missing, a list arrives as a string, or a stray markdown fence breaks your parser. Output formatters are the discipline of forcing model output into a shape your code can rely on.
What and why
An output formatter is the combination of instructions, schema, examples, and post-processing that converts free-form generation into structured data. It is not a single feature; it is a pattern that uses whatever your provider supports — JSON mode, tool schemas, grammar constraints — together with prompt-level guardrails.
The reason this matters is downstream code. If a function expects a list of objects with a score field, anything else is a bug. Formatters reduce the surface area where the model can surprise you, which means fewer try/except blocks, fewer alert pages, and fewer angry users.
Mental model
Think of the model as a sloppy intern who is smart but does not read the spec carefully. You need three things: a clear contract (the schema), worked examples (so it sees the pattern), and a checker (so mistakes do not slip into production). Skip any one and quality drops.
The contract should be as machine-readable as possible. The examples should cover edge cases — empty lists, optional fields, multilingual strings. The checker should know what to do when validation fails: retry with feedback, fall back to a simpler format, or surface the error.
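As a rough illustration of the checker, the sketch below validates a parsed record by hand and returns a list of readable errors instead of raising, so the caller can choose between retrying, falling back, or surfacing the problem. The field names (`sentiment`, `topics`, `score`) mirror the review example in the hands-on section; the function name `check_record` is hypothetical, and a schema validation library could take its place.

```python
from typing import Any

ALLOWED_SENTIMENTS = ("positive", "neutral", "negative")


def check_record(data: Any) -> list[str]:
    """Return a list of contract violations; an empty list means the record is valid."""
    if not isinstance(data, dict):
        return [f"expected a JSON object, got {type(data).__name__}"]

    errors: list[str] = []

    # Required keys first: a missing key is a typical form of shape drift.
    for key in ("sentiment", "topics", "score"):
        if key not in data:
            errors.append(f"missing required field '{key}'")

    if "sentiment" in data and data["sentiment"] not in ALLOWED_SENTIMENTS:
        errors.append(f"sentiment must be one of {list(ALLOWED_SENTIMENTS)}")

    if "topics" in data:
        topics = data["topics"]
        # An empty list is valid; a bare string is not.
        if not isinstance(topics, list) or not all(isinstance(t, str) for t in topics):
            errors.append("topics must be a list of strings")

    if "score" in data:
        score = data["score"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_number = isinstance(score, (int, float)) and not isinstance(score, bool)
        if not is_number or not 0 <= score <= 1:
            errors.append("score must be a number between 0 and 1")

    return errors
```

Returning messages rather than raising keeps the error text available for a retry prompt.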
Hands-on example
Suppose you need to extract product reviews into a structured form. Start with the schema and the prompt.
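import json

schema = {
    "type": "object",
    "properties": {
        "sentiment": {"enum": ["positive", "neutral", "negative"]},
        "topics": {"type": "array", "items": {"type": "string"}},
        "score": {"type": "number", "minimum": 0, "maximum": 1},
    },
    "required": ["sentiment", "topics", "score"],
}

# Placeholder input for illustration; in practice this comes from your data.
review = "Battery lasts two days, but the screen scratches easily."

# json.dumps renders the schema as valid JSON; interpolating the dict directly
# would print a Python repr with single quotes.
prompt = f"""Extract structured data from the review.
Return JSON matching this schema: {json.dumps(schema)}
Example: {{"sentiment": "positive", "topics": ["battery"], "score": 0.9}}
Review: {review}"""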
The flow looks like this.
    raw prompt + schema + examples
            |
            v
        [LLM call]
            |
            v
        parse JSON --> ok? --> yes --> downstream code
            | no
            v
        [validator] -- error msg --> retry prompt
            |
            v
        fallback (escalate or default)

The retry loop matters more than the prompt itself. Most failures are transient: a stray comma, an extra field. Feeding the error message back to the model fixes the majority on the second try. Cap retries at two or three to bound cost.
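A minimal sketch of that retry loop follows. It assumes the model call and the validator are passed in as plain callables, so `call_model`, `validate`, and `extract_with_retry` are hypothetical names and not part of any provider SDK.

```python
import json
from typing import Any, Callable, Optional


def extract_with_retry(
    call_model: Callable[[str], str],
    prompt: str,
    validate: Callable[[object], list[str]],
    max_retries: int = 2,
) -> Optional[Any]:
    """Call the model, validate the reply, and retry with the errors fed back.

    Returns the parsed value, or None once retries are exhausted so the
    caller can apply its own fallback.
    """
    current_prompt = prompt
    for _ in range(max_retries + 1):  # first attempt plus a bounded number of retries
        raw = call_model(current_prompt)
        try:
            data = json.loads(raw)
            errors = validate(data)
        except json.JSONDecodeError as exc:
            errors = [f"invalid JSON: {exc.msg} at position {exc.pos}"]
        if not errors:
            return data
        # Feed the concrete errors back instead of repeating the same prompt.
        current_prompt = (
            f"{prompt}\n\nYour previous reply was rejected:\n{raw}\n"
            f"Problems: {'; '.join(errors)}\nReturn corrected JSON only."
        )
    return None
```

Returning `None` instead of raising leaves the fallback decision (escalate or default) to the caller, matching the last step of the flow.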
Trade-offs
Strict formats like tool schemas or JSON mode give you near-perfect parseability at the cost of some flexibility. The model has less room to add useful commentary, and exotic schemas can degrade reasoning quality.
Loose formats — markdown, XML tags, delimited blocks — are easier to inspect by humans and play nicely with chain-of-thought. They are harder to parse reliably and need more defensive code.
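To give a sense of the defensive code a loose format needs, here is a sketch that assumes the prompt asks the model to put its final JSON inside `<answer>` tags after any reasoning. The tag name and the `extract_answer` helper are assumptions made for illustration.

```python
import json
import re

# Hypothetical convention: the final result sits inside <answer> tags.
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL | re.IGNORECASE)

# The fence marker is built indirectly so this snippet can sit inside a fenced block.
FENCE = "`" * 3
FENCE_RE = re.compile(rf"^{FENCE}[a-zA-Z]*\s*\n(.*?)\n?{FENCE}\s*$", re.DOTALL)


def extract_answer(text: str):
    """Pull JSON out of the last <answer> block, tolerating a markdown fence."""
    matches = ANSWER_RE.findall(text)
    if not matches:
        raise ValueError("no <answer> block found")
    # Take the last block in case the model drafts earlier ones while reasoning.
    body = matches[-1].strip()
    fenced = FENCE_RE.match(body)
    if fenced:
        body = fenced.group(1).strip()
    return json.loads(body)
```

Even this small helper has to handle three cases (missing tags, multiple blocks, a stray fence), which is the extra cost the loose format carries.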
Free-form output with regex extraction is tempting for one-off scripts but rarely survives contact with real traffic. Use it only when the structure is genuinely simple and you control both ends.
Practical tips
Always show at least one example, even when you provide a schema. Models pattern-match more strongly than they spec-follow. For lists, include an empty case so the model does not invent items to fill space.
Log every parse failure with the raw output. The patterns will guide your next prompt revision. If you see the same failure twice, that is a prompt bug, not bad luck.
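One way to capture those failures is sketched below using the standard `logging` module. The logger name and the `prompt_version` parameter are hypothetical; the point is only that the raw text travels with the error.

```python
import json
import logging

logger = logging.getLogger("llm.output")


def parse_or_log(raw: str, prompt_version: str):
    """Parse model output as JSON; on failure, log the raw text and return None."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError as exc:
        # Keep the raw output and prompt version together so recurring
        # failures can be grouped and traced back to a specific prompt.
        logger.warning(
            "parse failure prompt_version=%s error=%s raw=%r",
            prompt_version, exc.msg, raw,
        )
        return None
```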
When fields are optional, say so explicitly and show an example without that field. Otherwise the model hallucinates plausible-looking values.
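The advice on examples, empty lists, and optional fields can be combined in a small few-shot block. The sketch below is illustrative only: the reviews, the scores, and the optional `language` field are made up for the example and are not part of the schema shown earlier.

```python
import json

# Hypothetical examples: a typical case, a case with an empty list, and a
# case that omits the optional "language" field entirely.
EXAMPLES = [
    ("Great battery, weak camera.",
     {"sentiment": "neutral", "topics": ["battery", "camera"], "score": 0.5, "language": "en"}),
    ("It's fine.",
     {"sentiment": "neutral", "topics": [], "score": 0.5, "language": "en"}),
    ("10/10",
     {"sentiment": "positive", "topics": [], "score": 0.9}),
]


def render_examples(examples) -> str:
    """Format (review, expected_output) pairs as a few-shot block for the prompt."""
    parts = []
    for review, expected in examples:
        parts.append(f"Review: {review}\nOutput: {json.dumps(expected, ensure_ascii=False)}")
    return "\n\n".join(parts)
```

Keeping the examples as data rather than hand-written strings means every example is guaranteed to be valid JSON.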
For long outputs, prefer streaming with a tolerant parser, such as an incremental JSON parsing library. It surfaces structure issues earlier and lets you cancel runaway generations.
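As a stand-in for a dedicated streaming parser, the sketch below uses `json.JSONDecoder.raw_decode` from the standard library on a growing buffer. It assumes the provider yields the reply as an iterable of text chunks; `parse_stream` and `max_chars` are hypothetical names.

```python
import json
from typing import Iterable, Optional


def parse_stream(chunks: Iterable[str], max_chars: int = 20_000) -> Optional[dict]:
    """Accumulate streamed text and return the first complete JSON object.

    Stops reading as soon as an object parses, and gives up once the buffer
    exceeds max_chars so a runaway generation can be cancelled early.
    """
    decoder = json.JSONDecoder()
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        if len(buffer) > max_chars:
            return None  # the caller should cancel the request here
        start = buffer.find("{")
        if start == -1:
            continue  # no object has started yet (leading prose or a fence)
        try:
            obj, _end = decoder.raw_decode(buffer, start)
        except json.JSONDecodeError:
            continue  # object is still incomplete; wait for more chunks
        return obj
    return None
```

This version re-parses the whole buffer on every chunk and cannot tell an incomplete object from a malformed one until the size cap is hit, so it is only suitable for modest outputs; a true incremental parser avoids both limits.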
Wrap-up
Output formatters turn a probabilistic system into something you can build software on top of. The recipe is small: a schema, examples, validation, and a bounded retry. Get those four pieces right and most of the downstream complexity in your LLM stack disappears.