Prompt Engineering: Output Formatters

Intermediate 9 min read

What you'll learn

✓What an output formatter actually is
✓When to use JSON, XML, or delimited formats
✓How to combine schemas with few-shot examples
✓How to validate and retry on bad output
✓Trade-offs between strict and loose formats

Prerequisites

•Basic LLM API usage

The single most common reason LLM-powered features break in production is not hallucination, it is shape drift. The model returned something close to what you asked for, but a key was missing, a list was a string, or a stray markdown fence broke your parser. Output formatters are the discipline of forcing model output into a shape your code can rely on.

What and why

An output formatter is the combination of instructions, schema, examples, and post-processing that converts free-form generation into structured data. It is not a single feature; it is a pattern that uses whatever your provider supports — JSON mode, tool schemas, grammar constraints — together with prompt-level guardrails.

The reason this matters is downstream code. If a function expects a list of objects with a score field, anything else is a bug. Formatters reduce the surface area where the model can surprise you, which means fewer try/except blocks, fewer alert pages, and fewer angry users.

Mental model

Think of the model as a sloppy intern who is smart but does not read the spec carefully. You need three things: a clear contract (the schema), worked examples (so it sees the pattern), and a checker (so mistakes do not slip into production). Skip any one and quality drops.

The contract should be as machine-readable as possible. The examples should cover edge cases — empty lists, optional fields, multilingual strings. The checker should know what to do when validation fails: retry with feedback, fall back to a simpler format, or surface the error.

Hands-on example

Suppose you need to extract product reviews into a structured form. Here is the loop.

schema = {
    "type": "object",
    "properties": {
        "sentiment": {"enum": ["positive", "neutral", "negative"]},
        "topics": {"type": "array", "items": {"type": "string"}},
        "score": {"type": "number", "minimum": 0, "maximum": 1},
    },
    "required": ["sentiment", "topics", "score"],
}

prompt = f"""Extract structured data from the review.
Return JSON matching this schema: {schema}
Example: {{"sentiment": "positive", "topics": ["battery"], "score": 0.9}}
Review: {review}"""

The flow looks like this.

raw prompt + schema + examples
 |
 v
[LLM call]
 |
 v
parse JSON --> ok? --> yes --> downstream code
 |                    no
 v
[validator] -- error msg --> retry prompt
 |
 v
fallback (escalate or default)

Output formatter loop with validation and retry

The retry loop matters more than the prompt itself. Most failures are transient: a stray comma, an extra field. Feeding the error message back to the model fixes the majority on the second try. Cap retries at two or three to bound cost.

Trade-offs

Strict formats like tool schemas or JSON mode give you near-perfect parseability at the cost of some flexibility. The model has less room to add useful commentary, and exotic schemas can degrade reasoning quality.

Loose formats — markdown, XML tags, delimited blocks — are easier to inspect by humans and play nicely with chain-of-thought. They are harder to parse reliably and need more defensive code.

Free-form output with regex extraction is tempting for one-off scripts but rarely survives contact with real traffic. Use it only when the structure is genuinely simple and you control both ends.

Practical tips

Always show at least one example, even when you provide a schema. Models pattern-match more strongly than they spec-follow. For lists, include an empty case so the model does not invent items to fill space.

Log every parse failure with the raw output. The patterns will guide your next prompt revision. If you see the same failure twice, that is a prompt bug, not bad luck.

When fields are optional, say so explicitly and show an example without that field. Otherwise the model hallucinates plausible-looking values.

For long outputs, prefer streaming with a tolerant parser like a JSON streaming library. It surfaces structure issues earlier and lets you cancel runaway generations.

Wrap-up

Output formatters turn a probabilistic system into something you can build software on top of. The recipe is small: a schema, examples, validation, and a bounded retry. Get those four pieces right and most of the downstream complexity in your LLM stack disappears.