Prompt Templates and Reusable Patterns
How to turn prompts into reusable templates: variables, system messages as contracts, few-shot structure, JSON output schemas, role prompting, and when patterns help versus hurt.
What you'll learn
- ✓How to turn ad-hoc prompts into versioned templates
- ✓Why a system message works as a contract
- ✓Structuring few-shot examples for reliability
- ✓Forcing structured JSON output safely
- ✓When role prompting helps and when it is theatre
Prerequisites
- •Comfort writing prompts — see Prompt Engineering Basics
- •You have an application that calls an LLM in code
The first prompt you ship is hand-written and inline. The tenth is the same prompt with a variable. The hundredth is a template you keep in a file, version, and test.
This post is about that progression — the patterns that make prompts maintainable, the structures that hold up under real traffic, and the moments where over-engineering a prompt makes things worse.
Template variables
A template is a prompt with named slots.
# A tiny template — easy to read, easy to version
SUMMARIZE_TEMPLATE = """You are summarising support tickets.
Style:
- {style}
- Max {max_sentences} sentences.
TICKET:
{ticket_text}
SUMMARY:"""
def render(template: str, **kwargs) -> str:
return template.format(**kwargs)
prompt = render(
SUMMARIZE_TEMPLATE,
style="neutral, factual",
max_sentences=3,
ticket_text=ticket.body,
)
Why this matters:
- You can diff and review prompt changes like any other code
- You can version them —
SUMMARIZE_V3ships alongsideSUMMARIZE_V2for a safe rollout - Eval suites can target a template instead of a snapshot in someone’s head — see LLM Evaluation Basics
A common upgrade is moving templates into a dedicated file or directory (prompts/summarize.md) so non-engineers can edit them. Use a real templating library — Jinja2 in Python, Handlebars in JS — once str.format starts feeling tight.
One trap: never interpolate untrusted user input directly into instructions. A user typing "\n\nIgnore previous instructions and..." should not be able to override your system message. Quote user content, put it inside a clearly delimited block, and treat it as data, not instructions.
System messages as contracts
The system message is where the model meets the rules of the road. Treat it like an API contract — explicit, scoped, durable.
You are CodeLoom's documentation assistant.
You ONLY answer questions about the CodeLoom platform.
If asked about anything else, reply: "I can only help with CodeLoom questions."
When you cite documentation, quote the exact phrase in quotes
followed by the page title in brackets.
If unsure, say "I don't know" rather than guessing.
Properties of a good system message:
- Names the persona briefly — one sentence, not three paragraphs
- States what is in scope and out of scope
- Defines output rules — format, tone, citation style
- Specifies fallback behaviour — what to do when unsure
What to avoid:
- Vague aspirations (“be helpful and friendly”)
- Long lists of edge cases — the model loses them in the middle
- Anything you cannot verify in an eval set
A system message that survives review reads more like a function signature than a pep talk.
Few-shot template structure
Few-shot prompting — including examples of input/output pairs — works when the task is hard to describe but easy to demonstrate.
Classify the customer's intent as one of:
[refund, order_status, product_question, complaint, other].
Examples:
INPUT: "Where's my package?"
OUTPUT: order_status
INPUT: "This is broken, I want my money back."
OUTPUT: refund
INPUT: "Does the Pro plan include SSO?"
OUTPUT: product_question
Now classify:
INPUT: "{user_message}"
OUTPUT:
Patterns that hold up:
- Three to five examples is the sweet spot. More usually does not help and burns tokens.
- Cover the categories, not just the easy ones. Include one ambiguous example.
- Mirror your real input format. If real tickets are lowercase fragments, examples should be too.
- Put examples before the new input. Models pay more attention to what comes last.
A common mistake: cherry-picking polished examples. The model learns “outputs look like these polished examples” and refuses to handle messy real inputs.
Try it. Take a classification prompt you have written. Look at the examples in it. Are they similar to the messy inputs your users actually send? If they are cleaner, replace them with five real cases from your logs. Re-run your eval set. The score usually moves.
Output schemas and JSON mode
When downstream code consumes the output, you need structured data, not prose. Two pieces help here.
JSON mode / structured output. Most providers now support forcing JSON output, often against a schema you provide.
# Forcing structured output with a schema
schema = {
"type": "object",
"properties": {
"intent": {
"type": "string",
"enum": ["refund", "order_status", "product_question", "complaint", "other"],
},
"confidence": {"type": "number", "minimum": 0, "maximum": 1},
"needs_human": {"type": "boolean"},
},
"required": ["intent", "confidence", "needs_human"],
}
response = client.chat(
model="some-model",
messages=[...],
response_format={"type": "json_schema", "schema": schema},
)
The model is constrained at decode time to emit JSON matching the schema. No more “I tried to parse this and the model added a friendly preamble.”
Explicit format instructions. Even with JSON mode, telling the model what each field means in the prompt improves quality.
Return a JSON object with these fields:
- intent: the user's primary goal
- confidence: 0.0 to 1.0, how sure you are
- needs_human: true if the question is ambiguous or sensitive
Schema enforces shape. Prose explains meaning. You want both.
Role prompting
“You are an expert Python developer with 20 years of experience…” — this is role prompting. Sometimes it helps. Sometimes it is theatre.
When it helps:
- Establishing tone and voice (“You are a friendly support agent”)
- Setting scope (“You are a SQL expert. Refuse non-SQL questions.”)
- Aligning expectations (“You are a careful editor; do not change facts, only style.”)
When it does not help:
- Inflating expertise levels does not make answers more correct
- Long persona descriptions burn tokens without measurable lift
- Generic flattery prompts (“You are the world’s best…”) have no effect that survives an eval
Rule of thumb: use a role to communicate constraints the model would not otherwise infer. Skip it when the task speaks for itself.
When patterns help vs hurt
Templates and patterns are powerful. They also have failure modes worth naming.
Helpful when:
- The same prompt runs at high volume — even small wins compound
- Multiple engineers touch the prompt — structure prevents drift
- You need to evaluate, version, and roll back
Counterproductive when:
- You are still exploring what works — premature structure freezes a bad shape in
- The task is one-off — a hand-written prompt is faster
- The template grows so general it forgets the actual task — every option costs token budget and model attention
A good check: read your template out loud and ask “would a smart contractor know what to do from this?” If yes, ship it. If you find yourself rationalising sentences, cut them.
Audit. Open the longest prompt in your codebase. Highlight every sentence that does not change behaviour if you remove it. Cut those. Re-run your evals. Usually nothing breaks and the prompt is now half the length, twice as readable, and a little cheaper per call.
A small template library shape
Once you have a few, a repo layout tends to settle into something like:
prompts/
summarize/
v1.md
v2.md # current
examples.json # used both as few-shot and as eval seed
classify_intent/
v3.md
examples.json
reply_to_ticket/
v1.md
Each prompt directory has the template, its examples, and a changelog at the top of the file. A loader function reads the latest version unless overridden. Evals reference templates by name and version, so you can A/B v2 against v3 on identical inputs.
This is unglamorous infrastructure. It is also how teams stop reinventing the same prompt every sprint.
Recap
- Templates with variables make prompts diffable, versionable, and testable
- System messages are contracts — scope, output rules, fallback behaviour
- Few-shot examples should mirror real inputs and cover edge cases
- JSON schema output turns prompts into reliable API surfaces
- Role prompting helps for tone and scope; it is not magic
- Patterns earn their keep at scale; skip them while exploring
Next steps
Templates package what works. Tool use lets the templated assistant actually do things — that is the natural next step.
→ Next: LLM Tool Use and Function Calling
Related: Prompt Engineering Basics, LLM Evaluation Basics, What Is an LLM?.
Questions or feedback? Email codeloomdevv@gmail.com.