Prompt Templates and Reusable Patterns

Intermediate 11 min read

What you'll learn

✓How to turn ad-hoc prompts into versioned templates
✓Why a system message works as a contract
✓Structuring few-shot examples for reliability
✓Forcing structured JSON output safely
✓When role prompting helps and when it is theatre

Prerequisites

•Comfort writing prompts — see Prompt Engineering Basics
•You have an application that calls an LLM in code

The first prompt you ship is hand-written and inline. The tenth is the same prompt with a variable. The hundredth is a template you keep in a file, version, and test.

This post is about that progression — the patterns that make prompts maintainable, the structures that hold up under real traffic, and the moments where over-engineering a prompt makes things worse.

Template variables

A template is a prompt with named slots.

# A tiny template — easy to read, easy to version
SUMMARIZE_TEMPLATE = """You are summarising support tickets.

Style:
- {style}
- Max {max_sentences} sentences.

TICKET:
{ticket_text}

SUMMARY:"""

def render(template: str, **kwargs) -> str:
    return template.format(**kwargs)

prompt = render(
    SUMMARIZE_TEMPLATE,
    style="neutral, factual",
    max_sentences=3,
    ticket_text=ticket.body,
)

Why this matters:

You can diff and review prompt changes like any other code
You can version them — SUMMARIZE_V3 ships alongside SUMMARIZE_V2 for a safe rollout
Eval suites can target a template instead of a snapshot in someone’s head — see LLM Evaluation Basics

A common upgrade is moving templates into a dedicated file or directory (prompts/summarize.md) so non-engineers can edit them. Use a real templating library — Jinja2 in Python, Handlebars in JS — once str.format starts feeling tight.

One trap: never interpolate untrusted user input directly into instructions. A user typing "\n\nIgnore previous instructions and..." should not be able to override your system message. Quote user content, put it inside a clearly delimited block, and treat it as data, not instructions.

System messages as contracts

The system message is where the model meets the rules of the road. Treat it like an API contract — explicit, scoped, durable.

You are CodeLoom's documentation assistant.

You ONLY answer questions about the CodeLoom platform.
If asked about anything else, reply: "I can only help with CodeLoom questions."

When you cite documentation, quote the exact phrase in quotes
followed by the page title in brackets.

If unsure, say "I don't know" rather than guessing.

Properties of a good system message:

Names the persona briefly — one sentence, not three paragraphs
States what is in scope and out of scope
Defines output rules — format, tone, citation style
Specifies fallback behaviour — what to do when unsure

What to avoid:

Vague aspirations (“be helpful and friendly”)
Long lists of edge cases — the model loses them in the middle
Anything you cannot verify in an eval set

A system message that survives review reads more like a function signature than a pep talk.

Few-shot template structure

Few-shot prompting — including examples of input/output pairs — works when the task is hard to describe but easy to demonstrate.

Classify the customer's intent as one of:
[refund, order_status, product_question, complaint, other].

Examples:
INPUT: "Where's my package?"
OUTPUT: order_status

INPUT: "This is broken, I want my money back."
OUTPUT: refund

INPUT: "Does the Pro plan include SSO?"
OUTPUT: product_question

Now classify:
INPUT: "{user_message}"
OUTPUT:

Patterns that hold up:

Three to five examples is the sweet spot. More usually does not help and burns tokens.
Cover the categories, not just the easy ones. Include one ambiguous example.
Mirror your real input format. If real tickets are lowercase fragments, examples should be too.
Put examples before the new input. Models pay more attention to what comes last.

A common mistake: cherry-picking polished examples. The model learns “outputs look like these polished examples” and refuses to handle messy real inputs.

Try it. Take a classification prompt you have written. Look at the examples in it. Are they similar to the messy inputs your users actually send? If they are cleaner, replace them with five real cases from your logs. Re-run your eval set. The score usually moves.

Output schemas and JSON mode

When downstream code consumes the output, you need structured data, not prose. Two pieces help here.

JSON mode / structured output. Most providers now support forcing JSON output, often against a schema you provide.

# Forcing structured output with a schema
schema = {
    "type": "object",
    "properties": {
        "intent": {
            "type": "string",
            "enum": ["refund", "order_status", "product_question", "complaint", "other"],
        },
        "confidence": {"type": "number", "minimum": 0, "maximum": 1},
        "needs_human": {"type": "boolean"},
    },
    "required": ["intent", "confidence", "needs_human"],
}

response = client.chat(
    model="some-model",
    messages=[...],
    response_format={"type": "json_schema", "schema": schema},
)

The model is constrained at decode time to emit JSON matching the schema. No more “I tried to parse this and the model added a friendly preamble.”

Explicit format instructions. Even with JSON mode, telling the model what each field means in the prompt improves quality.

Return a JSON object with these fields:
- intent: the user's primary goal
- confidence: 0.0 to 1.0, how sure you are
- needs_human: true if the question is ambiguous or sensitive

Schema enforces shape. Prose explains meaning. You want both.

Role prompting

“You are an expert Python developer with 20 years of experience…” — this is role prompting. Sometimes it helps. Sometimes it is theatre.

When it helps:

Establishing tone and voice (“You are a friendly support agent”)
Setting scope (“You are a SQL expert. Refuse non-SQL questions.”)
Aligning expectations (“You are a careful editor; do not change facts, only style.”)

When it does not help:

Inflating expertise levels does not make answers more correct
Long persona descriptions burn tokens without measurable lift
Generic flattery prompts (“You are the world’s best…”) have no effect that survives an eval

Rule of thumb: use a role to communicate constraints the model would not otherwise infer. Skip it when the task speaks for itself.

When patterns help vs hurt

Templates and patterns are powerful. They also have failure modes worth naming.

Helpful when:

The same prompt runs at high volume — even small wins compound
Multiple engineers touch the prompt — structure prevents drift
You need to evaluate, version, and roll back

Counterproductive when:

You are still exploring what works — premature structure freezes a bad shape in
The task is one-off — a hand-written prompt is faster
The template grows so general it forgets the actual task — every option costs token budget and model attention

A good check: read your template out loud and ask “would a smart contractor know what to do from this?” If yes, ship it. If you find yourself rationalising sentences, cut them.

Audit. Open the longest prompt in your codebase. Highlight every sentence that does not change behaviour if you remove it. Cut those. Re-run your evals. Usually nothing breaks and the prompt is now half the length, twice as readable, and a little cheaper per call.

A small template library shape

Once you have a few, a repo layout tends to settle into something like:

prompts/
  summarize/
    v1.md
    v2.md          # current
    examples.json  # used both as few-shot and as eval seed
  classify_intent/
    v3.md
    examples.json
  reply_to_ticket/
    v1.md

Each prompt directory has the template, its examples, and a changelog at the top of the file. A loader function reads the latest version unless overridden. Evals reference templates by name and version, so you can A/B v2 against v3 on identical inputs.

This is unglamorous infrastructure. It is also how teams stop reinventing the same prompt every sprint.

Recap

Templates with variables make prompts diffable, versionable, and testable
System messages are contracts — scope, output rules, fallback behaviour
Few-shot examples should mirror real inputs and cover edge cases
JSON schema output turns prompts into reliable API surfaces
Role prompting helps for tone and scope; it is not magic
Patterns earn their keep at scale; skip them while exploring

Next steps

Templates package what works. Tool use lets the templated assistant actually do things — that is the natural next step.

→ Next: LLM Tool Use and Function Calling

Questions or feedback? Email codeloomdevv@gmail.com.