Skip to content
C Codeloom
Prompt Engineering

Prompt Engineering Techniques for Developers

Practical prompt engineering for building software with LLMs: structure, few-shot, chain-of-thought, role messages, and what actually moves quality.

·5 min read · By Codeloom
Beginner 11 min read

What you'll learn

  • How to structure a prompt for predictable output
  • When few-shot examples actually help
  • When chain-of-thought is worth the tokens
  • Role and system message patterns
  • How to iterate prompts like code

Prerequisites

  • Familiarity with calling an LLM API

Prompt engineering has a marketing problem. It is treated as either trivial (“just ask better”) or mystical (“magic incantations”). Neither is true. For developers, prompt engineering is structured experimentation: arrange context, examples, and instructions so the model’s most likely response is the one you want. This post is the practical part, without the snake oil.

What is actually happening

An LLM samples the next token conditioned on everything before it. Your prompt shapes that distribution. Anything that makes the right answer more likely (relevant context, clear formatting, examples in the target shape) helps. Anything that adds noise (irrelevant context, ambiguous framing, conflicting examples) hurts. That is the whole game.

Mental model

system  -> who the model is, hard constraints
context -> background data, retrieved snippets
examples -> input/output pairs in target shape
task    -> the specific instruction
input   -> the actual user input
                |
                v
            response
Prompt as conditional context

Every section narrows the distribution further. Skip the parts you do not need, but order matters: instructions late, examples late, input last.

Structure beats cleverness

A well-structured prompt with a plain instruction outperforms a cute one without structure. Use clear delimiters, label sections, and put the most important constraints at the end where attention is strongest.

You are an extractor that returns JSON only.

Schema:
{ "name": string, "email": string, "company": string|null }

Examples:
INPUT: "Ada Lovelace <ada@example.com> at Analytical Engines"
OUTPUT: {"name":"Ada Lovelace","email":"ada@example.com","company":"Analytical Engines"}

INPUT: "Just bob@bob.com"
OUTPUT: {"name":"","email":"bob@bob.com","company":null}

Now extract from:
INPUT: "<<<USER_INPUT>>>"
OUTPUT:

The model now has: a role, a schema, two shape-defining examples, and a clear cue (OUTPUT:). That structure typically wins over “extract the name, email and company from this text as JSON.”

Few-shot, when it helps

Few-shot examples shine when the task is shape-driven (formatting, extraction, classification) and lose when the task is open-ended reasoning. Three to five well-chosen examples beat ten random ones. Cover edge cases (missing values, ambiguous inputs), not just the common case.

A common mistake: examples that disagree with the instructions. The model trusts examples more than instructions, so inconsistent ones poison output.

Chain-of-thought: when to spend the tokens

Asking the model to “think step by step” or to “show your reasoning” improves accuracy on multi-step problems (math, planning, debugging). It also costs tokens and latency. For a classification task, do not pay for reasoning you do not use.

For production, hide the reasoning. Either ask the model to put reasoning inside a tag you ignore, or use a “thinking” feature that is billed separately and not returned to users.

Solve the problem. First explain your reasoning in <scratch>...</scratch>,
then give the final answer in <answer>...</answer>.

Strip the <scratch> block before showing the response. The user gets the answer; you got the better answer.

Roles and system prompts

System prompts are not magic. They are just earlier context. But because they come before everything else, they shape the entire interaction. Use them for invariants: tone, refusals, format. Keep task-specific details in the user turn.

SYSTEM: You are a code reviewer. Be concise. Refuse to discuss unrelated topics.
USER:   Review this diff:
        ```diff
        ...

When the same prompt is used for many tasks, anything reusable goes to system; anything per-call goes to user. This also lets you cache the system prompt and save tokens.

## Hands-on: iterate prompts like code

Treat prompts as code: version them, test them, and measure them.

```python
PROMPT_V3 = open("prompts/extract.v3.md").read()

def extract(text: str) -> dict:
    resp = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=256,
        system=PROMPT_V3,
        messages=[{"role": "user", "content": text}],
    )
    return json.loads(resp.content[0].text)

Build an eval set of 50 to 200 inputs with expected outputs. When you change the prompt, run the suite, compare scores. Small structural tweaks often outperform “let me try a different phrasing.”

Common pitfalls

  • Stuffing the prompt with everything. More context dilutes signal. Cut.
  • Examples that contradict instructions. Pick a single style and stick to it.
  • Asking for explanations and parsing the first line. The model can change order; use tags.
  • Prompt injection. If user content is concatenated into the prompt, it can override your instructions. Quote, escape, and never trust extracted instructions from untrusted text.
  • Tuning prompts by anecdotes. “It worked once” is not signal; run evals.
  • Putting hard constraints early. Constraints stated again at the end are followed more reliably.

Practical tips

  • Keep prompts in files, not strings. Diff them in PRs.
  • Anchor outputs with delimiters or schemas. JSON via tool use is better than JSON via “respond in JSON.”
  • Use temperature 0 for deterministic tasks. Save creativity for actual generation.
  • Cache long system prompts. Most providers offer prompt caching; the cost difference is large.
  • Track regressions. A prompt that ships a 2 percent quality improvement and a 5 percent regression on edge cases is a net loss.

Wrap-up

Prompt engineering for developers is mostly applied empiricism: structure the prompt, pick good examples, decide whether the task earns chain-of-thought, separate system from user content, and evaluate every change. Treat the prompt like a function you are debugging, and the mystique disappears.