Prompt Engineering Basics: What Actually Works

Beginner 11 min read

What you'll learn

✓ The difference between system and user prompts, and what each is for
✓ Why clarity beats cleverness almost every time
✓ How to use few-shot examples to shape output style and structure
✓ How to ask for structured JSON output that your code can parse
✓ When chain-of-thought helps, and when it just burns tokens
✓ Why eval-driven prompting matters more than any single trick

Prerequisites

• What Is an LLM? (covers the mental model you need first)

Prompt engineering has a bad reputation — partly because of the dubious courses sold around it, partly because the term sounds grand for something that is mostly “writing clearly to a machine.” This post strips the topic down to what actually moves results in production systems.

If you haven’t yet, read What Is an LLM? for the mechanics that make these techniques work.

System vs user prompts

Modern chat APIs distinguish between roles:

The system prompt sets persistent behaviour — who the assistant is, what it must and must not do, the output format.
User messages carry the actual request or input data.
Assistant messages are the model’s previous responses.

A typical shape:

messages = [
    {"role": "system", "content": "You are a careful technical editor. "
                                  "You return suggestions as a bulleted list."},
    {"role": "user", "content": "Edit this paragraph: ..."},
]

The system prompt is the right place for rules that apply to every turn (“always respond in JSON”, “never invent citations”). The user message is where the variable input lives. Keep them separate; it makes both more maintainable.
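One way to keep that separation in code is to hold the system rules in a single constant and assemble the message list per call. This is a minimal sketch: `SYSTEM_PROMPT`, `build_messages` and the optional `history` argument are illustrative names, not part of any SDK. The output follows the role/content shape shown above.

```python
from typing import Optional

# Rules that apply to every turn live in one place.
SYSTEM_PROMPT = (
    "You are a careful technical editor. "
    "You return suggestions as a bulleted list."
)


def build_messages(user_input: str,
                   history: Optional[list[dict]] = None) -> list[dict]:
    """Assemble a chat request: fixed system rules, prior turns, new input."""
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    # Earlier user/assistant turns, if any, keep their original order.
    messages.extend(history or [])
    # Only the variable input changes from call to call.
    messages.append({"role": "user", "content": user_input})
    return messages
```

Changing a rule now means editing one constant, and the per-request code never touches it.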

Clarity beats cleverness

The biggest single improvement most people can make to their prompts is to write them like a brief to a competent contractor, not like a spell.

A bad prompt:

summarise this

A better one:

Summarise the article below in three bullet points.
Each bullet should be one sentence.
Focus on what is new — skip background context.

Article:
"""
<text here>
"""

The improvements are unglamorous:

State the format. “Three bullets, one sentence each.”
State the focus. “What is new.”
Delimit the input clearly so the model knows where instructions end and content begins.

Triple quotes, XML-like tags (<article>...</article>), or fenced code blocks all work as delimiters. Consistency matters more than which you pick.
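When the input comes from users or files, it helps to build the delimited prompt in code rather than by hand. A minimal sketch using XML-like tags; the function name is illustrative, and stripping a stray closing tag from the input is one simple guard (an assumption, not a complete defence) against content ending the block early.

```python
def build_summary_prompt(article: str) -> str:
    """Put instructions first, then the input wrapped in <article> tags."""
    # Remove any closing tag inside the input so it can't end the block early.
    safe_article = article.replace("</article>", "")
    return (
        "Summarise the article inside the <article> tags in three bullet points.\n"
        "Each bullet should be one sentence.\n"
        "Focus on what is new; skip background context.\n"
        "\n"
        f"<article>\n{safe_article}\n</article>"
    )
```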

Few-shot examples

When you want a specific style, structure, or judgement call, show the model a few examples of input and desired output. This is called few-shot prompting.

prompt = """
Classify each support ticket as: bug, feature, question, or spam.

Examples:
Ticket: "App crashes when I click the export button."
Label: bug

Ticket: "Could you add dark mode?"
Label: feature

Ticket: "How do I reset my password?"
Label: question

Ticket: "BUY CHEAP WATCHES NOW"
Label: spam

Now classify:
Ticket: "The CSV download is missing the last row."
Label:
"""
# output: bug

Two or three examples often outperform a paragraph of description, because the model is excellent at pattern continuation. Pick examples that cover the tricky edges, not the obvious cases.
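Keeping the examples as data makes them easy to swap when your evals show an edge case is missing. A minimal sketch that renders the ticket prompt above from a list of pairs; `EXAMPLES` and `build_few_shot_prompt` are illustrative names.

```python
# (ticket, label) pairs; pick ones that cover the tricky edges.
EXAMPLES = [
    ("App crashes when I click the export button.", "bug"),
    ("Could you add dark mode?", "feature"),
    ("How do I reset my password?", "question"),
    ("BUY CHEAP WATCHES NOW", "spam"),
]


def build_few_shot_prompt(ticket: str, examples=EXAMPLES) -> str:
    """Render instructions, labelled examples, then the open-ended new case."""
    lines = ["Classify each support ticket as: bug, feature, question, or spam.",
             "", "Examples:"]
    for text, label in examples:
        lines += [f'Ticket: "{text}"', f"Label: {label}", ""]
    # Ending on "Label:" invites the model to continue the pattern.
    lines += ["Now classify:", f'Ticket: "{ticket}"', "Label:"]
    return "\n".join(lines)
```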

A caution: every example costs tokens on every call. Once you’ve validated that few-shot works, consider whether the same behaviour can be baked into a clearer instruction.

Structured output (JSON)

When the LLM’s output is being read by code, free-form text is your enemy. Ask for JSON, and specify the schema.

prompt = """
Extract the following fields from the email below.
Respond with a single JSON object and no other text.

Schema:
{
  "sender_name": string,
  "intent": "question" | "complaint" | "request" | "other",
  "urgency": 1 | 2 | 3
}

Email:
\"\"\"
Hi, this is Sara. My order #4471 hasn't arrived and the wedding is Saturday.
Please help urgently.
\"\"\"
"""

A typical model response:

{"sender_name": "Sara", "intent": "complaint", "urgency": 3}

Tips that pay off:

Show the schema literally in the prompt. Don’t describe it in prose.
Say “respond with a single JSON object and no other text” — otherwise you’ll get a “Sure, here’s the JSON:” preamble.
Use enums ("question" | "complaint" | ...) where possible. They constrain the model to valid values.
Validate the response in code with Pydantic or similar. Treat parse failures as a normal error case.
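A stdlib-only sketch of that validation step for the email schema above, so it runs without extra installs; Pydantic would express the same checks more compactly. `EmailFields` and `parse_email_fields` are illustrative names. Slicing from the first `{` to the last `}` tolerates a chatty preamble, which is a heuristic rather than a guarantee.

```python
import json
from dataclasses import dataclass

VALID_INTENTS = {"question", "complaint", "request", "other"}
VALID_URGENCY = {1, 2, 3}


@dataclass
class EmailFields:
    sender_name: str
    intent: str
    urgency: int


def parse_email_fields(raw: str) -> EmailFields:
    """Parse and validate the model's reply; raise ValueError if it is off-schema."""
    # Heuristic: ignore any preamble before the first "{" or text after the last "}".
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found")
    data = json.loads(raw[start:end + 1])  # JSONDecodeError subclasses ValueError
    if not isinstance(data.get("sender_name"), str):
        raise ValueError("sender_name must be a string")
    intent = data.get("intent")
    if not isinstance(intent, str) or intent not in VALID_INTENTS:
        raise ValueError(f"bad intent: {intent!r}")
    urgency = data.get("urgency")
    # type() check rejects floats and bools (bool is a subclass of int).
    if type(urgency) is not int or urgency not in VALID_URGENCY:
        raise ValueError(f"bad urgency: {urgency!r}")
    return EmailFields(data["sender_name"], intent, urgency)
```

Every failure surfaces as a `ValueError`, so the caller has one error case to handle: retry, log, or fall back.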

Many model APIs now also offer a structured output mode that constrains the response to a JSON schema you supply. Use it when available; it eliminates whole categories of bugs.

Try it yourself. Take a real document you work with — an email, a ticket, a receipt. Write a prompt that extracts three structured fields from it as JSON. Run it five times. If the output varies between runs, tighten the schema (use enums, add “respond with a single JSON object”). Iterating on this kind of small task is how you build intuition.
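Comparing five runs by eye is error-prone, because key order and spacing can differ even when the content is identical. A minimal sketch, assuming you have already collected the raw outputs as strings; `output_variants` is an illustrative name.

```python
import json
from collections import Counter


def output_variants(outputs: list[str]) -> Counter:
    """Count distinct answers across runs, ignoring key order and whitespace.

    Unparseable outputs are kept verbatim so they show up as their own variant.
    """
    variants = Counter()
    for raw in outputs:
        try:
            # sort_keys makes {"a": 1, "b": 2} and {"b": 2, "a": 1} compare equal.
            key = json.dumps(json.loads(raw), sort_keys=True)
        except json.JSONDecodeError:
            key = raw.strip()
        variants[key] += 1
    return variants
```

If `len(output_variants(runs))` is greater than 1, the runs disagree on content, and the counts show which variant dominates.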

Chain-of-thought, briefly

Chain-of-thought prompting asks the model to reason step by step before answering. For tasks that involve multi-step logic (word problems, code analysis, planning), it often measurably improves accuracy.

A simple version:

Question: A train leaves City A at 9am at 60 mph...
Think step by step, then give the final answer on a line starting with "Answer:".

What works in practice:

Use it for problems where the model is observably worse without it.
For final-answer-only use cases, let the model write out its reasoning, then keep only the “Answer:” line in code (some APIs can also hide the reasoning from the response).
Don’t use it for simple lookups or transformations — it just wastes tokens.
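Keeping only the final answer takes a few lines of parsing, assuming the “Answer:” convention from the prompt above. In this sketch `extract_final_answer` is an illustrative name. Returning `None` lets your code treat a missing answer line as a failure rather than guessing.

```python
from typing import Optional


def extract_final_answer(response: str) -> Optional[str]:
    """Return the text after the last line starting with "Answer:", or None."""
    answer = None
    for line in response.splitlines():
        stripped = line.strip()
        if stripped.startswith("Answer:"):
            # Keep overwriting so the last "Answer:" line wins.
            answer = stripped[len("Answer:"):].strip()
    return answer
```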

Newer “reasoning” models do chain-of-thought internally and you don’t need to prompt for it. Check what your model supports before adding instructions it doesn’t need.

Make the model say “I don’t know”

By default, LLMs are biased toward giving an answer. To reduce hallucinations on lookup-style questions, give the model an explicit out:

If the answer is not in the provided document, respond with exactly:
NOT_FOUND

Document:
"""
<text>
"""

Question: <question>

This shifts a hallucination into a structured failure your code can handle.
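On the code side, handling the sentinel can be a single comparison. A minimal sketch; `answer_or_none` is an illustrative name, and the exact match mirrors the prompt's “respond with exactly” wording.

```python
from typing import Optional

NOT_FOUND = "NOT_FOUND"


def answer_or_none(model_output: str) -> Optional[str]:
    """Map the NOT_FOUND sentinel to None; pass real answers through stripped."""
    text = model_output.strip()
    # The prompt asked for exactly this token, so compare exactly.
    return None if text == NOT_FOUND else text
```

A `None` can then trigger a fallback, such as searching more documents or telling the user the answer isn't in the source.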

Iterating: eval-driven prompting

Here is the part that separates serious prompt work from folklore.

Don’t tune prompts by vibes. Build a small evaluation set — 20 to 100 representative inputs with the answers you want — and run your prompt against all of them. Each tweak gets a score. You keep what improves the score and discard what doesn’t.

A minimal eval loop, conceptually:

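from typing import Callable

test_cases = [
    {"input": "...", "expected": "bug"},
    {"input": "...", "expected": "feature"},
    # ... more
]

def score(prompt: str, call_llm: Callable[[str, str], str]) -> float:
    """Fraction of test cases where the prompt yields the expected label.

    call_llm(prompt, input) -> str is your own wrapper around the model API.
    """
    correct = 0
    for case in test_cases:
        output = call_llm(prompt, case["input"])
        # Normalise whitespace and case before comparing labels.
        if output.strip().lower() == case["expected"]:
            correct += 1
    return correct / len(test_cases)

print(score(prompt_v1, call_llm))    # e.g. 0.74
print(score(prompt_v2, call_llm))    # e.g. 0.81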

Without this, you’ll convince yourself a prompt is better because you remember the time it worked well. With it, you have a number that doesn’t lie.
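To see the loop run end to end without an API key, you can swap the model for a stand-in. In this sketch `fake_llm` is a hypothetical keyword matcher standing in for a real model call, and the test cases are made up. The useful addition is returning the failing cases alongside the score, since those are what you read before the next tweak.

```python
from typing import Callable

TEST_CASES = [
    {"input": "App crashes on export", "expected": "bug"},
    {"input": "Please add dark mode", "expected": "feature"},
    {"input": "How do I reset my password?", "expected": "question"},
    {"input": "The CSV is missing a row", "expected": "bug"},
]


def fake_llm(prompt: str, ticket: str) -> str:
    """Hypothetical stand-in for a model call: crude keyword rules."""
    text = ticket.lower()
    if "crash" in text:
        return "bug"
    if "add" in text:
        return "feature"
    return "question"


def evaluate(prompt: str, call_llm: Callable[[str, str], str], cases=TEST_CASES):
    """Return (accuracy, failures) so you can inspect what went wrong."""
    failures = []
    for case in cases:
        output = call_llm(prompt, case["input"]).strip().lower()
        if output != case["expected"]:
            failures.append({**case, "got": output})
    return 1 - len(failures) / len(cases), failures


accuracy, failures = evaluate("Classify each ticket ...", fake_llm)
```

Here the stand-in misses the CSV ticket (it answers "question"), giving 0.75. A failure list like this shows which edge cases your examples or instructions need to cover.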

For open-ended outputs where exact matching doesn’t work, judge quality with a second LLM call against a rubric, or have a human grade a sample. Imperfect evals beat no evals.
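A minimal sketch of the LLM-as-judge approach: one function builds the grading prompt and another pulls a score out of the reply. The rubric text, tag names and the “Score: N” reply format are illustrative assumptions. Fix your own rubric once so scores stay comparable across prompt versions.

```python
import re
from typing import Optional

RUBRIC = """\
Score the summary from 1 to 5:
5 = accurate, complete, one sentence
3 = accurate but misses a key point
1 = inaccurate or off-topic"""


def build_judge_prompt(source: str, candidate: str, rubric: str = RUBRIC) -> str:
    """Ask a second model call to grade one output against a fixed rubric."""
    return (
        f"{rubric}\n\n"
        f"<source>\n{source}\n</source>\n\n"
        f"<summary>\n{candidate}\n</summary>\n\n"
        "Reply with one line in the form: Score: <1-5>"
    )


def parse_judge_score(reply: str) -> Optional[int]:
    """Pull the integer score out of the judge's reply; None if malformed."""
    match = re.search(r"Score:\s*([1-5])\b", reply)
    return int(match.group(1)) if match else None
```

Averaging `parse_judge_score` over your eval set gives a number you can track, and spot-checking a sample by hand tells you whether the judge itself can be trusted.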

Common prompt patterns

A few patterns that keep showing up:

Role + task + format + input:

You are a [role].
Your task is to [task].
Respond in [format].

Input:
"""
<input>
"""

Constraint list:

Rules:
- Do not include personal opinions.
- Use only information from the provided source.
- If unsure, respond with NOT_SURE.
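The first two patterns above combine naturally into one template function. A minimal sketch; `render_prompt` and its parameters are illustrative, and the role string should include its own article (for example "technical editor" becomes "You are a technical editor.").

```python
from typing import Optional


def render_prompt(role: str, task: str, fmt: str, input_text: str,
                  rules: Optional[list[str]] = None) -> str:
    """Fill the role + task + format + input pattern, with an optional rule list."""
    parts = [f"You are a {role}.", f"Your task is to {task}.", f"Respond in {fmt}."]
    if rules:
        # One rule per line, mirroring the constraint-list pattern.
        parts += ["", "Rules:"] + [f"- {rule}" for rule in rules]
    parts += ["", "Input:", '"""', input_text, '"""']
    return "\n".join(parts)
```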

Inline examples for tone matching:

Give two or three examples of “good” output. The model will match the voice.

Two-stage: Run one prompt to extract structure, another to act on it. Often more reliable and easier to debug than a single mega-prompt.
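A minimal sketch of the two-stage pattern, assuming a hypothetical `call_llm(prompt, input) -> str` wrapper; the prompts and field names are illustrative. The check between stages is what makes it easier to debug: when something breaks, you know which stage to look at.

```python
import json
from typing import Callable

EXTRACT_PROMPT = (
    "Extract the customer's name and order number from the email as JSON: "
    '{"name": string, "order_id": string}. Respond with JSON only.'
)
REPLY_PROMPT = (
    "Write a two-sentence apology to the customer described in the JSON input, "
    "mentioning their order number."
)


def two_stage(email: str, call_llm: Callable[[str, str], str]) -> str:
    """Stage 1 extracts structure from free text; stage 2 acts on that structure."""
    fields = json.loads(call_llm(EXTRACT_PROMPT, email))
    # Checking between stages shows which one failed when something goes wrong.
    if not isinstance(fields, dict) or not all(k in fields for k in ("name", "order_id")):
        raise ValueError(f"stage 1 returned unexpected fields: {fields!r}")
    return call_llm(REPLY_PROMPT, json.dumps(fields))
```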

Anti-patterns to avoid

Vague urgency. “This is very important” doesn’t help. Specific instructions do.
Telling the model not to do something without saying what to do instead. “Don’t be wordy” → “Respond in at most two sentences.”
Stuffing irrelevant context just in case. Long prompts cost money and dilute focus.
Threats and bribes. “I will tip you $200” was a viral trick that doesn’t reliably help modern models. Write a clear brief instead.
Trusting one good run. A single success is not evidence of a good prompt. Run it ten times.
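Running a prompt ten times is easy to automate. A minimal sketch: `run` is a hypothetical zero-argument wrapper that sends your prompt once, and `check` is whatever acceptance test fits the task (an exact label match, a JSON parse, and so on).

```python
from typing import Callable


def pass_rate(run: Callable[[], str], check: Callable[[str], bool], n: int = 10) -> float:
    """Call the same prompt n times and return the fraction of outputs that pass."""
    passes = sum(1 for _ in range(n) if check(run()))
    return passes / n
```

For example, `pass_rate(lambda: call_llm(prompt, ticket), lambda out: out.strip() == "bug")` turns “it worked once” into a rate you can compare between prompt versions.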

A small worked example

Goal: classify customer feedback into themes and rate sentiment.

system_prompt = """
You are an analyst. Classify each piece of feedback into one or more themes
and rate sentiment from -2 (very negative) to +2 (very positive).

Return a single JSON object:
{
  "themes": string[],   // pick from: pricing, performance, design, support, bug
  "sentiment": -2 | -1 | 0 | 1 | 2,
  "summary": string     // one sentence, neutral tone
}

If the feedback is incoherent, return themes [] and sentiment 0.
"""

user_prompt = """
Feedback: "Loved the new dashboard layout, but it lags on my older laptop."
"""

A typical response:

{
  "themes": ["design", "performance"],
  "sentiment": 1,
  "summary": "Positive about the new dashboard design but reports lag on older hardware."
}

This prompt is short, explicit, schema-driven, and handles the failure case. It is also testable against an eval set. That is what good prompt engineering looks like in practice.
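The schema in this prompt also translates directly into checks on the reply. A stdlib-only sketch; `parse_feedback` is an illustrative name, and the allowed themes are copied from the prompt. The empty-themes, zero-sentiment failure case passes validation, as it should.

```python
import json

ALLOWED_THEMES = {"pricing", "performance", "design", "support", "bug"}


def parse_feedback(raw: str) -> dict:
    """Check a model reply against the feedback schema; raise ValueError if off."""
    data = json.loads(raw)  # JSONDecodeError is itself a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    themes = data.get("themes")
    if not isinstance(themes, list) or not all(isinstance(t, str) for t in themes):
        raise ValueError("themes must be a list of strings")
    unknown = set(themes) - ALLOWED_THEMES
    if unknown:
        raise ValueError(f"unknown themes: {sorted(unknown)}")
    sentiment = data.get("sentiment")
    # type() check rejects bools and floats; range matches the -2..+2 scale.
    if type(sentiment) is not int or not -2 <= sentiment <= 2:
        raise ValueError(f"sentiment out of range: {sentiment!r}")
    if not isinstance(data.get("summary"), str):
        raise ValueError("summary must be a string")
    return data
```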

Try it yourself. Build a 20-item eval set for the feedback classifier above. Write down what themes and sentiment you expect for each one. Then run the prompt and score it. When you’re done, try removing the failure-case rule and re-score. You’ll see how a single line of prompt changes the numbers.
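Exact string matching doesn't suit this output, because the order of themes can vary. A minimal scoring sketch, assuming each reply has already been parsed into a dict; `score_feedback` is an illustrative name. It reports themes and sentiment separately, so you can see which one a prompt change affected.

```python
def score_feedback(expected: list[dict], actual: list[dict]) -> dict[str, float]:
    """Score themes and sentiment separately across an eval set."""
    if not expected or len(expected) != len(actual):
        raise ValueError("need equal-length, non-empty expected and actual lists")
    theme_hits = sentiment_hits = 0
    for want, got in zip(expected, actual):
        # Theme order doesn't matter, so compare as sets.
        theme_hits += set(want["themes"]) == set(got["themes"])
        sentiment_hits += want["sentiment"] == got["sentiment"]
    n = len(expected)
    return {"themes": theme_hits / n, "sentiment": sentiment_hits / n}
```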

Recap

You now know:

System prompts carry persistent behaviour; user messages carry the input
Clarity beats cleverness — state format, focus, and delimiters explicitly
Few-shot examples shape style and structure faster than prose
For machine-readable output, specify a JSON schema and validate it
Chain-of-thought helps on multi-step reasoning; skip it for simple tasks
The single biggest lever is eval-driven iteration — measure, don’t guess

Next steps

A common follow-up to prompt engineering is retrieval-augmented generation — putting your own data into the model’s context so it can answer questions about your domain.

→ Next: What Is RAG?

Questions or feedback? Email codeloomdevv@gmail.com.