Prompt Engineering Anti-Patterns: Mistakes That Quietly Hurt Quality

Intermediate 8 min read

What you'll learn

✓The most common prompt anti-patterns in production
✓Why each one degrades accuracy or reliability
✓A simple mental model for prompt structure
✓Concrete refactors for each anti-pattern
✓Practical tips to catch them in code review

Prerequisites

•Familiar with LLMs or Python

What and Why

Prompt anti-patterns are recurring mistakes that look reasonable but quietly degrade output quality, reliability, or cost. They are the prompt-engineering equivalent of N+1 queries: the system still works, but slowly and unpredictably. Most production prompts I have reviewed contain at least one.

Catching anti-patterns early matters because prompts compound. A vague instruction in the system message infects every downstream task. A noisy few-shot example teaches the model the wrong pattern thousands of times a day. Naming these traps gives teams a shared vocabulary in code review.

Mental Model

A good prompt has three jobs: tell the model who it is, what to do, and how the output should look. Anti-patterns usually come from collapsing these jobs together, or from over-fitting the prompt to a single example you tested once.


┌────────────────────────────┐
│  ROLE     (who am I?)      │
├────────────────────────────┤
│  TASK     (what to do?)    │
├────────────────────────────┤
│  CONTEXT  (data + rules)   │
├────────────────────────────┤
│  OUTPUT   (format contract)│
└────────────────────────────┘
       │
       ▼
 anti-patterns blur
 these boundaries

A clean prompt separates role, task, context, and output contract.

When you can point to each block, you can debug each block. When everything is one paragraph, you cannot.

Hands-on Example: Five Common Anti-Patterns

1. The wishlist prompt. “Be helpful, be accurate, be concise, be creative, be safe, be exhaustive.” Conflicting goals cancel out. Refactor: pick one primary objective and at most two constraints.

2. Negative-only instructions. “Do not include any preamble. Do not use markdown. Do not apologize.” Models respond better to positive specifications. Refactor: “Respond with a single JSON object matching this schema.”

3. Few-shot leakage. Examples that share an incidental property the real input lacks, such as all examples being short or in the same domain. The model learns the wrong invariant. Refactor: diversify examples on every axis except the one you want it to learn.

4. The mega-prompt. A 4,000-token system message accumulated across six bug fixes. Older rules conflict with newer ones, and the model picks at random. Refactor: rewrite from scratch monthly using only the rules backed by current tests.

5. Format wishful thinking. “Return JSON.” Then you parse it and crash on the one in fifty replies wrapped in a code fence. Refactor: enforce structure with tool/JSON-mode APIs, or post-parse defensively and retry.

Trade-offs

Fixing anti-patterns has a cost. Splitting a mega-prompt into smaller, composable prompts means more orchestration and possibly more API calls. Enforcing strict output formats can reduce model creativity on tasks where prose is fine. Diversifying few-shot examples takes curation effort that not every team can afford.

There is also a real risk of over-correcting. Some “anti-patterns” are perfectly fine for low-stakes tasks. A wishlist prompt for a one-off internal tool is not worth refactoring. Apply this lens proportionally to how much the prompt runs in production.

Practical Tips

Diff your prompts in version control like code. A prompt change is a behavior change.
Write a tiny eval set, even ten examples, before shipping a prompt change. You will catch regressions you cannot see by eye.
Read the prompt out loud. If you trip over a sentence, the model will too.
Prefer positive instructions and explicit output contracts over scolding the model.
Track failure cases in a “prompt bug” log so the system message stays lean.
Periodically delete rules and rerun your evals. Many rules no longer pay rent.

Wrap-up

Anti-patterns are not exotic failures; they are the boring, common, accumulated decisions that drag down LLM systems. Treat your prompts as code: structured, tested, reviewed, and refactored. Most quality problems blamed on the model are really problems in the prompt around it. Fix the prompt first.