Prompt Engineering Anti-Patterns: Mistakes That Quietly Hurt Quality
A field guide to the most common prompt engineering anti-patterns, why they degrade LLM output quality, and concrete refactors that fix each one.
What you'll learn
- ✓ The most common prompt anti-patterns in production
- ✓ Why each one degrades accuracy or reliability
- ✓ A simple mental model for prompt structure
- ✓ Concrete refactors for each anti-pattern
- ✓ Practical tips to catch them in code review
Prerequisites
- Familiarity with LLMs or Python
What and Why
Prompt anti-patterns are recurring mistakes that look reasonable but quietly degrade output quality, reliability, or cost. They are the prompt-engineering equivalent of N+1 queries: the system still works, but slowly and unpredictably. Most production prompts I have reviewed contain at least one.
Catching anti-patterns early matters because prompts compound. A vague instruction in the system message infects every downstream task. A noisy few-shot example teaches the model the wrong pattern thousands of times a day. Naming these traps gives teams a shared vocabulary in code review.
Mental Model
A good prompt has four jobs: tell the model who it is, what to do, what data and rules apply, and how the output should look. Anti-patterns usually come from collapsing these jobs together, or from over-fitting the prompt to a single example you tested once.
┌────────────────────────────┐
│ ROLE    (who am I?)        │
├────────────────────────────┤
│ TASK    (what to do?)      │
├────────────────────────────┤
│ CONTEXT (data + rules)     │
├────────────────────────────┤
│ OUTPUT  (format contract)  │
└────────────────────────────┘
              │
              ▼
      anti-patterns blur
      these boundaries
When you can point to each block, you can debug each block. When everything is one paragraph, you cannot.
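One way to keep the four blocks separable in code is to store each as its own field and join them only at render time. The following is a minimal sketch, not a prescribed implementation: `PromptSpec`, its field names, the `##` section labels, and the ticket-triage wording are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptSpec:
    """Hypothetical container with one field per block in the diagram."""

    role: str
    task: str
    context: str
    output: str

    def render(self) -> str:
        # Label each block so a reviewer can point at it in a diff.
        sections = [
            ("ROLE", self.role),
            ("TASK", self.task),
            ("CONTEXT", self.context),
            ("OUTPUT", self.output),
        ]
        # Refuse to render a prompt with a blank block.
        missing = [name for name, body in sections if not body.strip()]
        if missing:
            raise ValueError(f"empty prompt block(s): {', '.join(missing)}")
        return "\n\n".join(f"## {name}\n{body.strip()}" for name, body in sections)


spec = PromptSpec(
    role="You are a support-ticket triage assistant.",
    task="Classify the ticket into exactly one category.",
    context=(
        "Categories: billing, bug, feature_request.\n"
        "Ticket: I was charged twice this month."
    ),
    output='Respond with a single JSON object: {"category": "<one category>"}',
)
prompt = spec.render()
print(prompt)
```

Because each block is a separate field, a change to the output contract shows up in version control as a one-line diff to `output` rather than as an edit buried in a paragraph.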
Hands-on Example: Five Common Anti-Patterns
1. The wishlist prompt. “Be helpful, be accurate, be concise, be creative, be safe, be exhaustive.” Goals such as “concise” and “exhaustive” pull in opposite directions, and the prompt gives the model no way to rank them. Refactor: pick one primary objective and at most two constraints.
2. Negative-only instructions. “Do not include any preamble. Do not use markdown. Do not apologize.” A list of prohibitions says what to avoid but never what to produce, and models tend to follow a positive specification more reliably. Refactor: “Respond with a single JSON object matching this schema.”
3. Few-shot leakage. The examples share an incidental property that real inputs lack, such as all being short or all coming from the same domain. The model learns the wrong invariant. Refactor: diversify examples on every axis except the one you want it to learn.
4. The mega-prompt. A 4,000-token system message accumulated across six bug fixes. Older rules conflict with newer ones, and the model resolves the conflict unpredictably. Refactor: rewrite from scratch monthly using only the rules backed by current tests.
5. Format wishful thinking. “Return JSON.” Then you parse it and crash on the one in fifty replies wrapped in a code fence. Refactor: enforce structure with tool/JSON-mode APIs, or post-parse defensively and retry.
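The defensive-parse-and-retry refactor from item 5 can be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: `call_model` is a hypothetical stand-in for whatever LLM client you use, and the helper names and the default of three attempts are arbitrary. Only the standard-library `json` module is relied on.

```python
import json
from typing import Callable, Optional

FENCE = "`" * 3  # the three-backtick Markdown code-fence marker


def strip_code_fence(text: str) -> str:
    """Remove a surrounding code fence (and optional language tag) if present."""
    stripped = text.strip()
    fenced = stripped.startswith(FENCE) and stripped.endswith(FENCE)
    if not fenced or len(stripped) < 2 * len(FENCE):
        return stripped
    body = stripped[len(FENCE):-len(FENCE)]
    first_line, newline, rest = body.partition("\n")
    tag = first_line.strip()
    # Drop the opening line when it is empty or a language tag such as "json".
    if newline and (tag == "" or tag.isalpha()):
        return rest.strip()
    return body.strip()


def parse_json_reply(text: str) -> dict:
    """Parse a reply as a JSON object; raises ValueError on anything else."""
    data = json.loads(strip_code_fence(text))  # JSONDecodeError is a ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    return data


def call_with_retry(
    call_model: Callable[[str], str],  # hypothetical: prompt in, raw reply out
    prompt: str,
    attempts: int = 3,
) -> dict:
    """Call the model until a reply parses, up to `attempts` times."""
    last_error: Optional[Exception] = None
    for _ in range(attempts):
        try:
            return parse_json_reply(call_model(prompt))
        except ValueError as err:
            last_error = err
    raise RuntimeError(f"no valid JSON after {attempts} attempts") from last_error
```

Where your provider offers a tool-calling or JSON mode, that remains the stronger option; a parser like this is a fallback for replies that arrive as free text.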
Trade-offs
Fixing anti-patterns has a cost. Splitting a mega-prompt into smaller, composable prompts means more orchestration and possibly more API calls. Enforcing strict output formats can reduce model creativity on tasks where prose is fine. Diversifying few-shot examples takes curation effort that not every team can afford.
There is also a real risk of over-correcting. Some “anti-patterns” are perfectly fine for low-stakes tasks. A wishlist prompt for a one-off internal tool is not worth refactoring. Apply this lens in proportion to how often the prompt runs in production.
Practical Tips
- Diff your prompts in version control like code. A prompt change is a behavior change.
- Write a tiny eval set, even ten examples, before shipping a prompt change. You will catch regressions you cannot see by eye.
- Read the prompt out loud. If you trip over a sentence, the model will too.
- Prefer positive instructions and explicit output contracts over scolding the model.
- Track failure cases in a “prompt bug” log so the system message stays lean.
- Periodically delete rules and rerun your evals. Many rules no longer pay rent.
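A tiny eval set of the kind suggested above can be as small as the sketch below. Everything here is an illustrative assumption: `Case`, `run_eval`, the exact-match grading, and the keyword-based `fake_predict`, which stands in for "render the prompt, call the model, return its answer".

```python
from typing import Callable, NamedTuple, Sequence


class Case(NamedTuple):
    input: str
    expected: str


def run_eval(
    predict: Callable[[str], str], cases: Sequence[Case]
) -> tuple[float, list[Case]]:
    """Return (pass rate, failing cases) using exact-match grading."""
    if not cases:
        raise ValueError("eval set is empty")
    failures = [c for c in cases if predict(c.input).strip() != c.expected]
    return 1 - len(failures) / len(cases), failures


CASES = [
    Case("I was charged twice this month", "billing"),
    Case("The export button crashes the app", "bug"),
    Case("Please add dark mode", "feature_request"),
    Case("Refund my last invoice", "billing"),
]


def fake_predict(ticket: str) -> str:
    # Stand-in for a real model call; deliberately naive keyword rules.
    lowered = ticket.lower()
    if "charged" in lowered or "invoice" in lowered:
        return "billing"
    if "crash" in lowered:
        return "bug"
    return "feature_request"


pass_rate, failures = run_eval(fake_predict, CASES)
print(f"pass rate: {pass_rate:.0%}, failures: {len(failures)}")
```

Running the same cases before and after a prompt change, or after deleting a rule, turns "it looks fine" into a number you can compare.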
Wrap-up
Anti-patterns are not exotic failures; they are the boring, common, accumulated decisions that drag down LLM systems. Treat your prompts as code: structured, tested, reviewed, and refactored. Most quality problems blamed on the model are really problems in the prompt around it. Fix the prompt first.