Chapter 02 · Section III · 14 min read
Iteration and debugging
What to do when the output is wrong. A small playbook of techniques that recover most stuck conversations.
Even with a careful prompt, your first output won’t always be what you wanted. The model picks the wrong tone. It misses a constraint you gave. It hallucinates a fact you didn’t realise it would. This is normal. The skill is not getting a perfect output on the first try — the skill is iterating efficiently to a usable one.
This section is a small playbook. Six techniques, each useful in a specific failure mode, and a rough flow for when to use which.
Technique 1 — Just ask it to fix the specific problem
The simplest technique, and the one most people forget. Tell the model what you didn’t like, and ask it to revise.
Almost right, but the third paragraph is too formal. Rewrite that paragraph in a warmer, more conversational tone. Keep everything else the same.
The model will edit. You don’t need to rewrite the whole prompt. The conversation history is part of the context; the model knows what “the third paragraph” refers to.
This works for: tone, length, language, specific factual errors, formatting. It does not work well for fundamental structural problems — for those, go back and rewrite the prompt.
Technique 2 — Show what’s wrong with an example
When the model is making a recurring mistake, point to one instance.
In example #4 you wrote “due by Friday” — the original Nepali said “अर्को साता शुक्रबारसम्म” which means by NEXT Friday, not this Friday. Fix that, and watch for the same error elsewhere.
Specific feedback tied to specific examples produces faster correction than vague “be more accurate.” The model is good at applying a noted correction to similar cases.
Technique 3 — Ask it to think step by step
For tasks that require reasoning — math problems, multi-step plans, logic puzzles — the single best fix when output is wrong is to ask the model to show its work.
Solve this step by step. Before giving your final answer, write out each step of your reasoning.
This technique is called chain-of-thought prompting, and on reasoning-heavy tasks it can lift accuracy from 30% to 80% on the same model. The model is, in effect, using the act of writing to think. Without that scaffold it tries to leap straight to an answer and trips.
Modern models (GPT-4, Claude 3.5+, Gemini 2.0+) often do this implicitly when they detect a reasoning task. Saying it explicitly costs you nothing and helps when it’s needed.
Technique 4 — Ask it to critique its own output
When you suspect the model is wrong but you’re not sure what’s wrong, ask it to check itself.
Now look at the draft you just wrote. List any factual claims that you are not sure are correct. List any places where the tone is off. List any constraints from my original prompt that you didn’t follow.
A model criticising its own work is surprisingly often correct about the flaws. It will flag the made-up statistic, notice the wrong language, catch the missing greeting. Then you can ask it to fix those specific things.
This won’t catch everything — a model that believed a wrong fact won’t flag it. But it catches a meaningful fraction of mistakes you would otherwise have to find yourself.
Technique 5 — Start a fresh conversation
Sometimes the conversation has gone wrong. The model latched onto a misunderstanding early, and every subsequent message keeps reinforcing it. Long conversations also dilute your original instructions in the model’s attention.
The fix: open a new chat and start over with a clean, complete prompt. Don’t try to nurse a stuck conversation back to health. The cost of restarting is tiny; the cost of continuing to push a stuck conversation is high.
Technique 6 — Reach for a different model
Different models have different strengths. If one isn’t working, switching tools is not failure — it’s pragmatism.
Rough patterns in 2026:
- Claude tends to follow long, complex instructions and constraints faithfully. Good for careful writing, code, careful reasoning.
- GPT-4 / ChatGPT is well-rounded; the largest ecosystem of plugins and integrations. Often the default.
- Gemini has strong multilingual coverage and integrates with Google Search, useful for current-information tasks.
- Open-source models (Llama, Mistral, Qwen) are useful when data must stay private or when the task is repetitive enough to warrant running it on your own hardware.
If a model keeps producing the same kind of wrong output, give the same prompt to another and compare. Five minutes of comparison teaches you which model is right for the kind of work you do most often.
A flow for when output is wrong
A compact heuristic. When the output isn’t what you wanted:
- First — read it carefully. Is it actually wrong, or just different from what you expected? Sometimes the model’s version is better than yours.
- If wrong — is the failure cosmetic (tone, format, length)? Use Technique 1: ask for the specific fix.
- If reasoning is off — Technique 3: ask it to show its work.
- If you’re not sure what’s wrong — Technique 4: ask it to critique.
- If the conversation is stuck — Technique 5: start fresh.
- If a particular model keeps failing on a category of task — Technique 6: try another.
These six together handle ~90% of failures. The remaining 10% are usually limits of the technology itself — situations we will meet in Chapter 5.
A note on patience
The patient user gets dramatically more from these tools than the impatient one. Three small habits compound:
- Read the output before reacting. You are paying for tokens (in money or in your monthly limit). Use them.
- Ask follow-ups before retrying from scratch. The conversation context is valuable.
- Save prompts that work. A folder of templates for common tasks pays back many hours.
There is a popular mental image of expert AI users as people who type one perfect prompt and get a perfect answer. The honest picture is closer to a careful editor who iterates with a junior writer who is fast, infinitely patient, and occasionally wrong.
Check your understanding
Quick check
—A model gives the wrong answer to a multi-step math word problem. Which single technique is most likely to improve accuracy?
What comes next
We’ve covered the engine, the modalities, and the skill of prompting. Chapter 3 puts these together for the most common kind of generative AI use — working with text. Drafting, editing, summarising, extracting, translating. The everyday tasks that quietly add up to most of the value people get from these tools.