Chapter 02 · Section II
Showing with examples
Sometimes a single good example is worth a paragraph of instruction. The technique called few-shot prompting, and when to reach for it.
The previous section was about telling. This section is about showing. For many tasks — especially ones where the right output is hard to describe in words — showing the model a couple of examples produces dramatically better results than any amount of instruction.
This technique is called few-shot prompting. The fancy name hides a simple idea: include 1–5 worked examples of input → output in your prompt, then give the model the new input. It learns the pattern from the examples on the fly.
A case where instructions struggle
Imagine you want to classify customer feedback from a Nepali restaurant into three categories: praise, complaint, suggestion. You write the prompt:
Classify each piece of feedback as praise, complaint, or suggestion.
Feedback: “मोमो धेरै राम्रो थियो तर तातो थिएन।”
The model will answer something — but not consistently. Is “the momo was good but cold” praise (it was good) or complaint (it was cold)? Different models will pick differently. So will the same model on different days.
Now try with examples:
Classify each piece of feedback as praise, complaint, or suggestion.
Examples:
- “खाना मीठो थियो।” → praise
- “सेवा ढिलो थियो।” → complaint
- “मेनुमा पनिर थप्नुहोस्।” → suggestion
- “भोलि पनि आउनेछु।” → praise
- “प्लेट सफा थिएन।” → complaint
Feedback: “मोमो धेरै राम्रो थियो तर तातो थिएन।”
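The model now has the label set, the output format, and several unambiguous cases to anchor on, so its answers become far more consistent. Note that none of these five examples actually mixes a positive and a negative, so the tie-breaking rule (a complaint about food quality wins over praise) is still left to inference; adding one mixed example labelled complaint would pin it down. Same model, same task, different prompt: much more reliable output.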
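If you apply the same examples to many inputs, it can help to assemble the prompt in code rather than by hand. The following is a minimal sketch, not a prescribed method: the function name `build_few_shot_prompt` and its parameters are hypothetical, and it only builds the prompt text. Sending it to a model is left out because that depends on which service you use.

```python
def build_few_shot_prompt(instruction, examples, new_input):
    """Assemble instruction, worked examples, and the new input into one prompt string.

    `examples` is a list of (feedback_text, label) pairs. The layout mirrors the
    prompt shown above; the function itself is an illustrative assumption.
    """
    lines = [instruction, "Examples:"]
    for text, label in examples:
        lines.append(f"- “{text}” → {label}")
    lines.append(f"Feedback: “{new_input}”")
    return "\n".join(lines)


examples = [
    ("खाना मीठो थियो।", "praise"),
    ("सेवा ढिलो थियो।", "complaint"),
    ("मेनुमा पनिर थप्नुहोस्।", "suggestion"),
]

prompt = build_few_shot_prompt(
    "Classify each piece of feedback as praise, complaint, or suggestion.",
    examples,
    "मोमो धेरै राम्रो थियो तर तातो थिएन।",
)
print(prompt)
```

Keeping the examples in a list makes it easy to swap one out and see how the output changes, without retyping the whole prompt.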
When to use few-shot
A rough rule. Reach for few-shot when:
- The task is repetitive. You’re applying the same transformation to many inputs. Examples lock in consistent output.
- The output format is precise. JSON, a table, a specific style of summary. An example is worth more than a description.
- The right answer is subtle. Cases like the cold-momo example above, where reasonable people might disagree. Examples reveal your judgement.
- You want a specific style. “Write like a Nepali newspaper editorial” is vague; showing two newspaper editorial paragraphs is precise.
Skip few-shot when:
- The task is a one-off (you don’t need consistency).
- The task is so open-ended that examples would constrain the model too much.
- You don’t have any examples to give and can’t construct one quickly.
A worked example: extracting structured data
A common practical use: you have 50 pieces of messy customer feedback, and you want them in a clean table. With instructions alone the output is noisy; few-shot examples make it far more consistent, though not strictly deterministic.
Prompt:
Extract from each piece of feedback: category (praise/complaint/suggestion), sentiment (positive/neutral/negative), and a one-sentence English summary. Output one JSON object per piece of feedback.
Examples:
Input: “खाना मीठो थियो र बेरा पनि नम्र हुनुहुन्थ्यो।”
Output: {"category": "praise", "sentiment": "positive", "summary": "Food was tasty and the waiter was polite."}
Input: “एक घण्टा कुर्नुपर्यो, खानाको रंग पनि अनौठो थियो।”
Output: {"category": "complaint", "sentiment": "negative", "summary": "Long wait and unusual-looking food."}
Input: “वेज मोमोको चटनी अलि पिरो बनाउनुहोस्।”
Output: {"category": "suggestion", "sentiment": "neutral", "summary": "Make the veg momo chutney spicier."}
Now process the following 50 pieces of feedback in the same format:
What you get back, with high reliability, is 50 JSON objects that convert cleanly into spreadsheet rows. Without the examples, you would get inconsistent fields, summaries written in Nepali instead of English, and complaints labelled as suggestions: fix-by-hand work that erases the time savings.
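High reliability is not a guarantee, so it is worth checking the output before it reaches a spreadsheet. The sketch below assumes the model returns one JSON object per line (an assumption about the response layout, not something the prompt enforces). The helper names `parse_feedback_lines` and `rows_to_csv` are hypothetical; only the standard `json` and `csv` modules are used.

```python
import csv
import io
import json

FIELDS = ["category", "sentiment", "summary"]
ALLOWED = {
    "category": {"praise", "complaint", "suggestion"},
    "sentiment": {"positive", "neutral", "negative"},
}


def parse_feedback_lines(raw_output):
    """Split model output into valid rows and (line_number, reason) problems.

    Assumes one JSON object per non-empty line.
    """
    rows, problems = [], []
    for number, line in enumerate(raw_output.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            problems.append((number, "not valid JSON"))
            continue
        if not isinstance(record, dict):
            problems.append((number, "not a JSON object"))
            continue
        # Collect fields whose value is missing or outside the allowed labels.
        bad = [
            field
            for field, allowed in ALLOWED.items()
            if not isinstance(record.get(field), str) or record[field] not in allowed
        ]
        if not isinstance(record.get("summary"), str) or not record["summary"].strip():
            bad.append("summary")
        if bad:
            problems.append((number, "unexpected value for: " + ", ".join(bad)))
        else:
            rows.append({field: record[field] for field in FIELDS})
    return rows, problems


def rows_to_csv(rows):
    """Render validated rows as CSV text that a spreadsheet can import."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


raw = "\n".join([
    '{"category": "praise", "sentiment": "positive", "summary": "Food was tasty and the waiter was polite."}',
    '{"category": "complain", "sentiment": "negative", "summary": "Long wait."}',
    "Sure, here is the next one:",
])
rows, problems = parse_feedback_lines(raw)
print(rows_to_csv(rows))
print(problems)
```

Lines that fail the check are reported with their line number instead of being silently dropped, so you can fix or re-run just those inputs.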
Why this works
The technical name is in-context learning. The model doesn’t change its weights from your examples — it has not been “trained” on your three examples. But the next-token prediction, conditioned on those examples as context, becomes vastly more focused on the pattern they demonstrate. It is as if the model briefly specialises in your task for the duration of the conversation.
This is one of the most remarkable properties of modern large language models, and it was largely unexpected when it first emerged. Smaller models do not show it strongly. Frontier models show it spectacularly.
How to choose examples
Three habits that produce good few-shot prompts:
- Cover the edge cases. If you have an ambiguous category — like complaint-mixed-with-praise — include one example of each side. Don’t leave the model to guess your tie-breaking rule.
- Match the input distribution. If your real inputs are 80% in Nepali and 20% in English, your examples should reflect that. Don’t use only-English examples for Nepali inputs.
- Keep examples short. Long examples waste tokens. Use the minimum example that demonstrates the pattern.
A useful sanity check: imagine handing the prompt to a thoughtful human who has never seen the task. Would the examples alone be enough for them to do the next case? If yes, the model probably can too.
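The "match the input distribution" habit can be checked roughly in code. The sketch below compares the share of Devanagari-script texts in a sample of real inputs against the share in your examples. It is a proxy under a stated assumption: it detects the script (Unicode range U+0900 to U+097F), not the Nepali language itself, and the function names are hypothetical.

```python
def contains_devanagari(text):
    """True if any character falls in the Devanagari Unicode block (U+0900 to U+097F)."""
    return any("\u0900" <= ch <= "\u097f" for ch in text)


def devanagari_share(texts):
    """Fraction of texts that contain at least one Devanagari character."""
    if not texts:
        return 0.0
    return sum(contains_devanagari(t) for t in texts) / len(texts)


# A small hand-picked sample standing in for real inputs (illustrative only).
real_inputs = [
    "खाना मीठो थियो।",
    "सेवा ढिलो थियो।",
    "प्लेट सफा थिएन।",
    "भोलि पनि आउनेछु।",
    "The momo was great.",
]
english_only_examples = [
    "The food was tasty.",
    "Service was slow.",
    "Please add paneer to the menu.",
]

print(devanagari_share(real_inputs))
print(devanagari_share(english_only_examples))
```

A large gap between the two numbers suggests the examples do not look like the inputs the model will actually see.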
When few-shot is a trap
One genuine failure mode. If your examples are biased in a particular direction — say, you happened to pick three examples that are all praise — the model will lean toward predicting praise for ambiguous new inputs. The bias of your examples becomes the bias of the output.
The fix is to balance your examples: roughly equal representation across categories, with edge cases included. This is the same principle we saw in the Introduction to AI course around training data: a model only knows what it has been shown, and your few-shot examples act like a tiny, temporary training set, even though no weights change.
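A quick way to spot a lopsided example set is to count the labels before you send the prompt. This is a minimal sketch: the helper names and the `max_gap` threshold are assumptions chosen for illustration, not a standard rule.

```python
from collections import Counter

LABELS = ["praise", "complaint", "suggestion"]


def label_counts(examples, labels):
    """Count how many examples carry each label, including labels with zero examples."""
    counts = Counter(label for _, label in examples)
    return {label: counts.get(label, 0) for label in labels}


def is_balanced(examples, labels, max_gap=1):
    """True if the most and least common labels differ by at most `max_gap` examples.

    `max_gap` is a hypothetical threshold; pick whatever tolerance suits your task.
    """
    counts = label_counts(examples, labels)
    return max(counts.values()) - min(counts.values()) <= max_gap


biased = [
    ("खाना मीठो थियो।", "praise"),
    ("भोलि पनि आउनेछु।", "praise"),
    ("खाना मीठो थियो र बेरा पनि नम्र हुनुहुन्थ्यो।", "praise"),
]
mixed = [
    ("खाना मीठो थियो।", "praise"),
    ("सेवा ढिलो थियो।", "complaint"),
    ("मेनुमा पनिर थप्नुहोस्।", "suggestion"),
    ("भोलि पनि आउनेछु।", "praise"),
    ("प्लेट सफा थिएन।", "complaint"),
]

print(label_counts(biased, LABELS), is_balanced(biased, LABELS))
print(label_counts(mixed, LABELS), is_balanced(mixed, LABELS))
```

A count of zero for any label is the clearest warning sign: the model has never been shown what that category looks like in your prompt.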
What comes next
We have prompting basics and the few-shot technique. The next section is about what to do when the output isn’t what you wanted — debugging your prompts, iterating, and the small repertoire of techniques that get a stuck model un-stuck.