Chapter 05 · Section III · 14 min read
Advanced neural network techniques
Convolutions for images, transformers for language — the two architectural ideas behind almost everything you've heard of.
Two ideas are responsible for almost every famous AI system in 2026. Convolutional networks date back to the late 1980s but only took off around 2012; transformers arrived in 2017.
Convolutions are the trick that made image recognition work. A convolutional neural network looks at small patches of an image at a time, and learns features — edges, textures, shapes — that compose into objects. This is what reads Devanagari off a sign in Asan.
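To make "looks at small patches at a time" concrete, here is a minimal sketch in plain Python. It slides a 3x3 kernel over a tiny made-up image and records how strongly each patch matches. The image, the kernel values, and the names `convolve2d`, `IMAGE` and `VERTICAL_EDGE` are all illustrative assumptions; a real network learns its kernels from data rather than having them written by hand, and uses an optimized library rather than nested loops.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map.

    Note: like most deep learning libraries, this does not flip the kernel,
    so strictly speaking it computes a cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Multiply one small patch by the kernel, element by element, and sum.
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# Toy 5x6 "image" (assumed for illustration): dark on the left (0), bright on the right (1).
IMAGE = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
]

# Hand-written kernel that responds to dark-to-bright vertical edges.
VERTICAL_EDGE = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = convolve2d(IMAGE, VERTICAL_EDGE)
for row in feature_map:
    print(row)
```

The feature map is zero wherever the patch is uniformly dark or uniformly bright, and non-zero only where the patch straddles the edge. Stacking many such kernels, layer on layer, is roughly how edges compose into textures, shapes and objects.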
Transformers are the trick that made language work. A transformer can look at every word in an input simultaneously and decide which words are relevant to which. GPT, Claude, Gemini — all transformers. So is Google Translate’s Nepali model.
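The "which words are relevant to which" step is usually called attention. Below is a deliberately simplified sketch of scaled dot-product self-attention in plain Python. The two-number word vectors are made up for illustration, and the names `softmax`, `self_attention`, `TOKENS` and `EMBEDDINGS` are hypothetical. A real transformer also applies learned query, key and value projections, multiple attention heads and positional information, all of which are left out here.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Simplified self-attention: each vector acts as its own query, key and value.

    Assumption: no learned projections, a single head, no masking.
    Returns (outputs, weights), where weights[i][j] is how much word i attends to word j.
    """
    dim = len(vectors[0])
    outputs, weights = [], []
    for query in vectors:
        # Score this word against every word in the input at once.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        row_weights = softmax(scores)
        weights.append(row_weights)
        # The new representation is a weighted average of all the word vectors.
        outputs.append([
            sum(w * value[d] for w, value in zip(row_weights, vectors))
            for d in range(dim)
        ])
    return outputs, weights


TOKENS = ["the", "cat", "sat"]
# Made-up 2-dimensional embeddings, chosen so "cat" and "sat" point in similar directions.
EMBEDDINGS = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.9]]

outputs, weights = self_attention(EMBEDDINGS)
for token, row in zip(TOKENS, weights):
    print(token, [round(w, 2) for w in row])
```

Each row of `weights` says how much one word draws on every other word. With these toy vectors, "cat" puts more weight on "sat" than on "the", because their vectors are more alike.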
This section is a stub. The full version will give a one-paragraph intuition for each, and point at further reading.