Defining generative AI · ailiteracy.nepal

A junior accountant at a small firm in Pulchowk opens ChatGPT, types “draft a polite Nepali email asking a client to pay their overdue invoice,” and gets back a usable first draft in three seconds. That single moment captures what generative AI is — and why it’s a different category of tool from the AI we met in the Introduction to AI course.

The AI in that earlier course mostly classified: is this email spam, is this transaction fraud, is this image a क. Generative AI produces: a draft email, a paragraph, a picture of a Newari home, a snippet of Python code. Same underlying mathematics; very different job.

A working definition

A generative model is a system that can produce new outputs — text, images, audio, code, video — that fit the patterns of the data it was trained on.

ChatGPT was trained on enormous amounts of human writing. When you ask it to draft an email, it produces text that looks like the kind of email that might have appeared in its training data, given the instruction you gave. It is not retrieving a real email. It is composing one on the fly.

The same shape holds across modalities:

A text model was trained on text. It generates text.
An image model (Midjourney, DALL·E, Stable Diffusion) was trained on images paired with text descriptions. It generates images.
A music model was trained on music. It generates music.

What unites them is that the output is new: it did not exist before you asked for it. It is, in a precise sense, generated.

How this is different from older AI

In the previous course we said: AI is statistical pattern-matching on data. That is still true. What’s different about generative AI is the direction the pattern-matching runs.

A classifier maps input to a label: image of a letter → “this is क.” A generative model maps input to a long output: instruction → a whole paragraph that fits the instruction. The first answers which. The second answers what would come next, and next, and next, until it stops.
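For readers who like to see ideas as code, here is an optional toy sketch of the contrast in Python. You do not need to run or understand it to follow the course. Everything in it is made up for illustration: the word list in `classify`, the `NEXT_WORD` table, and the function names are assumptions, and a real model learns its patterns from data rather than from a hand-written table.

```python
def classify(email: str) -> str:
    """Toy classifier: answers "which?" with one label from a fixed set."""
    suspicious = {"lottery", "prize", "winner"}  # hypothetical word list
    words = set(email.lower().split())
    return "spam" if words & suspicious else "not spam"


# Hand-written "what comes next" table, standing in for a trained model.
NEXT_WORD = {
    "dear": "client,",
    "client,": "please",
    "please": "pay",
    "pay": "the",
    "the": "invoice.",
}


def generate(first_word: str, max_words: int = 10) -> str:
    """Toy generator: keeps adding the next word until it runs out or hits the limit."""
    words = [first_word]
    while len(words) < max_words and words[-1] in NEXT_WORD:
        words.append(NEXT_WORD[words[-1]])
    return " ".join(words)


print(classify("You are a lottery winner"))  # one label
print(generate("dear"))                      # a sequence of words
```

The thing to notice is the shape of the result: `classify` can only ever return one of two labels, while `generate` returns a sequence whose length is not fixed in advance.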

The technical mechanism that made this possible at scale is the transformer architecture, which we met briefly at the end of the Introduction to AI course. A modern language model predicts the next word (strictly, the next token, which may be a whole word or a fragment of one), then the next, then the next, with each prediction shaped by everything that came before. Combine that loop with enormous training data, and you get a system that can produce coherent paragraphs about almost anything.
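The predict-then-repeat loop can be sketched in a few lines of Python. This is an optional illustration, not how a transformer is built. It rests on three loud assumptions: the training text `CORPUS` is three invented sentences, the "model" is just a count of which word follows which, and each prediction looks only at the single previous word, whereas a real language model takes everything that came before into account.

```python
from collections import Counter, defaultdict

# Tiny invented training text (an assumption for illustration only).
CORPUS = "please pay the invoice . please send the invoice . please pay the bill ."


def train(corpus: str) -> dict:
    """Count, for every word in the training text, which words follow it."""
    follows = defaultdict(Counter)
    words = corpus.split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def generate(follows: dict, prompt: str, max_words: int = 10) -> str:
    """Repeatedly append the most frequent next word until a full stop or a dead end."""
    words = prompt.split()
    for _ in range(max_words):
        candidates = follows.get(words[-1])
        if not candidates:  # word never seen in training: nothing to predict
            break
        next_word = candidates.most_common(1)[0][0]
        words.append(next_word)
        if next_word == ".":
            break
    return " ".join(words)


model = train(CORPUS)
print(generate(model, "please"))  # please pay the invoice .
```

Even this toy shows the two ingredients from the paragraph above: patterns counted from training text, and a loop that keeps asking "what comes next?". It also shows the limit: given a word it never saw in training, it has nothing to go on.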

Three concrete capabilities, three concrete misconceptions

Three things generative AI can genuinely do well today, and three matched things it cannot:

It can: produce a fluent first draft in Nepali or English, faster than a human can type. It cannot: guarantee the draft is true. Treat every factual claim as something you must verify.
It can: generate a striking image from a sentence-long description. It cannot: reliably draw text inside an image, count fingers, or place specific Nepali cultural details correctly without help.
It can: translate between dozens of languages well, including Nepali ↔ English at near-professional quality. It cannot: translate accurately to or from the low-resource languages of Nepal (Limbu, Tharu, Magar), because the training data is too thin.

The pattern is consistent: where the underlying data is rich and the task has clear right answers, generative AI is very good. Where data is thin or the task requires real-world grounding, it is unreliable.

What we will do in this course

This is a practical course. We will not retrain you on neural network internals. We will teach you the skills that decide whether these tools are useful to you or wasteful:

How to prompt a model — the single skill that decides whether you get a useful output or a vague one.
How to handle the main modalities — text, image, voice — and what each is realistically good for.
How to spot the model’s limits and failures before they hurt you.
How to use these tools honestly — in school, at work, in public — without losing your own skill.

You do not need to know how to code. You do not need Python. You will need a computer or phone, an account on at least one model (ChatGPT, Claude, and Gemini all work), and an hour or two per chapter.

Check your understanding

Quick check


Which is the most accurate description of a generative AI model?

A system that retrieves human-written answers from a database
A system that produces new outputs (text, image, etc.) in the style of its training data, in response to an instruction
A search engine that ranks websites
A program that follows hand-written rules to label inputs

Quick check


A generative model produces a confident, fluent answer about a specific small village in Mustang that turns out to be wrong. The best explanation is:

The model has bugs that need fixing
The model is dishonest
The model composes outputs that sound right given its training data, and its training data on this specific village was thin or absent
The user prompted the model incorrectly

What comes next

The next section opens the hood, briefly, to show what’s actually inside a modern generative model — large language models, foundation models, and the training process that makes them work. The goal is not to make you an engineer; it is to give you enough vocabulary to read about these tools sensibly.