Where Nepali NLP stands · ailiteracy.nepal

Ask a modern language model to write you an email in Nepali and it will produce something that is — at first glance — surprisingly good. Punctuation in place, grammar mostly correct, sentence structure passable. Read it more closely, and the seams begin to show. Honorific levels are wobbly. Idioms get translated word-for-word. A model that comfortably knows Diwali sometimes confuses Tihar with Holi. This section is about why.

The honest baseline

For roughly the year 2026, the rough state of Nepali in major commercial models looks like this:

Generation of conversational Nepali is competent for everyday use — emails, summaries, friendly chat, simple translation.
Comprehension of Nepali questions is mostly fine for short, well-formed inputs, and degrades quickly as the input grows long, code-mixes English, or moves into technical vocabulary.
Factual knowledge about Nepal — history, geography, current politics, federal structure — is patchy and confidently wrong in roughly the same proportion as it is for other low-resource topics.
Cultural fluency — honorifics, kinship terms, festival timings, regional language differences — is the weakest layer, and the one most likely to give you a polite-sounding but wrong answer.

The improvement curve here is steep. Models released a year from now will be markedly better. But the shape of the gap — strong on form, weak on facts, blind on culture — is likely to persist for a while.

Why tokenisers are quietly expensive

Tucked inside every commercial LLM is a tokeniser — the component that chops your input text into the chunks the model actually consumes. Tokenisers are trained on text corpora, and like everything else in this pipeline, those corpora are mostly English.

The practical consequence: a Nepali sentence usually breaks into far more tokens than an equivalent English sentence. Where "How are you?" might be 4 tokens, "तपाईंलाई कस्तो छ?" can be 12 to 20 — sometimes more, because each Devanagari character (and each combined mātrā) may cost its own token.

There are mitigations. Some providers ship tokenisers that handle Devanagari more efficiently than others — it is worth measuring. Open-weight models like Llama and Qwen can be fine-tuned with a Nepali-aware tokeniser, narrowing the gap considerably. But until the underlying models are pre-trained with more Nepali in the mix, the tax remains.

Code-mixed Nepali — Nepali + English + Hindi

Real Nepali, especially in writing and in cities, is rarely pure. A typical urban WhatsApp message reads "yaar kasto छ aja, plan के छ?" — three languages, two scripts, one sentence. This is normal communication, not slang, and a Nepali NLP product that cannot handle it is unusable.

Models do better at code-mixed Nepali than they used to, but they still tend to “fix” it back into clean monolingual output. A user types Romanized Nepali into the app; the model responds in formal Devanagari. The user feels lectured at, and stops using the product.

Honorifics — the trapdoor

Nepali has at least three honorific levels for “you” — तँ, तिमी, तपाईं — plus the deferential हजुर. Choosing the wrong level is not a politeness mistake; in many contexts it is a social mistake, one that signals contempt or familiarity the speaker did not intend.

Foreign LLMs handle honorifics by guessing, and the guess is usually तपाईं — formally safe, often awkwardly so. A user texting their best friend an AI-generated reply in pure तपाईं form will sound, to the friend, like a stranger. Building a Nepali-aware product almost always means adding an explicit honorific layer on top of the base model, picking the right level from context and rewriting accordingly.

What this means for product builders

Three practical implications for anyone building a Nepali-language product:

Test in real Nepali, not translated English. A model that scores well on translated English benchmarks may stumble on the actual messages your users write. Build an evaluation set in genuine Nepali — code-mixed, Roman-and-Devanagari, with the honorific register your users speak in.
Budget the tokeniser tax. When pricing a feature, assume 3–5x the per-message cost of the same feature in English. If the unit economics don’t work at that multiplier, redesign before you ship.
Plan to fine-tune. A small fine-tune on a few thousand high-quality Nepali examples — your own product’s tone, your own honorific defaults — recovers more quality than any prompt-engineering trick. The capability exists at low cost for open-weight models; the activation energy is the only thing in your way.

Check your understanding

Quick check

—

A Kathmandu startup is pricing an AI chatbot feature that will be used in Nepali. They benchmark cost at $0.01 per message in English. What is the most reasonable initial budgeting assumption for the same feature in Nepali, before any optimisation?

About $0.01 — language doesn't affect cost.
About $0.03 to $0.05 — Nepali sentences cost 3–5x more tokens to encode in standard tokenisers.
About $0.10 — Nepali requires a fully separate model.
About $0.001 — Nepali is shorter than English.