Connectivity, devices, and the data we already have

If the previous section was the argument for why Nepal needs its own AI conversation, this one is the inventory. Before we ask what AI should be built for Nepal, we need to know — coldly — what hardware will run it, what connection will reach it, and what data already exists to train it. The picture is not as bleak as the cynics say, and not as easy as the brochures suggest.

The median Nepali phone

Forget the iPhone in the brochure. The phone that matters in Nepal is a four-to-six-year-old Android in the Rs. 15,000–30,000 band: a Samsung Galaxy M-series, a Xiaomi Redmi Note, an Infinix Hot, a Realme C. It has 4 GB of RAM if you are lucky, 3 GB more often. The chipset is from Qualcomm’s 400/600 line or MediaTek’s Helio. The user is on Android 12 or 13, often without further updates.

This is the device that any “AI for everyone” claim has to fit on. It cannot run a 70-billion-parameter model. It can, comfortably, run quantised on-device models in the hundreds of millions of parameters — Whisper-small for speech, a distilled translation model, an OCR engine. It can do this offline, which matters more than it sounds.
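The claim about what fits can be checked with back-of-envelope arithmetic: weight storage is roughly parameter count times bits per weight. The sketch below is an illustration, not a benchmark. The 244-million figure is the commonly published parameter count for Whisper-small; the assumption that an app can use about a quarter of a 3 GB phone's RAM is a hypothetical budget chosen for the example, and the estimate ignores activations and runtime overhead.

```python
def model_size_mb(n_params: int, bits_per_weight: int) -> float:
    """Approximate size of the weights alone, in megabytes (10**6 bytes)."""
    return n_params * bits_per_weight / 8 / 1e6


def fits_on_phone(n_params: int, bits_per_weight: int,
                  ram_gb: float = 3.0, usable_fraction: float = 0.25) -> bool:
    """True if the weights fit in the slice of RAM we assume an app may use.

    usable_fraction is an assumed budget, not a measured Android limit.
    """
    budget_mb = ram_gb * 1000 * usable_fraction
    return model_size_mb(n_params, bits_per_weight) <= budget_mb


WHISPER_SMALL = 244_000_000    # approximate published parameter count
LLM_70B = 70_000_000_000       # the 70-billion-parameter case from the text

for name, n_params in [("whisper-small", WHISPER_SMALL), ("70B LLM", LLM_70B)]:
    for bits in (16, 8, 4):
        size = model_size_mb(n_params, bits)
        print(f"{name:14s} {bits:2d}-bit: {size:9.0f} MB  "
              f"fits={fits_on_phone(n_params, bits)}")
```

Under these assumptions the small speech model fits at every precision shown, while the 70-billion-parameter model needs tens of gigabytes even at 4 bits per weight.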

The network underneath

Nepal’s mobile connectivity is uneven in a specific way. NTC and Ncell give you something approximating 4G in any town of consequence and along the main highways. Once you turn off the highway, or climb above 2,000 metres in many districts, the signal drops to a slow 3G or 2G — sometimes nothing. Fixed broadband (WorldLink, Subisu, Vianet, Classic Tech) is excellent inside the Kathmandu and Pokhara valleys, decent in the larger municipal centres, and unavailable in much of rural ward life.

Two consequences for AI:

The cloud is sometimes far away, and sometimes not there at all. A chatbot that depends on round-tripping every message to a US server will feel laggy in Janakpur and unusable in Humla. A model that runs on the phone is a different product — slower per request, but reliably present.
Bandwidth costs are visible to users. Most Nepalis are on prepaid data. They notice, and resent, apps that quietly stream large payloads. An AI feature that downloads 200 MB of model weights once and then runs without further data is often cheaper over the year than one that calls an API on every interaction, especially when each call carries audio or images.
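Whether the one-time download wins depends on how much data each cloud call moves and how often the feature is used. A minimal sketch of the break-even arithmetic follows. The 200 MB figure comes from the text; the per-call payload sizes (2 KB for a text exchange, 100 KB for a short voice clip) are hypothetical values chosen for illustration.

```python
def cloud_mb_per_year(kb_per_call: float, calls_per_day: float,
                      days: int = 365) -> float:
    """Data a cloud-backed feature uses in a year, in megabytes."""
    return kb_per_call * calls_per_day * days / 1000


def break_even_calls_per_day(model_mb: float, kb_per_call: float,
                             days: int = 365) -> float:
    """Daily calls at which yearly cloud traffic equals the one-time download."""
    return model_mb * 1000 / (kb_per_call * days)


MODEL_MB = 200  # one-time model download, from the text

# Assumed payload per call in KB (request plus response); illustrative only.
SCENARIOS = {"text chat": 2, "voice query": 100}

for name, kb in SCENARIOS.items():
    calls = break_even_calls_per_day(MODEL_MB, kb)
    print(f"{name}: cloud traffic passes the download above {calls:.1f} calls/day")
```

With these assumed payloads, a voice feature passes the 200 MB mark at roughly five to six calls a day, while a text-only feature would need well over two hundred. The on-device argument is therefore strongest for speech and image features.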

Data that already exists (but isn’t easy to use)

The country is not data-free. The country is data-scattered, data-scanned, and data-siloed. Some of what already exists, if you go looking:

Government datasets on the National Data Portal, the National Statistics Office (formerly the Central Bureau of Statistics), and Open Data Nepal: censuses, health surveys, household surveys, agricultural inputs.
NepaliBERT, IndicTrans, and academic Nepali corpora — pre-trained language models and text collections from researchers at IOE, KU, and abroad. These are the substrate any Nepali NLP project should start from rather than collecting from scratch.
ICIMOD’s Mountain Geoportal — satellite-derived datasets for the entire Hindu Kush–Himalaya region, including Nepal-specific layers on glaciers, land cover, and disaster history.
Department of Hydrology and Meteorology archives — decades of rainfall and river-gauge data, partly digitised, partly still in field logbooks.
Health Management Information System (HMIS) — facility-level data from public health institutions, aggregated monthly.
Telecom data — NTC and Ncell sit on call-detail records, location pings, and mobility patterns that are, in principle, gold for any model that wants to understand how the country moves. Whether this data ever becomes accessible to researchers under reasonable privacy rules is a policy question, not a technical one.

What is missing is curation. Most of these datasets are PDFs, scanned forms, or single-server downloads. None of them comes with the machine-readable schema, train/test split, and benchmark a modern ML team would expect on day one. Curating them is unglamorous work that is also some of the highest-leverage AI work that can be done in Nepal right now.
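As a rough illustration of what that curation involves, the sketch below takes a few rows of already-transcribed rainfall readings, validates them against an explicit schema, and assigns a deterministic train/test split. Everything specific here is hypothetical: the field names, the station IDs, and the sample rows are invented for the example and do not reflect any real archive's format.

```python
import csv
import hashlib
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class RainfallRecord:
    """Hypothetical schema for one daily reading."""
    station_id: str
    date: str            # ISO 8601, YYYY-MM-DD
    rainfall_mm: float


def parse_row(row: dict) -> RainfallRecord:
    """Validate one raw row against the schema; raise ValueError on bad data."""
    rainfall = float(row["rainfall_mm"])
    if rainfall < 0:
        raise ValueError(f"negative rainfall: {row}")
    return RainfallRecord(row["station_id"].strip(), row["date"].strip(), rainfall)


def split_of(key: str, test_fraction: float = 0.2) -> str:
    """Hash the key so the same station always lands in the same split."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64   # uniform in [0, 1)
    return "test" if bucket < test_fraction else "train"


# Invented sample rows; the second one is deliberately invalid.
RAW = """station_id,date,rainfall_mm
S001,2020-07-01,12.5
S002,2020-07-01,-3
S003,2020-07-01,0.0
"""

records, rejected = [], []
for row in csv.DictReader(io.StringIO(RAW)):
    try:
        records.append(parse_row(row))
    except (ValueError, KeyError) as err:
        rejected.append((row, str(err)))

splits = {r.station_id: split_of(r.station_id) for r in records}
print(len(records), "valid,", len(rejected), "rejected")
print(splits)
```

Splitting by station rather than by row is a design choice: it keeps readings from one station out of both sides of the split, so a model is evaluated on places it has not seen.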

Compute: the boring constraint

Compute in Nepal is mostly rented from abroad. There is no commercial GPU cluster of consequence inside the country, though several universities and a few private firms have small training rigs. For training anything serious you go to AWS, GCP, or — increasingly — a regional provider in Singapore or Mumbai. For inference, you either go to the same providers or you push the work to the phone.

This has two implications. First, training models in Nepal currently means paying in foreign currency, usually US dollars, and access to dollars is sensitive to remittance flows and central-bank rules. Second, the cheapest path to useful AI in Nepal almost always involves taking a model trained elsewhere and adapting it locally, rather than training from scratch.
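One way to see why adaptation is cheaper is to count trainable parameters. The text does not name an adaptation technique; the sketch below assumes low-rank adapters (LoRA), where a frozen weight matrix of shape d_out × d_in gets two small trainable matrices of rank r. The dimensions used (24 matrices of 768 × 768, rank 8) are illustrative values for a small transformer, not a description of any specific Nepali model.

```python
def full_finetune_params(d_out: int, d_in: int, n_matrices: int) -> int:
    """Trainable parameters when every adapted matrix is updated in full."""
    return d_out * d_in * n_matrices


def lora_params(d_out: int, d_in: int, rank: int, n_matrices: int) -> int:
    """Trainable parameters with rank-r adapters: B (d_out x r) and A (r x d_in)."""
    return rank * (d_out + d_in) * n_matrices


# Illustrative shape: 12 layers, 2 adapted matrices per layer, hidden size 768.
D, N_MATRICES, RANK = 768, 24, 8

full = full_finetune_params(D, D, N_MATRICES)
lora = lora_params(D, D, RANK, N_MATRICES)

print(f"full fine-tune: {full:,} trainable parameters")
print(f"rank-{RANK} adapters: {lora:,} trainable parameters")
print(f"ratio: {lora / full:.4f}")
```

For these shapes the adapters train about one forty-eighth as many parameters as a full update of the same matrices. That shrinks optimizer memory and, as a result, the size of the GPU that has to be rented.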

Check your understanding

Quick check

A developer is planning an AI app for use across rural Nepal. Which of the following is the strongest design constraint they should plan around from day one?

Support for the latest iPhone Pro Max display.
Running well on a 3 GB-RAM Android phone, offline or on intermittent 4G.
Maximum throughput on a high-end developer workstation.
Dark-mode and accessibility options for English-speaking users.

Quick check

What is the most useful framing for the state of Nepali public datasets?

There is essentially no data — projects must collect everything from scratch.
There is plenty of data, ready to plug into modern ML pipelines.
A lot of data exists, but it is scattered, scanned, or siloed — curating it is itself a high-leverage AI project.
All useful Nepali data is held privately by telecoms and banks and cannot be used at all.

What comes next

Hardware and data describe the floor an AI project sits on. The next section looks at the room: which people and organisations are already building on this floor today — banks, startups, NGOs, universities — and what kinds of problems they have chosen to take on.