Privacy and data · ailiteracy.nepal

Everything you type into a consumer generative AI tool — ChatGPT, Claude, Gemini, Midjourney — is transmitted to a foreign server, processed there, and frequently logged. The default settings on most consumer tools allow the provider to use your conversations to improve future models. Many users don’t know this. Many should.

This section is a practical guide: what gets stored, what doesn’t, and what to do when the work is genuinely confidential.

Default behaviour in 2026

For the major consumer chatbots, the default looks roughly like this:

Your prompt is sent to the provider’s servers. It has to be — that’s where the model runs.
Your conversation is stored for some period (often 30 days, sometimes longer, sometimes indefinitely depending on plan).
The conversation may be used to improve future models — unless you explicitly opt out.
A small share of conversations may be reviewed by human safety or quality teams.

This is not the same as “your data is published” or “anyone can read it.” Major providers take security seriously. But the data does leave your device, sit on someone else’s servers, and may be touched by humans you don’t know.

For most everyday uses (drafting an email, brainstorming, asking a general question) this is fine, because the information is not sensitive. For confidential uses, it is not fine.

What counts as confidential

A working list. Treat the following as confidential unless you’ve explicitly checked otherwise:

Personal data of others. Names, phone numbers, addresses, medical records, financial records of clients, patients, students.
Confidential business information. Internal strategy documents, unannounced product details, financial projections, employee performance reviews.
Source code with trade secrets or unannounced security vulnerabilities.
Government-classified material. Anything labelled confidential, restricted, or secret under Nepal’s official secrets regime.
Material under NDA. Anything you’ve signed a non-disclosure for.
Material you wouldn’t be comfortable seeing on the front page.

If you wouldn’t paste it into a public message board, don’t paste it into a consumer chatbot without explicitly checking the data terms.
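
The rule above can be backed by a quick automated check before pasting. The following is a minimal sketch, not a complete detector: the function name `flag_sensitive`, the pattern set, and the assumed phone format (10-digit Nepali mobile numbers starting 97 or 98, optionally prefixed with +977) are all illustrative assumptions. A clean result does not prove the text is safe to paste.

```python
import re

# Illustrative patterns only; a real check would need a much broader set.
PATTERNS = {
    "email address": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    # Assumption: mobile numbers are 10 digits starting 97 or 98,
    # optionally prefixed with +977.
    "phone number": re.compile(r"(?<!\d)(?:\+977[- ]?)?9[78]\d{8}(?!\d)"),
    "confidentiality marking": re.compile(
        r"\b(?:confidential|restricted|secret|NDA)\b", re.IGNORECASE
    ),
}


def flag_sensitive(text: str) -> list[str]:
    """Return the names of the pattern categories found in the text."""
    return [name for name, pattern in PATTERNS.items() if pattern.search(text)]
```

If `flag_sensitive(draft)` returns anything, stop and check the tool's data terms before pasting. Pattern matching catches only obvious cases; names, medical details, and strategy documents will pass straight through, so the mental check still matters.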

Three levels of privacy, three corresponding tools

A practical framework.

Level 1 — Anything goes. Public material, generic questions, no confidential information. Use whatever tool you like. Default consumer chatbots are fine.

Level 2 — Business confidential but not regulated. Internal strategy, drafts of unannounced material, source code without trade secrets. Use a business-grade tool — ChatGPT Enterprise, Claude for Work, Microsoft Copilot Enterprise, Google Workspace AI features. These have explicit “we will not train on your data” terms. Read them.

Level 3 — Regulated, classified, or genuinely sensitive. Patient records, legal proceedings, classified government material, banking customer data. Either: (a) use a tool with explicit regulatory compliance (HIPAA, GDPR, equivalent local frameworks), or (b) use a model that runs entirely on your own infrastructure — an open-source model like Llama, Mistral, or Qwen, deployed on your own server.

The self-hosted route is harder. It requires technical setup. It also gives you the most control: the data never leaves infrastructure you run. For organisations handling sensitive Nepali data (hospitals, courts, banks, the government) Level 3 is the only acceptable path.
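
The three levels can be written down as a small decision rule. This is a sketch under assumptions: the names `Level`, `required_level`, and `TOOLS` are hypothetical, and the two yes/no questions are a simplification of the judgement the framework actually asks for.

```python
from enum import IntEnum


class Level(IntEnum):
    ANYTHING_GOES = 1
    BUSINESS_CONFIDENTIAL = 2
    REGULATED = 3


# Tool guidance per level, paraphrased from the framework.
TOOLS = {
    Level.ANYTHING_GOES: "any tool, including default consumer chatbots",
    Level.BUSINESS_CONFIDENTIAL: "business-grade tool with no-training terms",
    Level.REGULATED: "tool with explicit regulatory compliance, or a self-hosted model",
}


def required_level(*, regulated: bool, business_confidential: bool) -> Level:
    """Pick the strictest level that applies to the material."""
    if regulated:
        return Level.REGULATED
    if business_confidential:
        return Level.BUSINESS_CONFIDENTIAL
    return Level.ANYTHING_GOES
```

For example, `TOOLS[required_level(regulated=True, business_confidential=False)]` gives the Level 3 guidance. The strictest applicable level always wins, so material that is both internal and regulated is treated as regulated.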

The “private” mode toggles

Most consumer chatbots now have a setting called something like “do not use my conversations to train the model.” Find it. Turn it on.

ChatGPT: Settings → Data Controls → “Improve the model for everyone” → off (older versions labelled this “Chat history & training”).
Claude: Settings → Privacy → “Help improve Claude” → off.
Gemini: Activity controls in your Google account.

This does not eliminate logging. Your conversation may still be retained for a limited period for abuse review (30 days is a commonly stated figure, but it varies by provider). But it removes the training-data risk and signals to the provider that you don’t consent to that use.

For business plans (ChatGPT Team, Claude for Work, etc.), training on your data is off by default. That is one of the things you are paying for.

A working pattern for organisations

If your organisation uses generative AI, three policies are worth setting:

A tier system. Decide which kinds of data go to which tier of tool. “Public data → free ChatGPT. Internal data → enterprise tool. Customer data → self-hosted only.”
A training opt-out. Even for consumer use, ensure all employees have toggled training opt-outs in their accounts.
A red-line list. An explicit list of categories that may never be pasted into any external tool. Make it short. Make it specific. Train people on it.

These three policies cover most realistic risks. The hard part is not the policies — it is creating a culture where people actually pause before pasting.
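
A tier system and a red-line list are easier to follow when written down in one place. The sketch below encodes the example policy quoted above. The category names, the entries in `RED_LINE`, and the function `may_paste` are hypothetical, and unknown categories are deliberately treated as the strictest case.

```python
# Tool tiers ordered from least to most controlled.
TOOL_TIERS = ["consumer", "enterprise", "self_hosted"]

# Minimum tier each data category requires, following the example policy.
MINIMUM_TIER = {
    "public": "consumer",
    "internal": "enterprise",
    "customer": "self_hosted",
}

# Hypothetical red-line categories: never pasted into any external tool.
RED_LINE = {"patient_records", "classified_material"}


def may_paste(category: str, tool_tier: str) -> bool:
    """Return True if the policy allows this data category in this tool tier."""
    if tool_tier not in TOOL_TIERS:
        raise ValueError(f"unknown tool tier: {tool_tier!r}")
    if category in RED_LINE:
        # Red-line data may only stay on infrastructure the organisation runs.
        return tool_tier == "self_hosted"
    # Categories nobody has classified yet default to the strictest tier.
    required = MINIMUM_TIER.get(category, "self_hosted")
    return TOOL_TIERS.index(tool_tier) >= TOOL_TIERS.index(required)
```

Defaulting unclassified data to the strictest tier is a design choice: it makes the safe answer the default and forces someone to classify new kinds of data before they reach an external tool.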

Self-hosted models: the sovereign option

For confidential work, the most secure choice is to run a model on infrastructure you control.

This used to be impractical — frontier models required enormous compute. In 2026, the situation has changed:

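Open-weight models (Llama 3.1+, Qwen 2.5+, Mistral Large) approach frontier capabilities and can be downloaded at no charge, subject to each model’s licence; the cost is the hardware.
A reasonably specified server can host a useful model: slower than a frontier chatbot, but private.
Smaller models (7-8 billion parameters) run on a single GPU; larger ones (70-405B) generally require multiple GPUs.
Tools and services (Together, Anyscale, RunPod, Ollama for local) make deployment accessible without deep ML expertise. Of these, only a local install such as Ollama keeps data on your own machine; the cloud services run on the provider’s hardware, so check their data terms before treating them as self-hosted.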
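
The GPU figures above follow from simple arithmetic: the weights alone need roughly (number of parameters) × (bytes per parameter) of memory. The sketch below works this out. The function names and the GPU memory sizes used in the examples are assumptions, and real deployments need extra headroom for the context cache and activations, so treat the results as lower bounds.

```python
import math


def weight_memory_gb(params_billion: float, bits_per_param: int = 16) -> float:
    """Approximate memory for the model weights alone, in gigabytes."""
    # 1 billion parameters at 1 byte each is about 1 GB.
    return params_billion * bits_per_param / 8


def min_gpus(params_billion: float, gpu_memory_gb: float, bits_per_param: int = 16) -> int:
    """Lower bound on the number of GPUs needed just to hold the weights."""
    return math.ceil(weight_memory_gb(params_billion, bits_per_param) / gpu_memory_gb)
```

At 16 bits per parameter, an 8B model needs about 16 GB for weights and fits on one 24 GB GPU, while a 70B model needs about 140 GB, which is at least two 80 GB GPUs. Quantising to 4 bits cuts the 8B model to about 4 GB, which is why small models now run on modest hardware.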

For Nepali institutions handling sensitive data, this path matters. The Nepali health ministry should not be running diagnostic prompts on US-hosted models. A Nepali law firm should not be uploading confidential client material to ChatGPT. The technical bar for self-hosting has dropped enough that this is now a practical conversation, not a wishlist item.
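
To show how low the bar has become, here is a minimal sketch of sending a prompt to a model running on the same machine. It assumes an Ollama server is already installed and listening on its default local port, and that a model tagged `llama3.1` has been pulled. The function names are hypothetical. The request goes to `localhost`, so the prompt does not leave the machine.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "llama3.1") -> urllib.request.Request:
    """Build a POST request for a single, non-streaming completion."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_local_model(prompt: str, model: str = "llama3.1") -> str:
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model)) as response:
        return json.loads(response.read())["response"]
```

With the server running, `ask_local_model("Summarise this note in two sentences: ...")` returns the model's reply as a string. An institution would still need access controls, logging policy, and a machine with enough memory, but the basic round trip is this short.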

The personal habit

For an individual user, three habits cover most risk:

Toggle the training opt-out on every consumer tool you use.
Maintain a mental list of what kinds of information you will not paste.
For confidential work, use a tool with explicit privacy terms — even if it costs more.

These three habits, practised consistently, eliminate the vast majority of personal privacy risk. The remaining risk is operational (a misclick, a moment of inattention) and is managed by being deliberate before any sensitive paste.

Check your understanding

A doctor at a Kathmandu hospital wants to use generative AI to summarise patient case notes. The notes contain identifiable patient information. What is the most appropriate tool choice?

Free ChatGPT with default settings
A privacy-compliant tool with explicit data protection terms (or a self-hosted model on hospital infrastructure)
Any tool — the hospital is not responsible for the doctor's choice
Email the patient notes to a colleague to summarise

What comes next

We’ve covered privacy. The next section is about honesty in your use — citation, disclosure, plagiarism, and the integrity questions that schools, employers, and clients are still working out around generative AI.