Privacy for Non-Coders: What Stays Local, What Doesn't

Every time you paste something into an AI chat window, you are sending data somewhere. That's not a reason to stop using these tools. It's a reason to be deliberate about what you send.

Most people using AI tools to build things have no idea what happens to their data. The privacy policies exist, but they're written for lawyers, not for non-coders trying to build a finance dashboard or an ingredient checker on a Saturday afternoon.

This isn't a scare piece. It's a map. Here's what I've actually figured out from building 13+ projects with AI tools — what goes where, what stays local, and the mental model I use before I paste anything sensitive.

The Core Question: Where Does the Data Go?

When you type something into Claude, ChatGPT, Gemini, or any AI tool, that text travels to the company's servers, gets processed, and a response comes back. The text you sent is now on their infrastructure.

What happens to it after that depends on the tool, the plan you're on, and what you've agreed to in settings. The default for most free tiers is: your conversations can be used to improve the model. The default for paid/API tiers varies but is generally more protective.

The key distinction

Conversations via the chat interface (claude.ai, chatgpt.com, gemini.google.com): Your inputs are processed server-side. Whether they're retained for training depends on your account settings and plan.

Conversations via the API (including Claude Code): By default, Anthropic and OpenAI do not use API data to train models. You're paying for inference only.

Claude Code sits in the API category. What I type here isn't going into model training. The same is true for direct API access. This is one reason professional developers pay for API access rather than using the chat interface for sensitive work.

What I Actually Do Before Pasting Data

I have a single question I ask before pasting anything: Would I be comfortable if this appeared in a training dataset?

If yes, paste away. If no, either anonymise it first, or don't use AI for that specific step.

This is my practical traffic light:

Fine to share (green):
  • Sample/dummy data
  • Public information
  • Generic questions
  • Code without credentials
  • Anonymised examples

Think first (amber):
  • Your own spending data
  • Business financial data
  • Health information
  • Anything with real names
  • Work documents

Never paste (red):
  • API keys / passwords
  • Other people's private data
  • Patient data (if healthcare)
  • Credit card numbers
  • Login credentials

The amber zone is where you need to make a judgment call. I've pasted my own spending data into Claude Code when building my finance dashboard. That was a deliberate choice I made after reading the relevant policy. I wouldn't paste someone else's financial data without explicit permission and a clear reason.
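For the amber zone, anonymising before pasting is often all it takes. Here's a minimal sketch in Python, assuming your data is a list of row dictionaries (the field names "name" and "email" are placeholders, not from any real project):

```python
# Rough sketch: replace identifying fields with numbered placeholders
# before the rows go anywhere near an AI chat window.
def anonymise_rows(rows, drop_fields=("name", "email")):
    """Return copies of rows with identifying fields replaced."""
    cleaned_rows = []
    for i, row in enumerate(rows, start=1):
        cleaned = dict(row)  # copy, so the original data is untouched
        for field in drop_fields:
            if field in cleaned:
                cleaned[field] = f"person_{i}"
        cleaned_rows.append(cleaned)
    return cleaned_rows
```

The numbers keep rows distinguishable, so the AI can still reason about "person_1 vs person_2" without ever seeing who they are.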

The API Key Problem

This is the most common real-world mistake. API keys are credentials — they're the equivalent of a password to a service you're paying for. If you paste an API key into an AI chat window, you've just handed that key to a third party's server.

The Pattern That Causes This

You're building something. You hit a bug. You copy the entire code file — including the line that says API_KEY = "sk-..." — and paste it into the chat for help.

The AI helps you fix the bug. But the API key is now in their system.

The fix: use environment variables, even in small personal projects. Never hardcode credentials. Your code should read API_KEY = os.environ["MY_KEY"], not the actual key value.
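In Python that pattern looks roughly like this (MY_KEY is a placeholder name, not a variable from any real project):

```python
import os

def load_api_key(var_name="MY_KEY"):
    """Read a credential from the environment instead of hardcoding it."""
    key = os.environ.get(var_name)
    if key is None:
        # Fail loudly rather than limping on with a missing credential.
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key
```

The file you paste into a chat window now contains only the variable name. The key itself lives in your shell environment or a .env file that never leaves your machine.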

I learned this the hard way early on. The first time I built something that called an external API, I had the key right in the file. Claude Code pointed it out before I pushed it to GitHub, but that was lucky. Now I treat environment variables as non-negotiable from day one.

The Git and GitHub Problem

Version control is another way data ends up somewhere it shouldn't. If you commit a file containing real data or credentials, it goes into your repository history. Even if you delete the file later, it's still in the history unless you explicitly rewrite that history.

The practical rules I follow:

  • Add real data files and .env to .gitignore before the first commit, not after.
  • Never commit credentials. If one slips in, revoke and rotate the key immediately — deleting the file doesn't remove it from history.
  • Check what's staged (git status, git diff) before pushing to a public repository.

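As a concrete sketch, a .gitignore along these lines keeps the usual offenders out of history (the file and folder names are assumptions; adjust to your project):

```
# Environment files holding credentials
.env
.env.*

# Real data kept out of version control
data/real/
*.sqlite
```

A common convention is to commit a sanitised .env.example alongside it, so anyone cloning the project knows which variables to set without ever seeing real values.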
What "Local" Actually Means

When I say something "stays local," I mean it never leaves my machine in an uncontrolled way. The Claude Code CLI runs locally — the tool calls happen on my computer. But the AI reasoning about what to do still happens on Anthropic's servers.

"Running locally" therefore doesn't mean "stays on my machine entirely." It means the execution happens locally; the context the AI reads — your prompts and the files it opens — still travels to the server. This is true of any agentic tool that uses a cloud AI model.

The tools that genuinely run entirely locally are tools running a local model — Ollama, LM Studio, or similar. These are legitimate options for genuinely sensitive work, at the cost of significantly reduced capability.

Other People's Data

This is the category most people miss entirely. Most data privacy conversations focus on your own data. But if you're building a tool that processes information about other people, the rules are different.

GDPR (if you're in the UK or EU) applies to processing personal data about living individuals. Processing includes sending that data to an AI service. If you're building a tool for a business that involves customer data, employee data, patient data, or any data about people who haven't given permission for it to be sent to an AI service — that's a legal issue, not just a privacy preference.

I stay out of that territory entirely. Everything I build either uses sample/anonymised data, or processes only the user's own data. The moment a product touches data about other people, the compliance questions are beyond what I can sensibly navigate as a non-expert.

The Mental Model I Actually Use

I think of it like this: every AI tool I use is like a very capable assistant who works for a company I don't own. I can share things with that assistant that I'd share with any competent freelancer. I wouldn't share things I'd want to stay in my building — literally or professionally.

That framing has never caused me a problem and it keeps the decision fast. Not everything needs a privacy audit. Most things are clearly fine. The few things that aren't are usually obvious.

Before You Paste — Quick Check

  • Does this contain anyone else's personal information?
  • Does this contain API keys, passwords, or credentials?
  • Would I be comfortable if this text appeared publicly?
  • Is this on a free tier (higher training risk) or paid/API (lower risk)?
  • If it's data — is this sample data or real data?
  • Does my .gitignore cover anything in this project folder with real data?
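If you want a mechanical backstop for that checklist, a rough pattern scan catches the most obvious offenders before you paste. This is an illustrative sketch, not a real secret scanner — tools like gitleaks or git-secrets do this job properly:

```python
import re

# Rough patterns for common credential shapes -- illustrative, not exhaustive.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9-]{16,}"),             # OpenAI/Anthropic-style keys
    re.compile(r"(?i)(api[_-]?key|password|secret)\s*[:=]"),
    re.compile(r"\b\d{13,16}\b"),                     # long digit runs (card-like)
]

def looks_sensitive(text):
    """Return True if the text matches any rough secret pattern."""
    return any(p.search(text) for p in SECRET_PATTERNS)
```

A False from this check proves nothing — it only flags the obvious cases, which is exactly where most real slip-ups happen.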

You don't need to be a privacy lawyer to build sensibly. You just need to know where the data goes and make deliberate choices about what you send. That's it.

Part of the Stackless Guide series for non-coder builders:

Part 1: Security & Privacy Foundations · Part 2: Testing & Validation · Part 3: The Toolkit