Start with a dataset you already own
This started with a simple realisation: I already owned about ten years of detailed data on my own life, and I had never once looked at it. Tesco keeps every order. So does Amazon. Your ChatGPT history is downloadable. An Oura ring records every night you have slept. Most people are sitting on a rich personal dataset and have never opened it.
That is the best place to start building. You already understand the data, you actually care about the answer, and there is no privacy worry because it never has to leave your machine. I picked my Tesco order history, roughly a decade of grocery shops, and built a dashboard to see what it actually said.
The demo runs on fictional sample data: 200 randomly generated transactions. None of my real shopping is in it.
What it does
The dashboard reads a grocery export and turns it into four views:
- Overview: spend over time, by month, by year, and by day of the week.
- Items & Categories: every product, grouped into about twenty clean categories, searchable and sortable.
- Health & NOVA: every item mapped to the NOVA scale, which sorts food into four groups by how processed it is (NOVA 1 is unprocessed or minimally processed, NOVA 4 is ultra-processed). It tracks what share of my spend goes on ultra-processed food, and how that has changed year on year.
- Shopping List: the things I buy on repeat, grouped by how often.
The NOVA view is the heart of it. Most people have a rough sense of how processed their diet is. This puts a number on it, and tracks the number over time.
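As a rough illustration of how that number could be computed, here is a minimal sketch. It assumes each row has already been reduced to a `year`, a `price` and a `nova` group; those field names and the function name are hypothetical, not taken from the actual dashboard code.

```javascript
// Hypothetical row shape: { year: number, price: number, nova: 1 | 2 | 3 | 4 }.
// Returns, per year, the share of spend (not calories) that went on NOVA 4 items.
function upfShareByYear(items) {
  const totals = new Map(); // year -> { all: total spend, upf: NOVA 4 spend }
  for (const { year, price, nova } of items) {
    const t = totals.get(year) ?? { all: 0, upf: 0 };
    t.all += price;
    if (nova === 4) t.upf += price;
    totals.set(year, t);
  }
  // Sort by year so the result can be plotted directly as a trend.
  return [...totals.entries()]
    .sort(([a], [b]) => a - b)
    .map(([year, t]) => ({ year, share: t.all > 0 ? t.upf / t.all : 0 }));
}
```

The denominator here is money spent, so the result is a share of spend and should be labelled that way wherever it is shown.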
What the data actually showed
A few things were worth the whole exercise on their own:
- The Clubcard quietly saved about £4,000 over the decade, roughly 8 to 9 percent of the grocery bill. Almost none of that came before 2020, so really it is the last five years. A small discount, added up over years, pays for a holiday.
- About 15 percent of my grocery spend goes on ultra-processed food. That sounds low until you remember it is a share of money, not calories. The UK average is closer to 57 percent of calorie intake. Whenever you publish a number like this, say which denominator you are using, or it means nothing.
- A rising grocery bill is two different stories: prices going up, and buying more. Splitting them apart is the difference between "inflation is real" and "we just eat more than we used to". Over the decade, the price of extra-virgin olive oil roughly tripled, and free-range eggs and fish fingers roughly doubled.
- Home-delivery baskets ran about five times bigger than in-store top-up visits. Fewer, bigger shops is a genuine efficiency lever, not just a feeling.
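The split between "prices went up" and "we bought more" can be sketched as below. This is one common convention, not necessarily what the dashboard does: the price effect is the price change applied to the old quantities, and the quantity effect is the quantity change valued at the new prices, so the two add up exactly to the change in spend. The basket shape and the function name are assumptions made for the example.

```javascript
// Hypothetical basket shape: { productName: { price, qty } } for one period.
// Splits the change in total spend between two periods into:
//   priceEffect    = sum of (newPrice - oldPrice) * oldQty
//   quantityEffect = sum of newPrice * (newQty - oldQty)
// The two effects sum exactly to (new total spend - old total spend).
function splitSpendChange(before, after) {
  let priceEffect = 0;
  let quantityEffect = 0;
  const products = new Set([...Object.keys(before), ...Object.keys(after)]);
  for (const name of products) {
    const b = before[name];
    const a = after[name];
    if (b && a) {
      priceEffect += (a.price - b.price) * b.qty;
      quantityEffect += a.price * (a.qty - b.qty);
    } else if (a) {
      // New product: no old price to compare, so count it as buying more.
      quantityEffect += a.price * a.qty;
    } else {
      // Dropped product: count the lost spend as buying less.
      quantityEffect -= b.price * b.qty;
    }
  }
  return { priceEffect, quantityEffect, total: priceEffect + quantityEffect };
}
```

Other conventions (for example valuing the quantity change at old prices and reporting a separate interaction term) give slightly different splits, so it is worth stating which one is in use next to the chart.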
What broke along the way
The dashboard was the easy part. The interesting work was everything that went wrong:
- The classifier is order-sensitive. It is a long list of keyword rules, and the first rule that matches wins. CBD drinks with names like "Raspberry & Guava" kept landing in Fruit & Veg until I moved a Drinks rule above the fruit rules. Almost every bug was a version of this.
- It worked for the AI and broke for me. The moment I opened it from my own folder it threw a CORS error, because browsers block fetch() from a page opened as a local file (a file:// URL) to another local file. The fix was to stop fetching and load the data through a plain script tag instead. That one cost an afternoon.
- Ghost charts. Toggling the time period left old charts stacked under the new ones. Chart.js needs you to explicitly destroy the old chart (call destroy() on the existing instance) before drawing the next one on the same canvas.
- Seven tabs became four. The first version had about twice as many charts as it needed. Cutting it down made it more useful, not less. Most of the charts were noise.
- NOVA is fuzzy at the edges. Is a plain microwave rice pouch processed, or ultra-processed? Reclassifying that one group moved the headline number by a full percentage point. So I wrote the judgement call down in plain sight, rather than hiding it inside the code and pretending there was one true answer.
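The order-sensitivity in the first bullet is easy to reproduce. Here is a minimal sketch of a first-match-wins keyword classifier; the rule lists, the product name and the "Uncategorised" fallback are made up for illustration and are much shorter than any real rule set.

```javascript
// Hypothetical rules: each has a category and keywords matched as lowercase substrings.
const DRINKS_FIRST = [
  { category: "Drinks", keywords: ["cbd", "sparkling"] },
  { category: "Fruit & Veg", keywords: ["raspberry", "guava", "apple"] },
];

// Same rules in the opposite order, to show how the result changes.
const FRUIT_FIRST = [...DRINKS_FIRST].reverse();

// Returns the category of the first rule with any keyword found in the name.
function classify(name, rules) {
  const lower = name.toLowerCase();
  for (const rule of rules) {
    if (rule.keywords.some((keyword) => lower.includes(keyword))) {
      return rule.category;
    }
  }
  return "Uncategorised";
}

console.log(classify("CBD Sparkling Raspberry & Guava", DRINKS_FIRST)); // Drinks
console.log(classify("CBD Sparkling Raspberry & Guava", FRUIT_FIRST)); // Fruit & Veg
```

Because the outcome depends on rule order, a handful of known-tricky product names kept as test cases is a cheap way to catch a regression whenever a rule is added or moved.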
Build your own
Everything you need is here. Open the live demo to see it working on sample data. The export guide explains how to get your own Tesco history (a GDPR data request is the clean route), and the quick-start shows how to run the build script. It is a single HTML file plus a small Node script. Your data never leaves your computer.
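To give a feel for what a build step like this can look like, here is a minimal sketch of the "load the data through a script tag" approach. It is not the actual build script: the global name `DASHBOARD_DATA`, the function names and the output path are all assumptions. The idea is to write the rows out as a JavaScript file that assigns them to a global, so the page never needs fetch.

```javascript
const fs = require("node:fs");

// Wraps the rows in an assignment so the file can be loaded by a plain <script> tag.
// "DASHBOARD_DATA" is a hypothetical global name chosen for this example.
function toScriptSource(rows, globalName = "DASHBOARD_DATA") {
  return `window.${globalName} = ${JSON.stringify(rows)};\n`;
}

// Writes the generated script to disk, e.g. writeDataScript("data.js", rows).
function writeDataScript(outPath, rows) {
  fs.writeFileSync(outPath, toScriptSource(rows), "utf8");
}
```

Under these assumptions the HTML file would include `<script src="data.js"></script>` before the dashboard code, which then reads `window.DASHBOARD_DATA`. That works when the page is opened straight from a folder, because script tags are not subject to the restriction that blocks fetch on local files.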
The point
The dashboard itself is ordinary. The point is the habit: find a dataset you already own, ask it a question you genuinely care about, and let an AI help you build the thing that answers it. That is a better first project than any tutorial, because you are the one person who can tell whether the answer is right.