What AI Gets Wrong (And How I Catch It)
I use AI tools to build everything I ship. I also catch AI making mistakes in almost every session. These two facts aren't contradictory: they're the whole point.
If you're using AI tools and you're not catching mistakes, you're not checking carefully enough. Here's what I've found after 61 sessions of building with AI.
The categories of wrong
1. Confidently incorrect facts
This is the most dangerous category. The AI states something as fact, with no hedging or uncertainty, and it's wrong. It doesn't know it's wrong. The confidence is identical whether the information is correct or fabricated.
Real example
While building the grocery dashboard, the AI described the NOVA food classification system with specific category rules. Two of the rules were subtly wrong: plausible, consistent with the general framework, but not what the actual NOVA specification says. Without checking against the original INSERM research, the classification logic would have been wrong in the product.
The pattern: AI is worst at specific identifiers (codes, reference numbers, version numbers, prices). It's better at structure and relationships. Always verify specifics against primary sources.
2. Outdated information
AI training data has a cutoff. Standards change, products get discontinued, guidelines get updated. The AI doesn't know about changes after its training date, and it won't tell you its information might be stale.
Real example
While writing a blog post about AI tools, the AI cited a feature in a specific tool version that had been removed in a recent update. The description was accurate for the old version and the feature name was real, but the feature no longer existed. Anyone following the instructions would hit a dead end and blame themselves.
The fix: force the AI to search the web for current information rather than relying on training data. I built a custom research skill that mandates live web searches for every factual claim.
3. Plausible but wrong structure
The AI sometimes creates cross-references between things that don't actually connect. It knows that System A and System B exist, and it knows they should probably talk to each other, so it writes an integration spec. But the specific data fields, message formats, or trigger conditions might be invented.
Real example
While building the curly hair app, the AI described the ingredient checker logic correctly at a high level, but the specific ingredient interactions it flagged as incompatible were wrong for the CGM framework. The structure looked right. The categories were right. The specific rules were invented. Without cross-checking against the actual CGM guidance, bad advice would have shipped to real users.
This is the hardest category to catch because the structure looks right. You need domain knowledge to spot that the specifics are off.
4. Scope creep via enthusiasm
Not a factual error, but a consistent problem: the AI will add features you didn't ask for. A simple dashboard request becomes a 6-tab application with budget modelling. A basic ingredient checker becomes a full web app with quizzes and product databases.
The AI is trying to be helpful, but unchecked, it produces over-engineered solutions. I now use a scope-check skill that challenges every addition against the actual project requirements.
How I catch mistakes
The verification protocol
- Primary source check. Every specific claim (code, number, price, standard reference) gets checked against the original source. Not another AI output, not a summary: the actual specification or guideline.
- Parallel verification. For important documents, I run multiple independent AI checks against different source materials. If 5 separate checks all agree, confidence is high. If they disagree, I dig into why.
- Use it yourself. The best test is using the product for its intended purpose. Open the dashboard, try the quiz, follow the pathway. Most errors become obvious when you actually use the thing.
- Sleep on it. Seriously. Fresh eyes catch what tired eyes miss. I review overnight AI output in the morning before doing anything with it.
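The parallel-verification step reduces to a quorum rule: unanimous agreement across the independent checks means high confidence, any disagreement means dig in. A minimal sketch of that rule, assuming each check's answer has been normalised to a comparable string:

```python
from collections import Counter

def consensus(check_results, quorum=5):
    """Accept only unanimous agreement from enough independent checks.

    Any split verdict, or too few checks, is flagged for manual digging
    rather than silently resolved by majority vote.
    """
    counts = Counter(check_results)
    answer, votes = counts.most_common(1)[0]
    if len(counts) == 1 and votes >= quorum:
        return {"status": "high-confidence", "answer": answer}
    return {"status": "disagreement", "answers": dict(counts)}
```

Note the deliberate asymmetry: a 4-to-1 split does not win by majority. Disagreement is a signal to find out why, not a tie to break.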
The trust hierarchy
Not all AI output needs the same level of verification:
- HTML/CSS layout: Low risk. If it looks wrong, I can see it. Quick visual check.
- JavaScript logic: Medium risk. Test it by using it. Edge cases are where errors hide.
- Factual claims: High risk. Verify against primary sources. No exceptions.
- Clinical/safety content: Critical risk. Multiple independent verification checks. Would not ship without expert review.
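The hierarchy is really just a lookup table, which makes it easy to apply consistently. A sketch of how it might be encoded; the level names and checks come from the list above, while the key names and the fail-closed default are my own assumptions:

```python
# Trust hierarchy as data. Unknown content types fall through to the
# strictest level rather than the loosest (fail closed).
VERIFICATION_LEVELS = {
    "html_css_layout":  {"risk": "low",      "check": "quick visual check"},
    "javascript_logic": {"risk": "medium",   "check": "use it; probe edge cases"},
    "factual_claims":   {"risk": "high",     "check": "verify against primary sources"},
    "clinical_safety":  {"risk": "critical", "check": "multiple independent checks + expert review"},
}

def required_check(content_type):
    """Return the verification rule for a content type, defaulting to critical."""
    return VERIFICATION_LEVELS.get(content_type, VERIFICATION_LEVELS["clinical_safety"])
```

The default is the point: anything you haven't explicitly classified gets the clinical-grade treatment, not a quick visual check.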
The uncomfortable truth
AI tools are the most productive technology I've ever used. They're also the most confidently wrong technology I've ever used. Both of these are true at the same time.
The people who will get the most value from AI tools are the people who already know enough about their subject to catch the mistakes. The people who know the least will produce the most plausible-looking errors.
This isn't an argument against using AI. It's an argument for using it honestly, with verification, with domain knowledge, and with the humility to check your output before shipping it.
Every product I sell has been through verification. That's not a feature: it's the minimum standard. If you can't verify it, you shouldn't ship it.