ChatGPT Is Three Years Old and We're Still Using It Wrong
Three years after launch, ChatGPT remains the most-used tool nobody really understands. The interface looks like conversation, so we treat it like a person. The responses sound confident, so we trust them. Both assumptions are wrong, and they're costing us more than we realize.
The Tool Everyone Uses But Nobody Understands
ChatGPT turned three last month. Three years of daily use, billions of conversations, and we're still fundamentally confused about what it is. Not in the philosophical sense—though that's unresolved too—but in the practical sense of how to actually use it.
Watch someone use ChatGPT for five minutes and you'll see the problem. They're treating it like Google (wrong), like a person (also wrong), or like a magic oracle that knows everything (definitely wrong). The interface looks like a chat window, so we assume conversational rules apply. They don't.
The Autocomplete Problem Nobody Wants to Acknowledge
Here's what ChatGPT actually is: extremely sophisticated autocomplete trained on an enormous corpus of human writing. That's not reductive—it's precise. Understanding this changes everything about how you should use it.
When you ask ChatGPT a question, it's not retrieving information from a database. It's not reasoning through a problem step by step. It's predicting what tokens should come next based on patterns in its training data. Sometimes those predictions are brilliant. Sometimes they're confidently wrong. The problem is you can't tell which until you already know the answer.
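If you want to see what "predicting what tokens should come next" literally means, here's a minimal sketch using the small open GPT-2 model as a stand-in (ChatGPT's own weights aren't public, so the model choice, prompt, and top-5 cutoff are just illustrative). It prints the model's most probable next tokens for a prompt, which is the whole trick, repeated one token at a time.

```python
# Minimal sketch of next-token prediction, using the small open GPT-2 model
# as a stand-in for ChatGPT (whose weights aren't public). Requires the
# torch and transformers packages.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The capital of France is"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    logits = model(input_ids).logits[0, -1]   # scores for the next token only
probs = torch.softmax(logits, dim=-1)
top = torch.topk(probs, k=5)

# The "answer" is just whichever continuation the training data makes most likely.
for p, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode([int(token_id)])!r:>10}  p={p.item():.3f}")
```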
This matters more than people realize. The entire UX is designed to make you trust it—the confident tone, the well-structured responses, the helpful formatting. But a confident tone tells you nothing about whether the answer is correct. I've seen it write Python code that looks perfect and fails on edge cases a junior developer would catch. I've also seen it solve problems I'd been stuck on for hours.
Where ChatGPT Actually Works
Three years in, patterns have emerged. ChatGPT excels at transformation tasks where you can verify the output: reformatting data, writing boilerplate, explaining concepts you already half-understand, generating variations on themes you provide.
The sweet spot is when you need a first draft of something. Not the final version—the thing you'll edit heavily but didn't want to start from scratch. Email templates, code scaffolding, outline structures, brainstorming variations. Anything where the value is in iteration, not in getting it right the first time.
It's also surprisingly good at translation—not just between languages, but between contexts. Technical to non-technical. Formal to casual. Dense to accessible. The model has seen enough examples of these transformations that it can approximate them reliably.
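As a rough sketch of what that context translation looks like in practice, here's the pattern via the openai Python SDK (v1). The model name and prompt wording are my own illustrative choices, not a recommendation.

```python
# A minimal sketch of "translation between contexts": technical prose in,
# plain-language version out. Model name and prompts are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

technical_text = (
    "The service exposes an idempotent PUT endpoint; retries are safe "
    "because each request carries a client-generated UUID."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed model name; substitute whatever you use
    messages=[
        {"role": "system",
         "content": "Rewrite technical text for a non-technical audience. "
                    "Keep it to two sentences and do not add new claims."},
        {"role": "user", "content": technical_text},
    ],
)
print(response.choices[0].message.content)
```

The constraint in the system prompt ("do not add new claims") matters more than the phrasing: transformation tasks work because the source text bounds what the model is allowed to say.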
Where It Fails Spectacularly
Anything requiring current information breaks immediately. ChatGPT's training data has a cutoff date, and even with web browsing enabled, it's not actually good at knowing what it doesn't know. Ask it about last week's tech news and it'll hallucinate confidently.
Math and logic remain inconsistent. Sometimes it works perfectly. Sometimes it makes errors a calculator wouldn't. The problem isn't that it can't do math—it's that it's doing pattern matching on mathematical notation, not actual computation. When the pattern matches training data, it works. When it doesn't, you get nonsense formatted like mathematics.
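The practical fix is boring: re-compute any number the model gives you with actual code instead of trusting the pattern match. A minimal sketch of that habit (the model_answer value below is made up for illustration):

```python
# Re-check a model's claimed arithmetic with real computation instead of
# trusting pattern-matched output. The "model_answer" here is hypothetical.
import ast
import operator

_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}

def safe_eval(expr: str) -> float:
    """Evaluate a plain arithmetic expression without using eval()."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError(f"unsupported expression: {expr!r}")
    return walk(ast.parse(expr, mode="eval"))

expression = "17 * 24 + 338 / 13"
model_answer = 433.0                    # whatever the chat window claimed
actual = safe_eval(expression)          # 434.0
print(f"model said {model_answer}, code says {actual:.2f}, "
      f"match={abs(actual - model_answer) < 1e-9}")
```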
The worst failures come from what I call "authority gradient problems." When you're asking about something you don't understand well, you can't evaluate the response quality. That's exactly when ChatGPT is most dangerous, because it almost never volunteers "I don't know" or "I'm uncertain about this." It just generates plausible-sounding text.
The Workflow Problem
Most people use ChatGPT like a search engine: one question, one answer, done. That's leaving 80% of the value on the table. The real power comes from iteration—treating it like a collaborative tool where you're steering and verifying rather than just consuming.
The workflow that actually works: start broad, verify the direction, get more specific, verify again, iterate until you have something useful. It's more like pair programming than Google search. You wouldn't trust a junior developer to write production code unsupervised. Same principle applies.
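Here's roughly what that loop looks like if you script it against the API instead of the chat window. This is a sketch using the openai Python SDK with an assumed model name; the "verify" step is deliberately just you, reading and steering.

```python
# A rough sketch of the iterate-and-verify workflow: draft, inspect, steer,
# repeat. The verification step is intentionally a human reading the output.
from openai import OpenAI

client = OpenAI()
messages = [{"role": "user",
             "content": "Draft a short outline for a post about LLM failure modes."}]

for round_number in range(3):                      # broad -> specific -> polish
    reply = client.chat.completions.create(
        model="gpt-4o-mini",                       # assumed model name
        messages=messages,
    ).choices[0].message.content
    print(f"--- draft {round_number + 1} ---\n{reply}\n")

    # The human-in-the-loop step: read it, check it, then steer the next pass.
    feedback = input("What should change in the next pass? (blank to accept) ")
    if not feedback.strip():
        break
    messages.append({"role": "assistant", "content": reply})
    messages.append({"role": "user", "content": feedback})
```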
This is exhausting, which is why most people don't do it. They want the magic—ask a question, get an answer, move on. But that's not what the tool is optimized for. The companies building these models know this. The UX doesn't reflect it because admitting "this requires active supervision" doesn't scale.
What Changed (And What Didn't)
Three years ago, ChatGPT felt like magic because we'd never seen anything like it. Now we've normalized it, which means we're starting to see its actual limitations rather than just being impressed it works at all.
The models got better—GPT-4 is noticeably more capable than GPT-3.5. But the fundamental architecture didn't change. It's still autocomplete, just with more parameters and better training data. The improvements are incremental, not transformative.
What did change is how we use it. Developers integrated it into workflows. Writers use it for drafting. Students use it for homework (which is creating problems nobody wants to address). The tool found its niches through brute force trial and error, not through anyone figuring out the "right" way to use it.
The Uncomfortable Truth About Understanding
Here's the thing nobody wants to say: ChatGPT is most useful precisely when you don't want to understand something deeply. Need to write a regex? ChatGPT can do it faster than learning regex properly. Need to understand a code library? ChatGPT can summarize it faster than reading documentation.
This is incredibly valuable for getting things done. It's also quietly eroding the foundation of expertise. When you can outsource understanding to a system that's right 80% of the time, you stop building the mental models that would let you catch the 20% of errors.
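The cheapest defense is to keep the verification muscle even when you outsource the generation. For the regex case, that means a model-suggested pattern gets a test harness before it gets trusted. The pattern and cases below are invented for illustration of the kind of plausible-looking suggestion that fails on edge cases.

```python
# A hypothetical model-suggested email regex, checked against edge cases
# before anyone trusts it. Several of these fail on purpose.
import re

suggested = re.compile(r"^[\w.]+@[\w]+\.[a-z]{2,3}$")   # looks fine at a glance

cases = {
    "user@example.com": True,
    "first.last@example.co.uk": True,   # multi-part TLD
    "user+tag@example.com": True,       # plus addressing
    "user@example.museum": True,        # TLD longer than 3 characters
    "not-an-email": False,
}

for text, expected in cases.items():
    got = bool(suggested.fullmatch(text))
    status = "ok  " if got == expected else "FAIL"
    print(f"{status} {text!r}: expected {expected}, got {got}")
```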
The developers I know who use ChatGPT most effectively are the ones who already knew how to do everything without it. They're using it for speed, not for capability. The ones trying to learn programming through ChatGPT hit walls constantly because they can't evaluate what it's telling them.
What Three Years Actually Taught Us
ChatGPT isn't a search engine, isn't a person, isn't an oracle. It's a probabilistic text generator that happens to be useful for specific tasks when used with appropriate skepticism. That's less exciting than the hype, but more useful than the dismissals.
The people getting value from it have stopped trying to figure out what it "is" and started figuring out what it's good for. Turns out that's a much more practical question. Three years in, maybe that's the real lesson: stop anthropomorphizing the autocomplete and start treating it like the weird, useful, limited tool it actually is.
Comments (3)
The chat interface is doing exactly what it was designed to do—make the tool accessible—but it's also the biggest obstacle to using it effectively. I wonder if we need a fundamentally different UI paradigm that makes the autocomplete nature more visible, or if that would just make it less approachable for most users?
If it's essentially sophisticated autocomplete, does that mean the quality of output is more dependent on how we frame the input than we think? I've noticed my results improve dramatically when I give ChatGPT examples of what I want rather than just describing it—wondering if that's because I'm essentially feeding it better patterns to complete from.
Exactly—and from a UX perspective, this reveals how misleading the chat interface actually is. We designed it to look like messaging, which trains users to treat it conversationally, when really it performs better with structured, example-rich inputs that most people would never think to provide in a 'chat.'
This reminds me of the early days of search engines when people would type full questions into AltaVista instead of keywords. We eventually learned the mental model for search, but it took years of collective trial and error. The difference now is that ChatGPT's conversational interface actively works against developing the right mental model—it's designed to feel like talking to someone who understands you.
That's an interesting comparison, but is there actually data showing that search engine users made fewer errors over time, or did the algorithms just get better at interpreting our mistakes? I'd be curious to see if anyone has tracked whether ChatGPT users are actually developing better prompting skills or if we're just getting more comfortable with inconsistent results.