The Mystery
We Don't Fully Understand Why It Works
Here's a confession that might surprise you: the people who build these systems don't fully understand how they work. Not in the sense of "we don't know the math" (they know the architecture in complete detail), but in the deeper sense of "why does this produce intelligence?"
That's still a mystery.
Emergence: Abilities Nobody Programmed
Emergence is perhaps the most surprising phenomenon in AI: capabilities that appear in larger models even though nothing in training explicitly targeted them. They just show up.
Examples of Emergence
Chain-of-thought reasoning
Larger models discovered they could solve complex problems by "thinking step by step," even though no one told them to do this.
In-context learning
Show a model a few examples of a task in its prompt, and it can perform that task without any additional training. This ability emerged at scale (see the prompt sketch after this list).
Code generation from descriptions
Models trained only to predict text learned to write functional code from natural language descriptions, without any dedicated programming objective.
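To make the first two of these concrete, here is a minimal sketch of what such prompts look like. The translation and arithmetic examples are the well-known ones from the few-shot and chain-of-thought literature; no particular model or API is assumed, and the strings are simply printed.

```python
# A minimal sketch of the two prompting patterns described above, shown as
# plain prompt strings. Any LLM-serving API could consume these.

# In-context learning: a few input/output examples in the prompt are enough
# for the model to infer the task (here, English -> French) with no training.
few_shot_prompt = """\
sea otter -> loutre de mer
peppermint -> menthe poivrée
cheese ->"""

# Chain-of-thought: simply asking for intermediate steps often improves
# accuracy on multi-step problems in sufficiently large models.
chain_of_thought_prompt = """\
Q: A cafeteria had 23 apples. It used 20 and bought 6 more. How many now?
A: Let's think step by step."""

print(few_shot_prompt)
print(chain_of_thought_prompt)
```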
This makes it hard to predict what capabilities future models will have. And it raises the question: what other capabilities might emerge that we haven't discovered yet—or haven't thought to test for?
The Interpretability Challenge
Modern LLMs have hundreds of billions, sometimes trillions, of parameters: numerical values that together encode everything the model has learned. But we can't simply read those numbers and understand what the model "knows."
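To see what "can't simply read those numbers" means in practice, here is a minimal sketch, assuming the Hugging Face transformers library, with the small public GPT-2 model standing in for a frontier LLM. It counts the parameters and prints one corner of one weight matrix: nothing but unlabeled floating-point values.

```python
# A minimal sketch: the parameters of a trained model are just arrays of floats.
# GPT-2 (~124M parameters) stands in here for a much larger modern LLM.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")

total = sum(p.numel() for p in model.parameters())
print(f"Total parameters: {total:,}")

# One corner of one attention weight matrix in the first transformer block.
# Nothing here labels which numbers contribute to which concept, if any.
weights = model.transformer.h[0].attn.c_attn.weight
print(weights[:3, :5].detach())
```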
The Scale of Complexity
GPT-4 has an estimated 1.7 trillion parameters. That's roughly:
- About 20x the number of neurons in a human brain (roughly 86 billion)
- On the order of the number of synapses in a mouse brain
- Several times the number of stars in the Milky Way
Researchers are working on "interpretability"—techniques to understand what's happening inside these models. They've made progress:
- Found individual neurons that detect specific concepts
- Identified circuits that perform particular operations
- Visualized attention patterns showing what the model focuses on
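The last item is the easiest to demonstrate. Here is a minimal sketch of pulling attention patterns out of a model, assuming the Hugging Face transformers library and the small public GPT-2 model; the choice of layer, head, and sentence is arbitrary. It prints how strongly one attention head, while processing the token "it", attends to each earlier token.

```python
# A minimal sketch of one interpretability technique: extracting attention
# patterns from a model (here, GPT-2 via the transformers library).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2", output_attentions=True)

inputs = tokenizer("The cat sat on the mat because it was tired", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions is a tuple with one tensor per layer,
# each shaped (batch, heads, query_tokens, key_tokens).
attn = outputs.attentions[5][0, 0]  # layer 5, head 0 (arbitrary choices)
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())

# For the token "it", show how much this head attends to each position.
query_index = tokens.index("Ġit")  # GPT-2 marks a leading space with "Ġ"
for tok, weight in zip(tokens, attn[query_index].tolist()):
    print(f"{tok:>10s}  {weight:.3f}")
```

Individual heads like this one are the raw material interpretability researchers work with; the hard part is making sense of thousands of them acting together.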
But we're still far from a complete understanding. We can probe specific behaviors, but we can't fully explain why a model gives a particular answer to a particular question.
What We Don't Know
Be skeptical of anyone who claims to fully understand LLMs. Here are genuine open questions:
Why does scale work?
We know larger models are more capable, but we don't have a deep theoretical understanding of why.
What are the limits?
Will scaling continue to produce improvements? Are there fundamental limits? Nobody knows.
Is it "understanding"?
Philosophers and researchers genuinely disagree about whether LLMs understand anything in a meaningful sense.
What will emerge next?
Emergent capabilities are hard to predict. We don't know what larger models will be able to do.
The Hallucination Problem
One of the clearest symptoms of our incomplete understanding is hallucination: when models generate plausible-sounding but completely false information.
Example Hallucination
User: Who wrote the paper "Deep Learning for Natural Language Understanding"?
Model: "This seminal 2019 paper was authored by Dr. James Chen and Dr. Sarah Mitchell at Stanford University, published in Nature Machine Intelligence..."
This paper, these authors, and these details don't exist. The model made them up.
Why does this happen? LLMs are trained to produce plausible text, not to verify facts. They don't have a separate "fact-checking" module. When they don't know something, they don't say "I don't know"—they generate what would be a reasonable answer if they did know.
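Here is a toy numerical sketch of that last point, using made-up scores rather than any real model: decoding turns the model's raw scores into a probability distribution over next tokens and samples from it, so some answer comes out whether or not the model is confident.

```python
# A toy sketch (not any real model) of why hallucination is the default:
# generation samples from a probability distribution over next tokens,
# and sampling always returns *something*, confident or not.
import numpy as np

rng = np.random.default_rng(0)

def sample_next_token(logits, vocab):
    """Turn raw scores into probabilities and sample one token."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return rng.choice(vocab, p=probs), probs

vocab = ["Chen", "Mitchell", "Smith", "Garcia", "Unknown"]

# Hypothetical scores for the next token after "...was authored by Dr."
# The model is not sure, but the distribution still has to sum to 1, so a
# name gets picked either way. There is no built-in "I don't know" path
# unless tokens expressing uncertainty happen to score highly.
uncertain_logits = np.array([1.1, 1.0, 0.9, 0.9, 0.2])

token, probs = sample_next_token(uncertain_logits, vocab)
print("sampled:", token)
print("probabilities:", dict(zip(vocab, probs.round(3))))
```

Real systems add mitigations on top, such as temperature settings, retrieval, and refusal training, but the underlying mechanism still rewards plausible continuations rather than verified facts.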
Why This Matters
Our incomplete understanding has real implications:
Safety Concerns
If we don't fully understand how these systems work, it's hard to guarantee they'll behave safely. Emergent capabilities could include harmful ones we haven't anticipated.
Trust and Verification
When we can't explain why a model gives a particular answer, how do we know when to trust it? This matters in medicine, law, and other high-stakes domains.
Improving Systematically
Without deep understanding, progress relies partly on trial and error. True understanding would allow more targeted improvements.
The Search for Understanding
Despite the challenges, researchers are making progress:
- Mechanistic interpretability: Reverse-engineering what circuits in the model do
- Scaling laws: Mathematical relationships between model size and capability (an illustrative calculation follows this list)
- Probing studies: Testing what information is encoded where
- Behavioral experiments: Systematic testing to characterize capabilities and limitations
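As an illustration of the second item, here is a minimal sketch of the kind of relationship scaling-law studies describe. The power-law form and approximate constants follow OpenAI's 2020 scaling-law work (Kaplan et al.); treat the numbers as illustrative, not predictive.

```python
# An illustrative scaling-law calculation. The power-law form
# L(N) = (N_c / N) ** alpha follows early scaling-law papers; the constants
# below are approximate and meant only to show the shape of the relationship.
N_C = 8.8e13   # characteristic parameter count (approximate, non-embedding)
ALPHA = 0.076  # how quickly loss falls as the model grows

def predicted_loss(num_parameters: float) -> float:
    """Predicted cross-entropy loss for a model of the given size."""
    return (N_C / num_parameters) ** ALPHA

for n in [1e8, 1e9, 1e10, 1e11, 1e12]:
    print(f"{n:>10.0e} parameters -> predicted loss {predicted_loss(n):.2f}")
```

Curves like this predict average loss, not which specific abilities will appear at a given size, which is one reason emergence remains hard to forecast.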
Understanding may come. But for now, we're in a remarkable position: using powerful tools we built but don't fully understand.
Key Takeaways
- Even creators don't fully understand why LLMs work the way they do
- Emergent capabilities appear unpredictably at scale
- Interpretability research is progressing but far from complete
- Hallucinations reveal fundamental differences from human knowledge
- This uncertainty has real implications for safety and trust