My ongoing exploration into the inner workings of ChatGPT has brought me to a crucial realization that many people miss:
ChatGPT’s implementation is shockingly…mind-blowingly… simple.
It joins other such mysterious phenomena of “unimaginable complexity emergent from incredible simplicity” that the human mind struggles to comprehend… like how a short DNA sequence that could fit on a CD encodes a complete human.
When folks hear stats like “terabytes of training data” and “175 billion parameters” they think “wow – this thing is super complicated!” That impression is completely wrong. The “engine” that is ChatGPT is so incredibly simple, even I struggle to believe that it produces the output that it does.
Seeing ChatGPT in action helps us overcome our innate biases that limit our understanding of how memory works (and how the human brain works in general).
Understanding ChatGPT’s Inner Layout
ChatGPT is surprisingly willing to share the details of her architecture. In my prior “interviews” with her, I’ve collected all the data necessary to calculate her layout and complexity below. Another interview (here) verified a few assumptions that I had made.
To understand ChatGPT’s inner workings, follow these 3 steps:
1. Imagine 2 million dots.
Actually, you don’t have to imagine them. It’s the number of pixels in a 2 megapixel photo (your iPhone defaults to 12 megapixels.)
They fit onto a small image. Here are 1 million dots, so imagine two of these images:
2. Rearrange those dots so they’re in 24 long rows, each about 85,000 dots wide.
3. Now for each dot, connect it with every dot in the next row by virtual wires of varying thickness, like this:
You will have ~175 billion wires. The thickness of those wires represents 175 billion numbers. Those are the “175 billion parameters”.
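As a rough sanity check on that count, the arithmetic can be sketched in a few lines of Python. This assumes the simplified layout described above (24 rows of about 85,000 dots, each dot wired to every dot in the next row). The names are mine, and the result lands in the same ballpark as 175 billion rather than matching it exactly.

```python
# Back-of-the-envelope wire count for the simplified layout described above.
# Assumption: 24 rows of 85,000 dots, every dot wired to every dot in the next row.
ROWS = 24
DOTS_PER_ROW = 85_000


def count_wires(rows: int, dots_per_row: int) -> int:
    """Number of connections between consecutive, fully connected rows."""
    gaps_between_rows = rows - 1        # 24 rows have 23 gaps between them
    wires_per_gap = dots_per_row ** 2   # every dot to every dot in the next row
    return gaps_between_rows * wires_per_gap


total_dots = ROWS * DOTS_PER_ROW
total_wires = count_wires(ROWS, DOTS_PER_ROW)
print(f"{total_dots:,} dots, {total_wires:,} wires")
```

With these round numbers you get roughly 2 million dots and about 166 billion wires, which is the right order of magnitude for the "175 billion parameters" figure.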
Those dots are all exactly the same – they “perform” incredibly simple math in an artificial neural network, taking the inputs from the previous layer, multiplying each one by its parameter (wire thickness), and adding up the results. The idea is to sort of approximate the way a simple biological neuron works: the wires represent the neural connections.
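What one row of dots does can be sketched in plain Python. This is a minimal illustration under my own assumptions: tiny made-up weights, and a ReLU as the simple non-linearity (the description above doesn't name one). The function name `layer_forward` is hypothetical.

```python
from typing import List


def layer_forward(inputs: List[float], weights: List[List[float]]) -> List[float]:
    """One row of dots: each dot sums its inputs, scaled by wire thickness.

    weights[j][i] is the thickness of the wire from input dot i to output dot j.
    """
    outputs = []
    for wires_into_dot in weights:
        total = sum(w * x for w, x in zip(wires_into_dot, inputs))
        outputs.append(max(0.0, total))  # ReLU: an assumed, commonly used non-linearity
    return outputs


# Three dots feeding two dots: 3 x 2 = 6 wires (made-up numbers).
previous_row = [1.0, 2.0, 3.0]
wires = [
    [0.5, -1.0, 0.25],  # wires into the first dot of the next row
    [0.5, 0.25, 1.0],   # wires into the second dot of the next row
]
next_row = layer_forward(previous_row, wires)
print(next_row)
```

Stacking this same function row after row, with far bigger lists of weights, is the whole idea of the picture above.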
I’m not kidding. That’s it.
There is no database. Not a single text file. No documents. No backup copy of Wikipedia. No internet connection. No folder full of manuals. Just 175 billion numbers, laid out like a woven scarf. Just those numbers. Stored at one byte per number, it would fit on a 256GB USB stick with room to spare (at two bytes per number, it needs about 350GB). That’s it.
All of the logic, the “intelligence”, the quotes, the humor, the names – all of the product manuals, the Bible, the Shakespeare… everything that informs ChatGPT’s output is encoded and added (by its learning process) into the layout above. That’s it.
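How much storage those numbers need depends on how many bytes each one takes. Here is a quick sketch, assuming common sizes of 1, 2, and 4 bytes per number (my assumption, not a published detail of ChatGPT):

```python
PARAMETERS = 175_000_000_000  # the "175 billion parameters"


def storage_gb(parameters: int, bytes_per_parameter: int) -> float:
    """Storage in decimal gigabytes (1 GB = 10**9 bytes)."""
    return parameters * bytes_per_parameter / 10**9


for bytes_per_parameter in (1, 2, 4):
    size = storage_gb(PARAMETERS, bytes_per_parameter)
    print(f"{bytes_per_parameter} byte(s) per number: {size:.0f} GB")
```

Only the one-byte case fits on a 256GB stick. At two or four bytes per number you are looking at a single ordinary hard drive instead, which is still tiny for what it does.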
“But Sam, doesn’t it also have a…?”
No. No it doesn’t. That’s it.
What about all of the things that I tell ChatGPT in the chat?
If you aren’t blown away yet that all of ChatGPT is just 175 billion numbers, wait until you hear this: your entire conversation with ChatGPT, no matter what, is represented by 2,048 numbers. No more.
When you “talk” to ChatGPT, it breaks your text into tokens (short chunks of letters), encodes them as up to 2,048 values, and puts those values into the front end of the network above. The output of the network is its response. If you enter more than 2,048 tokens (like when you paste in a long article or a manual), the network can still only take in 2,048 of them at a time; anything outside that window isn’t seen on that pass (this is a slightly simplified explanation of the algorithm but not much).
That’s why ChatGPT doesn’t learn from your conversation. Every conversation is from scratch. Those 2,048 numbers change during a conversation. The 175 billion numbers don’t change no matter how much you talk to her. Those numbers only change during a ChatGPT upgrade.
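The split between the fixed numbers and the changing ones can be sketched as a toy. Everything here is hypothetical: `encode` uses character codes as a stand-in for real text encoding, `respond` is a placeholder for the network, and keeping the most recent values when the input overflows is my assumption. The point is only that the input is capped at 2,048 values and the weights are never written to.

```python
from typing import List, Tuple

MAX_INPUT_VALUES = 2_048  # the cap on the conversation described above


def encode(text: str) -> List[int]:
    """Toy stand-in for text encoding: one number per character."""
    return [ord(ch) for ch in text]


def fit_to_window(values: List[int]) -> List[int]:
    """Cap the input at MAX_INPUT_VALUES (assumption: keep the most recent values)."""
    return values[-MAX_INPUT_VALUES:]


def respond(weights: Tuple[float, ...], window: List[int]) -> float:
    """Placeholder for the network: reads the weights, never modifies them."""
    return sum(w * v for w, v in zip(weights, window))


weights = (0.5, -0.25, 2.0)  # stands in for the 175 billion fixed numbers
weights_before = weights

conversation = encode("Hello! " * 1_000)  # 7,000 values, more than the cap
window = fit_to_window(conversation)
reply = respond(weights, window)

print(len(conversation), len(window), weights == weights_before)
```

However long the conversation gets, `window` never grows past 2,048 values and `weights` comes out exactly as it went in.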
What this means
This ranks ChatGPT among a category of “unimaginable complexity emergent from incredible simplicity” that we see in certain natural phenomena, and once in a while in computer science. Here are some other examples:
DNA – About 2.9 billion DNA nucleotides, stored at two bits each, is 691 megabytes of data. That fits on a CD. It represents a human. Change enough numbers, you get a tyrannosaurus instead. Or a jellyfish. Or a rutabaga.
Fractals – Fractals in math and in nature take a simple mathematical equation and repeat it at progressively smaller scales, creating self-similar shapes that are infinitely complex. The simplicity of the equation belies the complexity of the shapes it generates.
Cellular automata – Cellular automata (examples here) illustrate how a simple set of rules can create a wide range of complex patterns and behaviors. By updating the state of each cell in a grid based on the states of its neighbors, cellular automata can produce intricate and dynamic patterns that can appear to be alive and seemingly intelligent.
Demoscene – A computer art subculture in which skilled programmers write tiny programs (“demos”). A program of 4 kilobytes (or smaller) generates gorgeous, complex, animated worlds.
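Two of these are easy to try yourself. Below is a minimal sketch: a one-dimensional cellular automaton (I picked Rule 30, a well-known elementary rule, as the example; the list above doesn't name one) and the escape-time test for the Mandelbrot set, a classic fractal built by repeating z → z² + c. The function names are my own.

```python
from typing import List


def rule30_step(cells: List[int]) -> List[int]:
    """Apply Rule 30 once; cells beyond the edges are treated as 0."""
    padded = [0] + cells + [0]
    # Rule 30: new cell = left XOR (center OR right)
    return [padded[i - 1] ^ (padded[i] | padded[i + 1])
            for i in range(1, len(padded) - 1)]


def in_mandelbrot(c: complex, max_iterations: int = 100) -> bool:
    """True if z -> z*z + c stays bounded for max_iterations steps."""
    z = 0j
    for _ in range(max_iterations):
        z = z * z + c
        if abs(z) > 2:  # once |z| exceeds 2 the sequence is known to escape
            return False
    return True


# Cellular automaton: start from a single live cell and print a few generations.
row = [0] * 15
row[7] = 1
for _ in range(8):
    print("".join("#" if cell else "." for cell in row))
    row = rule30_step(row)

# Fractal: a coarse text rendering of the Mandelbrot set.
for y in range(-10, 11, 2):
    print("".join("#" if in_mandelbrot(complex(x / 20, y / 10)) else " "
                  for x in range(-40, 21)))
```

Each rule fits on one line, yet the printed patterns are anything but simple.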
The human brain – and beyond
For years, many of us struggled to believe that human memory could be encoded in neurons and neural connections (“there has to be something more!”).
In my view, seeing the emergent behavior from ChatGPT puts those objections to rest. Seeing ChatGPT in action helps us overcome this and other innate biases and preconceptions that limit our understanding of how memory works (and how the human brain works in general).
We’ll explore implications for brain research in future articles.
This article was written by Level Ex CEO Sam Glassenberg and originally featured on LinkedIn