Have you ever thought that building your own artificial intelligence model is reserved for corporations with billion-dollar budgets? I thought so too, until I came across Andrej Karpathy's tutorial and decided to test it for myself.
The result? I created a working language model, and "training" it cost me only $15.
Does this mean that for the price of a pizza I built a ChatGPT competitor I could roll out in my company? Absolutely not. But the experience opened my eyes to how things work "under the hood."
GPT-2: Museum exhibit or great teacher?
The model I built (the 124M-parameter version of GPT-2) dates from 2019. In the world of AI, that is prehistory. To be honest: this model is not suitable for today's business. It won't write a flashy sales email, and it won't analyze a complex report. Its capabilities are very limited compared to what we carry on our phones today.
So why did I do it at all? Because the process of creating AI itself has hardly changed.
The mechanics - how the model "learns" language - are strikingly similar to what happens when the latest giants are created. Understanding GPT-2 is like taking apart the engine of an old Fiat. You won't win Formula One with it, but you'll learn exactly how the internal combustion engine that also powers Ferrari's cars works.
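To make that similarity concrete, here is a toy sketch of the training step that GPT-2 and today's giants share: the model predicts the next token, the prediction error is measured, and the weights are nudged. This is not code from Karpathy's tutorial; the names (model, optimizer, tokens) are placeholders standing in for any decoder-only transformer and its training data.

```python
import torch.nn.functional as F

def training_step(model, optimizer, tokens):
    # tokens: a batch of token IDs, shape (batch_size, sequence_length)
    inputs, targets = tokens[:, :-1], tokens[:, 1:]   # shift by one: predict the next token
    logits = model(inputs)                            # assumed to return (batch, seq-1, vocab_size)
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           targets.reshape(-1))       # how wrong were the predictions?
    optimizer.zero_grad()
    loss.backward()                                   # backpropagate the error
    optimizer.step()                                  # nudge the weights
    return loss.item()
```

Whether the model has 124 million parameters or hundreds of billions, this loop is essentially the same; what changes is the data and the hardware it runs on.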
Why exactly this model?
You may ask: "Why not just train a GPT-4 at home?" Here we come to the key difference. With GPT-2, OpenAI made its "design plans" - the model's architecture and trained weights - available to the world. That is what made this exercise possible.
Unfortunately, for the newer models (GPT-3, GPT-4 and beyond), that door has been closed. We don't know their weights; they are a closely guarded trade secret.
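Just how open those GPT-2 weights are is easy to demonstrate. The sketch below uses the Hugging Face transformers library (not part of Karpathy's tutorial, which rebuilds the model in plain PyTorch) to download the published 124M model and generate a few tokens; the prompt is just an example.

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# "gpt2" on the Hugging Face Hub is the original 124M-parameter model from 2019
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = "The engine of an old Fiat"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20, do_sample=True,
                         top_k=50, pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

There is no equivalent two-liner for GPT-4: its weights simply aren't published.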
An abyss we can't jump over at home
My $15 experiment also showed me the sheer scale of today's AI revolution. While the "recipe" for creating a model is similar, the ingredients needed for market success are astronomically expensive today:
- Quantity of data: I used a fraction of what modern models "read." They need petabytes of text.
- Quality of data: Although I used modern, cleaned-up datasets (which is why my model learned faster than the original did in 2019), preparing the data for GPT-4-class models is an engineering masterpiece in itself.
- Infrastructure: $15 was enough to rent a single graphics card in the cloud for a while. Training the models we use every day costs tens or even hundreds of millions of dollars and requires massive server rooms (see the back-of-envelope sketch after this list).
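To get a feel for that gap, here is a purely illustrative back-of-envelope calculation. Every number in it is an assumption chosen for illustration (GPU price, cluster size, training duration), not a measured figure.

```python
# All numbers below are assumptions for illustration only, not measured figures.
PRICE_PER_GPU_HOUR = 2.00          # assumed rental price of one high-end GPU, in USD

my_budget = 15                     # what this experiment cost
print(f"${my_budget} buys about {my_budget / PRICE_PER_GPU_HOUR:.0f} GPU-hours")

frontier_gpus = 10_000             # assumed cluster size for a frontier-scale model
frontier_days = 90                 # assumed length of the training run
frontier_cost = frontier_gpus * frontier_days * 24 * PRICE_PER_GPU_HOUR
print(f"{frontier_gpus} GPUs for {frontier_days} days: about ${frontier_cost:,.0f}")
# -> about $43,200,000, i.e. the "tens of millions of dollars" order of magnitude
```

The point is not the exact figures but the ratio: a hobby budget buys hours on one GPU, while a frontier run burns months on thousands of them.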
What are the implications of this?
This project is a fascinating lesson in humility and in understanding the technology. It shows that the "magic" of AI is really math, access to data and... electricity.
Today, any enthusiast can touch this technology for a pittance and understand its foundations. At the same time, it is clear that the race to build the smartest model is played with budgets far beyond the reach of mere mortals. Nevertheless, it's worth knowing how it all works.
This post was inspired by Andrej Karpathy's tutorial: Let's reproduce GPT-2 (124M)