Update: The gpt2-chatbot is now unavailable, adding even more fuel to the mystery around the model.
This week in AI, a mysterious new model called 'gpt2-chatbot' appeared on LMSYS Chatbot Arena, a site for benchmarking LLMs and comparing their outputs.
The model went viral because it was released without any documentation, nobody knows who built it, and, most importantly, many are saying it's comparable to GPT-4 in terms of capabilities.
Now let's look at a few examples of why people are saying this model is at GPT-4 levels...
Reasoning capabilities
Following Pietro Schrano's demonstration, let's test its reasoning capabilities with a classic question that AI models find notoriously hard (or at least used to):
What weighs more, a kilogram of feathers or a kilogram of lead?
For comparison, GPT-3.5 Turbo also gets it right, but in my opinion its answer is quite a bit worse because it says you "might" need a bigger space for the feathers...
Mathematical abilities
Alright, now let's move on to check out its mathematical abilities as highlighted by Andrew Gao on X:
From his experiments, you can see it doesn't always get it right, but the fact that an anonymous model is getting International Math Olympiad questions right is still quite mind-blowing.
GPT2-chatbot speculation
These are just a few examples of the model's capabilities, and many are speculating that OpenAI is behind the model and released it as a stealth-mode teaser for GPT-4.5.
The other speculation is that it is the GPT-2 base model that's been fine-tuned on modern assistant datasets...
Lastly, one X user reports that this is the model's system prompt:
Knowledge cutoff: 2023-11
Current date: 2024-04-29
Image input capabilities: Enabled
Personality: v2
While all this speculation and mystery is interesting, the fact that a model this good can be released, go viral on X within 24 hours, and no one knows who's behind it is quite entertaining...