Mysterious gpt2-chatbot released with GPT-4 level capabilities

Update: The gpt2-chatbot is now unavailable, adding even more fuel to the mystery.

This week in AI, a mysterious new model called 'gpt2-chatbot' was released on LMSYS Chatbot Arena, a site for benchmarking LLMs and comparing their results.

The model went viral because it was released without any documentation, nobody knows who built it, and, most importantly, many are saying it's comparable to GPT-4 in terms of capabilities.

Now let's look at a few examples of why people are saying this model is at GPT-4 levels...

Reasoning capabilities

As Pietro Schirano demonstrates, let's test its reasoning capabilities with a classic reasoning question that AI has found notoriously hard (at least it used to):

What weighs more a kilogram of feathers or a kilogram of lead?

For comparison, GPT-3.5 Turbo still gets it right, but in my opinion its answer is quite a bit worse because it says you "might" need a bigger space for the feathers...

Mathematical abilities

Alright, now let's move on to check out its mathematical abilities as highlighted by Andrew Gao on X:

uh.... gpt2-chatbot just solved an International Math Olympiad (IMO) problem in one-shot

the IMO is insanely hard. only the FOUR best math students in the USA get to compete

prompt + its thoughts 🧵 https://t.co/CuO0ToJmb9 pic.twitter.com/3xxWPvtmuG
— Andrew Gao (@itsandrewgao) April 29, 2024

From his experiments, you can see it doesn't always get it right, but the fact that an anonymous model is getting International Math Olympiad questions right is still quite mind-blowing.

GPT2-chatbot speculation

These are just a few examples of the model's capabilities, and many are speculating that OpenAI is behind the model and released it as a stealth-mode teaser for GPT-4.5.

The other speculation is that it is the GPT-2 base model that's been fine-tuned on modern assistant datasets...

my guess is this mysterious 'gpt2-chatbot' is literally OpenAI's gpt-2 from 2019 finetuned with modern assistant datasets.

in which case that means their original pre-training is still amazing and better than everyone else's 4 years later pic.twitter.com/GPgG1b6QIT
— albs — 3/staccs (@albfresco) April 29, 2024

Lastly, Simon Willison shared on X what appears to be the model's system prompt:

You are ChatGPT, a large language model trained by OpenAI, based on the GPT-4 architecture.

Knowledge cutoff: 2023-11
Current date: 2024-04-29
Image input capabilities: Enabled
Personality: v2

Looks like this is the system prompt:

You are ChatGPT, a large language model trained by OpenAI, based on the GPT-4 architecture.

Knowledge cutoff: 2023-11
Current date: 2024-04-29
Image input capabilities: Enabled
Personality: v2 https://t.co/jtW4OEZodP
— Simon Willison (@simonw) April 29, 2024

While all this speculation and mystery is interesting, the fact that a model this good can be released and go viral on X within 24 hours without anyone knowing who's behind it is quite entertaining...

There is a mysterious new model called gpt2-chatbot accessible from a major LLM benchmarking site. No one knows who made it or what it is, but I have been playing with it a little and it appears to be in the same rough ability level as GPT-4. A mysterious GPT-4 class model? Neat! pic.twitter.com/1s2iEreaiT
— Ethan Mollick (@emollick) April 29, 2024