Something I’ve been meaning to write about but haven’t had time for yet.
All of the models that don’t perform as well as R1 will get a bump soon.
The reason is simple: they can all learn from R1. It’s cheap, easy, and legal.
In a lot of ways this is history repeating: it is widely believed that R1 distilled at least some knowledge from OpenAI’s o1, and now anyone can distill R1 for their own model.
Did DeepSeek Plunder o1?
I haven’t seen any evidence, but it looks like it wouldn’t take much. A fascinating paper out of Stanford showed that they were able to take an existing model and do post-training to get it to “reason” like o1 for about $50 of GPU credits. (Listen to the podcast that goes over it here: https://twimlai.com/podcast/twimlai/inside-s1-an-o1-style-reasoning-model-that-cost-under-50-to-train/) They found that as little as 1,000 samples was enough.
1,000!
That’s nothing! That was 20 minutes of training!
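To make the mechanics concrete, here’s a minimal sketch of what that kind of post-training could look like: a supervised fine-tuning pass over a small file of distilled reasoning traces using Hugging Face’s trl library. This is not the s1 recipe; the base model, file name, and hyperparameters are placeholder assumptions.

```python
# Minimal sketch: post-train a base model on ~1,000 distilled reasoning traces.
# Assumes reasoning_traces.jsonl holds rows like {"text": "<prompt + answer>"}.
# Model name and hyperparameters are placeholders, not the s1 paper's setup.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# The small size is the whole point: ~1,000 samples reportedly suffice.
dataset = load_dataset("json", data_files="reasoning_traces.jsonl", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-7B-Instruct",  # any open-weight model you want to bump
    train_dataset=dataset,
    args=SFTConfig(
        output_dir="post-trained",
        num_train_epochs=3,
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
    ),
)
trainer.train()
```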
What this means is that any model that underperforms R1 can be post-trained on R1’s outputs and see an immediate bump in performance. Llama, Mistral, Qwen, you name it. Call it regression towards the best: over time all AI models will gain performance that approaches the best model publicly available1, 2.
1. You might hope that this means open-weight models, but if it only takes 1,000 samples anyone can (and will) be able to grab enough from a proprietary model before getting locked out to conduct some post-training. (A sketch of what that harvesting looks like follows footnote 2.)
2. This doesn’t mean that all models will see such a jump immediately: there are still models that are trying to get smaller before they get better, but even those are likely to see an increase in performance over time from their ability to distill upstream models.
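And here’s the harvesting half of the loop, sketched against an OpenAI-compatible chat API. The endpoint, model name, and prompts are my assumptions (check the provider’s docs); the output file feeds straight into the fine-tuning sketch above.

```python
# Sketch: harvest reasoning traces from a stronger model to build a
# distillation set. Endpoint and model name are assumptions, not verified.
import json
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",  # OpenAI-compatible endpoint (assumed)
)

# In practice you'd want ~1,000 hard prompts (math, code, logic puzzles).
prompts = [
    "Prove that the sum of two odd integers is even.",
    "Write a function that checks whether a string is a palindrome.",
]

with open("reasoning_traces.jsonl", "w") as f:
    for prompt in prompts:
        resp = client.chat.completions.create(
            model="deepseek-reasoner",  # assumed model id
            messages=[{"role": "user", "content": prompt}],
        )
        answer = resp.choices[0].message.content
        # Store prompt + answer as one training string for the SFT pass.
        f.write(json.dumps({"text": prompt + "\n" + answer}) + "\n")
```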