Something I’ve been meaning to write about but haven’t had time for yet.
All of the models that don’t perform as well as R1 will get a bump soon.
The reason is simple: they can all learn from R1. It’s cheap, easy, and legal.
In a lot of ways this is history repeating: it is widely believed that R1 distilled at least some knowledge from OpenAI’s o1, and now anyone can distill R1 for their own model.
Did DeepSeek Plunder o1?
I haven’t seen any evidence, but it looks like it wouldn’t take much. A fascinating paper out of Stanford showed that they were able to take an existing model and do post-training to get it to “reason” like o1 for about $50 of GPU credits. (Listen to the podcast that goes over it here: https://twimlai.com/podcast/twimlai/inside-s1-an-o1-style-reasoning-model-that-cost-under-50-to-train/) They found that as little as 1,000 samples was enough.
1,000!
That’s nothing! That was 20 minutes of training!
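To make the mechanics concrete, here’s a minimal sketch of what that kind of post-training could look like: a supervised fine-tuning pass over a small file of distilled reasoning traces using Hugging Face’s trl library. This is not the s1 recipe; the base model, file name, and hyperparameters are placeholder assumptions.

```python
# Minimal sketch: post-train a base model on ~1,000 distilled reasoning traces.
# Assumes reasoning_traces.jsonl holds rows like {"text": "<prompt + answer>"}.
# Model name and hyperparameters are placeholders, not the s1 paper's setup.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# The small size is the whole point: ~1,000 samples reportedly suffice.
dataset = load_dataset("json", data_files="reasoning_traces.jsonl", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-7B-Instruct",  # any open-weight model you want to bump
    train_dataset=dataset,
    args=SFTConfig(
        output_dir="post-trained",
        num_train_epochs=3,
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
    ),
)
trainer.train()
```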
What this means is that any model that underperforms R1 can be post-trained on R1’s outputs and see an immediate bump in performance. Llama, Mistral, Qwen, you name it. Call it regression towards the best: over time all AI models will gain performance that approaches the best model publicly available1, 2.
1. You might hope that this means open-weight models, but if it only takes 1,000 samples anyone can (and will) be able to grab enough from a proprietary model before getting locked out to conduct some post-training. (A sketch of what that harvesting looks like follows footnote 2.)
2. This doesn’t mean that all models will see such a jump immediately: there are still models that are trying to get smaller before they get better, but even those are likely to see an increase in performance over time from their ability to distill upstream models.
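And here’s the harvesting half of the loop, sketched against an OpenAI-compatible chat API. The endpoint, model name, and prompts are my assumptions (check the provider’s docs); the output file feeds straight into the fine-tuning sketch above.

```python
# Sketch: harvest reasoning traces from a stronger model to build a
# distillation set. Endpoint and model name are assumptions, not verified.
import json
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",  # OpenAI-compatible endpoint (assumed)
)

# In practice you'd want ~1,000 hard prompts (math, code, logic puzzles).
prompts = [
    "Prove that the sum of two odd integers is even.",
    "Write a function that checks whether a string is a palindrome.",
]

with open("reasoning_traces.jsonl", "w") as f:
    for prompt in prompts:
        resp = client.chat.completions.create(
            model="deepseek-reasoner",  # assumed model id
            messages=[{"role": "user", "content": prompt}],
        )
        answer = resp.choices[0].message.content
        # Store prompt + answer as one training string for the SFT pass.
        f.write(json.dumps({"text": prompt + "\n" + answer}) + "\n")
```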