Distillation with Reasoning: Can DeepSeek R1 Teach Better Than Humans?
Abigail Staley edited this page 1 week ago


- Including reasoning "chains of thought" (CoT) in the model output significantly improves its quality, but it increases inference cost.
- Distillation transfers reasoning knowledge from an expensive teacher model to a cheaper student, reducing overall inference cost.
- DeepSeek R1 can produce detailed CoT, making it an excellent teacher model.

Each example in the original dataset contained two components:

1. A human expert’s chain of thought.
2. The final answer.
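One record of this two-field dataset could be sketched as follows (the keys and content are illustrative placeholders, not taken from the actual dataset):

```python
# One record of the two-component dataset described above.
# Keys and content are hypothetical, not from the actual dataset.
example = {
    "question": "A train travels 120 km in 2 hours. What is its average speed?",
    "human_cot": "Average speed is distance over time: 120 km / 2 h = 60 km/h.",
    "final_answer": "60 km/h",
}
```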

We expanded this dataset by adding:

Synthetic R1 reasoning, i.e., the CoT generated by DeepSeek R1.
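DeepSeek R1 conventionally wraps its chain of thought in `<think>…</think>` tags, with the final answer following the closing tag. A minimal helper for splitting a raw R1 completion into its CoT and answer parts might look like this (a sketch; the function name is ours, and it assumes the tag-based output format):

```python
import re

def split_r1_output(text: str):
    """Split a DeepSeek R1 completion into (reasoning_cot, final_answer).

    Assumes R1's convention of wrapping its chain of thought in
    <think>...</think> tags; everything after the closing tag is
    treated as the final answer.
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        # No CoT tags found: treat the whole completion as the answer.
        return "", text.strip()
    cot = match.group(1).strip()
    answer = text[match.end():].strip()
    return cot, answer

cot, answer = split_r1_output("<think>2 + 2 = 4</think>The answer is 4.")
```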

Then, we fine-tuned three variants of the model (using LoRA on llama-3.1-8B-instruct), each with a different training target:

- Direct Answer Only: Generate the final answer without revealing any reasoning.
- Human Expert CoT: Generate the final answer alongside a reasoning chain resembling the human expert’s.
- Synthetic R1 CoT: Generate the final answer alongside DeepSeek R1’s synthetic reasoning chain.

The table below summarizes average accuracy and reasoning length:
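The three training targets above can be sketched as a small formatting function that builds the supervision string for each variant (field names and templates are hypothetical; the actual fine-tuning would wrap these in llama-3.1-8B-instruct’s chat format):

```python
def build_target(example: dict, variant: str) -> str:
    """Build the supervision target for one of the three fine-tuning
    variants. Field names are illustrative, not from the original setup."""
    answer = "Answer: " + example["final_answer"]
    if variant == "direct":      # Direct Answer Only
        return answer
    if variant == "human_cot":   # Human Expert CoT
        return example["human_cot"] + "\n" + answer
    if variant == "r1_cot":      # Synthetic R1 CoT
        return example["r1_cot"] + "\n" + answer
    raise ValueError(f"unknown variant: {variant}")

ex = {
    "final_answer": "60 km/h",
    "human_cot": "Speed = distance / time = 120 / 2 = 60 km/h.",
    "r1_cot": "We need average speed. 120 km over 2 h gives 60 km/h.",
}
```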

- Note: The accuracy for the 5-shot baseline may differ from numbers reported elsewhere due to different evaluation setups. The key focus is on comparing relative performance across distillation approaches, not on beating other models.

From this study, synthetic reasoning CoTs from DeepSeek R1 appear superior to human-expert CoTs in boosting performance, albeit with a higher inference cost due to their greater length.
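The inference-cost trade-off comes down to output length: a long CoT means many more completion tokens per query. A back-of-envelope estimate can be sketched as follows (all prices and token counts here are hypothetical placeholders, not measured values):

```python
def query_cost(prompt_tokens: int, completion_tokens: int,
               price_in_per_m: float, price_out_per_m: float) -> float:
    """Dollar cost of one query, given per-million-token input/output prices.

    All arguments are placeholders for illustration; substitute real
    token counts and your provider's actual pricing.
    """
    return (prompt_tokens * price_in_per_m
            + completion_tokens * price_out_per_m) / 1_000_000

# Hypothetical comparison: a 200-token prompt answered either directly
# (~50 output tokens) or with a long R1-style CoT (~1000 output tokens).
direct_cost = query_cost(200, 50, price_in_per_m=0.2, price_out_per_m=0.8)
cot_cost = query_cost(200, 1000, price_in_per_m=0.2, price_out_per_m=0.8)
```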

Fireworks AI Inference and Fine-Tuning Platform

DeepSeek R1 is available on the Fireworks AI platform. An easy-to-use distillation interface will soon be part of FireOptimizer. If you need earlier access, please contact us to explore options.

Conclusions

By incorporating reasoning-based data through distillation, organizations can dramatically improve model performance without bearing the full burden of human-annotated datasets. DeepSeek R1’s ability to produce long, high-quality reasoning chains makes it a powerful teacher model, showing that, sometimes, the machine may just out-teach the human.