Distillation with Reasoning: Can DeepSeek R1 Teach Better Than Humans?
Adam Tjalkabota edited this page 1 week ago


- Including reasoning “chains of thought” (CoT) in the model output significantly improves its quality, but it increases inference cost.
- Distillation transfers reasoning knowledge from an expensive teacher model to a more affordable student, reducing overall inference cost.

  1. A human expert’s chain of thought.
  2. The final answer.

    We extended this dataset by adding:

    Synthetic R1 reasoning, i.e., the CoT produced by DeepSeek R1.
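The dataset extension described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the `Example` record, its field names (including `question`), and the `generate_cot` callable are all hypothetical, and a stub stands in for the call to DeepSeek R1.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional


@dataclass(frozen=True)
class Example:
    """One dataset record. All field names are hypothetical."""
    question: str
    human_cot: str                # the human expert's chain of thought
    final_answer: str             # the final answer
    r1_cot: Optional[str] = None  # synthetic CoT, attached later


def add_synthetic_cot(
    examples: list[Example],
    generate_cot: Callable[[str], str],
) -> list[Example]:
    """Return copies of the examples with a synthetic CoT attached.

    `generate_cot` is a placeholder for a call to the teacher model;
    it is not a real client API.
    """
    return [replace(ex, r1_cot=generate_cot(ex.question)) for ex in examples]


# Stub teacher so the sketch runs without network access.
def stub_teacher(question: str) -> str:
    return f"<reasoning about: {question}>"


dataset = [Example("What is 2 + 2?", "Two plus two is four.", "4")]
extended = add_synthetic_cot(dataset, stub_teacher)
```

Keeping the human and synthetic chains side by side in one record makes it straightforward to derive several training targets from the same underlying examples.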

    Then, we fine-tuned three variants of the model (using LoRA on llama-3.1-8B-instruct), each with a different training target:

    1. Direct Answer Only: Generate the final answer without showing reasoning.
    2. Human Expert CoT: Generate the final answer alongside a reasoning chain resembling the human expert’s.
    3. Synthetic R1 CoT: Generate the final answer alongside DeepSeek R1’s synthetic reasoning chain.

    The table below summarizes average accuracy and reasoning length:

    - Note: The accuracy for the 5-shot baseline may differ from numbers reported elsewhere due to different evaluation setups. The key focus is on comparing relative performance across distillation approaches, not on beating other models.
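As a rough sketch of how the three training targets differ, the snippet below builds the text each student variant would be trained to generate for one example. The `Variant` enum, the `build_target` helper, and the "reasoning, blank line, then `Answer: ...`" layout are assumptions made for illustration; the source does not specify the actual prompt or target format.

```python
from enum import Enum


class Variant(Enum):
    DIRECT_ANSWER = "direct_answer"
    HUMAN_COT = "human_cot"
    R1_COT = "r1_cot"


def build_target(variant: Variant, human_cot: str, r1_cot: str, answer: str) -> str:
    """Return the text the student is trained to generate for one example.

    The layout (reasoning, blank line, 'Answer: ...') is an assumption.
    """
    answer_line = f"Answer: {answer}"
    if variant is Variant.DIRECT_ANSWER:
        return answer_line
    cot = human_cot if variant is Variant.HUMAN_COT else r1_cot
    return f"{cot}\n\n{answer_line}"


# Placeholder strings, not data from the study.
targets = {
    v: build_target(
        v,
        human_cot="Short expert reasoning.",
        r1_cot="Longer synthetic reasoning.",
        answer="B",
    )
    for v in Variant
}
```

Only the target text changes between variants; the base model and the fine-tuning method stay the same, which is what makes the comparison between distillation approaches meaningful.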

    From this study, synthetic reasoning CoTs from DeepSeek R1 appear superior to human-expert CoTs in boosting performance, albeit with a higher inference cost due to their longer length.
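One way to reason about this trade-off is to look at generated tokens per correct answer. The helper below is a hypothetical sketch that assumes inference cost is proportional to the number of generated tokens; the inputs are placeholders chosen for easy arithmetic, not measurements from the study.

```python
def tokens_per_correct_answer(accuracy: float, avg_output_tokens: float) -> float:
    """Expected generated tokens spent per correct answer.

    Assumes inference cost scales linearly with generated tokens.
    """
    if not 0.0 < accuracy <= 1.0:
        raise ValueError("accuracy must be in (0, 1]")
    return avg_output_tokens / accuracy


# Placeholder inputs for illustration only; not results from the study.
short_cot = tokens_per_correct_answer(accuracy=0.5, avg_output_tokens=100)
long_cot = tokens_per_correct_answer(accuracy=0.75, avg_output_tokens=300)
```

With these placeholder values the longer chain yields more correct answers but still spends more tokens per correct answer, so whether it is worth it depends on how much accuracy matters relative to cost.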

    Fireworks AI Inference and Fine-Tuning Platform

    DeepSeek R1 is available on the Fireworks AI platform. An easy-to-use distillation interface will soon be part of FireOptimizer. If you need earlier access, please get in touch to explore options.

    Conclusions

    By incorporating reasoning-based data through distillation, organizations can significantly improve model performance without bearing the full burden of human-annotated datasets. DeepSeek R1’s ability to produce long, high-quality reasoning chains makes it a powerful teacher model, showing that, sometimes, the machine may simply out-teach the human.