Efficient CoT for DeepSeek-R1-Distill-Qwen-7B
We (Jianshu She, Zhuohao Li, Zhemin Huang, and Muqi Li) fine-tuned DeepSeek-R1-Distill-Qwen-7B using GRPO (Group Relative Policy Optimization) to compress Chain-of-Thought (CoT) length by roughly 75% on the MATH dataset, with under 3 percentage points of accuracy loss.
Results Comparison
| Model | Final Accuracy | Average CoT Length | Average Answer Length |
|---|---|---|---|
| Baseline (Full CoT) | 92.08% | 450.95 words | 481.19 words |
| Efficient_CoT_DeepSeek-R1-Distill-Qwen-7B | 89.11% | 113.06 words | 125.94 words |
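As a quick sanity check, the reduction and accuracy trade-off implied by the table can be recomputed directly (values taken from the rows above):

```python
# Figures from the results table above.
baseline_cot = 450.95    # average CoT length in words (Baseline, full CoT)
efficient_cot = 113.06   # average CoT length in words (Efficient model)
baseline_acc = 92.08     # final accuracy (%) of the baseline
efficient_acc = 89.11    # final accuracy (%) of the efficient model

# Relative CoT compression and absolute accuracy drop.
cot_reduction = (baseline_cot - efficient_cot) / baseline_cot
accuracy_drop = baseline_acc - efficient_acc

print(f"CoT reduction: {cot_reduction:.1%}")          # ~74.9%
print(f"Accuracy drop: {accuracy_drop:.2f} points")   # 2.97 points
```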
Our optimization strategy substantially reduces CoT length while preserving most of the baseline accuracy, making inference markedly cheaper. This trade-off is particularly attractive for resource-constrained environments where reasoning performance still matters.
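The exact reward used in our training run is not reproduced here, but the idea can be sketched with a common formulation: a correctness bonus minus a penalty on CoT tokens beyond a target length, combined through GRPO's group-relative advantage normalization. All names and hyperparameters below (`target_len`, `alpha`, `beta`) are illustrative assumptions, not the actual training configuration.

```python
import statistics

def reward(is_correct: bool, cot_len: int, target_len: int = 150,
           alpha: float = 1.0, beta: float = 0.002) -> float:
    """Hypothetical reward: correctness bonus minus a linear penalty
    for CoT tokens beyond target_len. Hyperparameters are illustrative."""
    correctness = alpha if is_correct else 0.0
    length_penalty = beta * max(0, cot_len - target_len)
    return correctness - length_penalty

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO computes advantages relative to a group of completions
    sampled for the same prompt: (r_i - mean(r)) / (std(r) + eps)."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    eps = 1e-8
    return [(r - mean) / (std + eps) for r in rewards]

# Example: four sampled completions for one prompt.
# (correct, CoT length in tokens)
samples = [(True, 120), (True, 400), (False, 90), (True, 110)]
rewards = [reward(ok, n) for ok, n in samples]
advantages = group_relative_advantages(rewards)
```

Under this scheme, short correct completions receive the highest advantage, a correct but verbose completion is pushed down, and incorrect completions are penalized most, which is the pressure that drives CoT compression.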