Efficient CoT for DeepSeek-R1-Distill-Qwen-7B

We — Jianshu She, Zhuohao Li, Zhemin Huang, and Muqi Li — fine-tuned DeepSeek-R1-Distill-Qwen-7B with GRPO (Group Relative Policy Optimization), compressing Chain-of-Thought (CoT) length on the MATH dataset by roughly 75% with less than a 5% loss in accuracy.
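The card does not spell out the training recipe, but the core of GRPO is sampling a group of completions per prompt and normalizing their rewards within the group to get advantages. A minimal sketch, assuming a hypothetical reward that discounts correct answers by CoT length (the actual reward shaping used here is not documented):

```python
import statistics

def length_penalized_reward(is_correct: bool, cot_len: int, max_len: int = 512) -> float:
    """Reward correct answers, discounted by CoT length.
    (Hypothetical shaping; the card does not give the exact reward.)"""
    correctness = 1.0 if is_correct else 0.0
    brevity = max(0.0, 1.0 - cot_len / max_len)
    return correctness * (0.5 + 0.5 * brevity)

def grpo_advantages(rewards: list[float]) -> list[float]:
    """GRPO advantage: normalize rewards within one group of samples."""
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards) or 1.0
    return [(r - mu) / sigma for r in rewards]

# Example group: 4 sampled solutions to one MATH problem
# (is_correct, CoT length in words)
group = [(True, 450), (True, 110), (False, 90), (True, 300)]
rewards = [length_penalized_reward(ok, n) for ok, n in group]
advs = grpo_advantages(rewards)
```

Under this shaping, a short correct solution gets the highest advantage, so the policy is pushed toward brief but still correct reasoning.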

Results Comparison

| Model | Final Accuracy | Average CoT Length | Average Answer Length |
|---|---|---|---|
| Baseline (full CoT) | 92.08% | 450.95 words | 481.19 words |
| Efficient_CoT_DeepSeek-R1-Distill-Qwen-7B | 89.11% | 113.06 words | 125.94 words |
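The headline numbers follow directly from the table (lengths are in words, not tokens):

```python
baseline_cot, efficient_cot = 450.95, 113.06   # average CoT length (words)
baseline_acc, efficient_acc = 92.08, 89.11     # final accuracy (%)

# Fraction of CoT words removed by the fine-tune
compression = 1 - efficient_cot / baseline_cot

# Absolute accuracy loss, in percentage points
acc_drop = baseline_acc - efficient_acc

print(f"CoT compression: {compression:.1%}")   # ~74.9%
print(f"Accuracy drop:   {acc_drop:.2f} points")
```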

Our optimization strategy substantially reduces CoT length while maintaining high accuracy, making inference cheaper and faster. This makes the model particularly suitable for resource-constrained environments where generating long reasoning traces is too expensive.

Format: Safetensors · Model size: 8B params · Tensor type: BF16

Model tree for Jianshu001/Efficient_CoT_DeepSeek-R1-Distill-Qwen-7B

Finetuned from DeepSeek-R1-Distill-Qwen-7B; 2 quantized versions are available.