Efficient CoT for DeepSeek-R1-Distill-Qwen-7B

We — Jianshu She, Zhuohao Li, Zhemin Huang, and Muqi Li — fine-tuned DeepSeek-R1-Distill-Qwen-7B with GRPO (Group Relative Policy Optimization), compressing Chain-of-Thought (CoT) length on the MATH dataset by roughly 75% with less than a 5% loss in accuracy.
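The card does not spell out the training recipe, but the core of GRPO is sampling a group of completions per prompt and normalizing their rewards within the group to get advantages. A minimal sketch, assuming a hypothetical reward that discounts correct answers by CoT length (the actual reward shaping used here is not documented):

```python
import statistics

def length_penalized_reward(is_correct: bool, cot_len: int, max_len: int = 512) -> float:
    """Reward correct answers, discounted by CoT length.
    (Hypothetical shaping; the card does not give the exact reward.)"""
    correctness = 1.0 if is_correct else 0.0
    brevity = max(0.0, 1.0 - cot_len / max_len)
    return correctness * (0.5 + 0.5 * brevity)

def grpo_advantages(rewards: list[float]) -> list[float]:
    """GRPO advantage: normalize rewards within one group of samples."""
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards) or 1.0
    return [(r - mu) / sigma for r in rewards]

# Example group: 4 sampled solutions to one MATH problem
# (is_correct, CoT length in words)
group = [(True, 450), (True, 110), (False, 90), (True, 300)]
rewards = [length_penalized_reward(ok, n) for ok, n in group]
advs = grpo_advantages(rewards)
```

Under this shaping, a short correct solution gets the highest advantage, so the policy is pushed toward brief but still correct reasoning.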

Results Comparison

| Model | Final Accuracy | Average CoT Length | Average Answer Length |
|---|---|---|---|
| Baseline (full CoT) | 92.08% | 450.95 words | 481.19 words |
| Efficient_CoT_DeepSeek-R1-Distill-Qwen-7B | 89.11% | 113.06 words | 125.94 words |
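The headline numbers follow directly from the table (lengths are in words, not tokens):

```python
baseline_cot, efficient_cot = 450.95, 113.06   # average CoT length (words)
baseline_acc, efficient_acc = 92.08, 89.11     # final accuracy (%)

# Fraction of CoT words removed by the fine-tune
compression = 1 - efficient_cot / baseline_cot

# Absolute accuracy loss, in percentage points
acc_drop = baseline_acc - efficient_acc

print(f"CoT compression: {compression:.1%}")   # ~74.9%
print(f"Accuracy drop:   {acc_drop:.2f} points")
```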

Our optimization strategy substantially reduces CoT length while maintaining high accuracy, making inference cheaper and faster. This makes the model particularly suitable for resource-constrained environments where generating long reasoning traces is too expensive.

Format: Safetensors · Model size: 8B params · Tensor type: BF16

Model tree for Jianshu001/Efficient_CoT_DeepSeek-R1-Distill-Qwen-7B

Finetuned from DeepSeek-R1-Distill-Qwen-7B; 2 quantized versions are available.