Concise Reasoning in the Lens of Lagrangian Optimization
This model was trained with PALU, a post-training strategy for concise reasoning in LLMs. PALU improves reasoning efficiency by optimizing the trade-off between accuracy and output verbosity.
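To make the accuracy/verbosity trade-off concrete, here is a minimal, illustrative sketch of a Lagrangian relaxation of "maximize accuracy subject to a length budget". This is not the paper's exact algorithm; the reward shape, the dual-ascent update, and all parameter names (`budget`, `lr`) are assumptions for illustration only.

```python
def penalized_reward(correct: bool, length: int, lam: float) -> float:
    """Scalar training signal: task reward minus a length penalty
    weighted by the Lagrange multiplier lam (illustrative only)."""
    return float(correct) - lam * length

def update_lambda(lam: float, avg_length: float, budget: float, lr: float = 0.1) -> float:
    """Dual ascent on the length constraint: increase lam when average
    output length exceeds the budget, relax it (toward zero) otherwise."""
    return max(0.0, lam + lr * (avg_length - budget))

# Outputs currently average 120 tokens against a 100-token budget,
# so the multiplier grows and verbosity is penalized more strongly.
lam = update_lambda(0.0, avg_length=120.0, budget=100.0, lr=0.1)
print(lam)  # 2.0
```

The multiplier thus adapts automatically: once generations fit the budget, the penalty decays and accuracy dominates the objective again.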
Paper: https://arxiv.org/abs/2510.10168
Authors: Chengqian Gao, Haonan Li, Taylor W. Killian, Jianshu She, Renxi Wang, Liqun Ma, Zhoujun Cheng, Shibo Hao, and Zhiqiang Xu.
Key Features:
Concise reasoning: 65% fewer output tokens across five math benchmarks, while preserving the full reasoning process.
Performance: 15% higher accuracy than alternative concise-reasoning methods.
Efficiency: 20% faster training than GRPO.
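A minimal inference sketch using the Hugging Face `transformers` library. The repo id `glorgao/PALU-1.5B` comes from this model card; the prompt, sampling settings, and chat-template usage are assumptions and may need adjusting for your setup (a GPU is recommended for a 1.5B model).

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "glorgao/PALU-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Format a math question with the model's chat template (assumed supported).
messages = [{"role": "user", "content": "What is 17 * 24?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Generate a (concise) reasoning trace followed by the answer.
outputs = model.generate(inputs, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```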
License: This repository and the model weights are released under the MIT License. The base model, DeepSeek-R1-Distill-Qwen-1.5B, is also under the MIT License, while its parent models, the Qwen-2.5 series, are licensed under the Apache 2.0 License.
Citation:
@article{gao2025concise,
  title={Concise Reasoning in the Lens of Lagrangian Optimization},
  author={Gao, Chengqian and Li, Haonan and Killian, Taylor W and She, Jianshu and Wang, Renxi and Ma, Liqun and Cheng, Zhoujun and Hao, Shibo and Xu, Zhiqiang},
  journal={arXiv preprint arXiv:2510.10168},
  year={2025}
}
Base model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B