Concise Reasoning in the Lens of Lagrangian Optimization

This model was trained with PALU, a post-training method for concise reasoning in LLMs. PALU improves reasoning efficiency by optimizing the trade-off between answer accuracy and output length.
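
For intuition, this kind of accuracy-versus-length trade-off is commonly written as a constrained objective with a Lagrange multiplier. The sketch below is a generic formulation of that idea, not necessarily the exact objective used in the paper; the length budget `B` and multiplier `λ` are illustrative symbols.

```latex
% Generic constrained view of concise reasoning (illustrative, not the paper's exact objective):
% maximize expected accuracy subject to a budget B on expected output length.
\max_{\theta} \; \mathbb{E}_{y \sim \pi_\theta}\!\left[\mathrm{Acc}(y)\right]
\quad \text{s.t.} \quad \mathbb{E}_{y \sim \pi_\theta}\!\left[\mathrm{Len}(y)\right] \le B

% Corresponding Lagrangian, optimized by ascent in \theta and dual ascent in \lambda \ge 0:
\mathcal{L}(\theta, \lambda) =
\mathbb{E}_{y \sim \pi_\theta}\!\left[\mathrm{Acc}(y)\right]
- \lambda \left( \mathbb{E}_{y \sim \pi_\theta}\!\left[\mathrm{Len}(y)\right] - B \right)
```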

Paper: https://arxiv.org/abs/2510.10168

Authors: Chengqian Gao, Haonan Li, Taylor W. Killian, Jianshu She, Renxi Wang, Liqun Ma, Zhoujun Cheng, Shibo Hao, and Zhiqiang Xu.

Key Features:

  • Concise reasoning: 65% fewer output tokens across five math benchmarks, with full reasoning preserved.

  • Performance: 15% higher accuracy than alternative solutions.

  • Efficiency: 20% faster training than GRPO.
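
Usage: a minimal inference sketch with 🤗 Transformers. It assumes the Hub model ID glorgao/PALU-1.5B shown on this page and that a chat template is embedded in the tokenizer (as with the DeepSeek-R1-Distill base model); sampling settings are illustrative.

```python
# Minimal inference sketch (assumptions: Hub ID below, tokenizer ships a chat template).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "glorgao/PALU-1.5B"  # model ID as listed on this page
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "What is 12 * 34 - 56? Think step by step."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Generation settings are illustrative; adjust max_new_tokens for your task.
outputs = model.generate(inputs, max_new_tokens=1024, temperature=0.6, do_sample=True)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```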

License: This repository and the model weights are released under the MIT License. The base model, DeepSeek-R1-Distill-Qwen-1.5B, is also under the MIT License, while its parent models, the Qwen-2.5 series, are licensed under the Apache 2.0 License.

Citation:

```bibtex
@article{gao2025concise,
  title={Concise Reasoning in the Lens of Lagrangian Optimization},
  author={Gao, Chengqian and Li, Haonan and Killian, Taylor W and She, Jianshu and Wang, Renxi and Ma, Liqun and Cheng, Zhoujun and Hao, Shibo and Xu, Zhiqiang},
  journal={arXiv preprint arXiv:2510.10168},
  year={2025}
}
```