Concise Reasoning in the Lens of Lagrangian Optimization

This model was trained with PALU, a post-training method for concise reasoning in LLMs. PALU improves reasoning efficiency by optimizing the trade-off between answer accuracy and output length.
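
For intuition, this kind of accuracy-versus-length trade-off is commonly written as a constrained objective with a Lagrange multiplier. The sketch below is a generic formulation of that idea, not necessarily the exact objective used in the paper; the length budget `B` and multiplier `λ` are illustrative symbols.

```latex
% Generic constrained view of concise reasoning (illustrative, not the paper's exact objective):
% maximize expected accuracy subject to a budget B on expected output length.
\max_{\theta} \; \mathbb{E}_{y \sim \pi_\theta}\!\left[\mathrm{Acc}(y)\right]
\quad \text{s.t.} \quad \mathbb{E}_{y \sim \pi_\theta}\!\left[\mathrm{Len}(y)\right] \le B

% Corresponding Lagrangian, optimized by ascent in \theta and dual ascent in \lambda \ge 0:
\mathcal{L}(\theta, \lambda) =
\mathbb{E}_{y \sim \pi_\theta}\!\left[\mathrm{Acc}(y)\right]
- \lambda \left( \mathbb{E}_{y \sim \pi_\theta}\!\left[\mathrm{Len}(y)\right] - B \right)
```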

Paper: https://arxiv.org/abs/2510.10168

Authors: Chengqian Gao, Haonan Li, Taylor W. Killian, Jianshu She, Renxi Wang, Liqun Ma, Zhoujun Cheng, Shibo Hao, and Zhiqiang Xu.

Key Features:

  • Concise reasoning: 65% fewer output tokens across five math benchmarks, with full reasoning preserved.

  • Performance: 15% higher accuracy than alternative solutions.

  • Efficiency: 20% faster training than GRPO.
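
Usage: a minimal inference sketch with 🤗 Transformers. It assumes the Hub model ID glorgao/PALU-1.5B shown on this page and that a chat template is embedded in the tokenizer (as with the DeepSeek-R1-Distill base model); sampling settings are illustrative.

```python
# Minimal inference sketch (assumptions: Hub ID below, tokenizer ships a chat template).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "glorgao/PALU-1.5B"  # model ID as listed on this page
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "What is 12 * 34 - 56? Think step by step."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Generation settings are illustrative; adjust max_new_tokens for your task.
outputs = model.generate(inputs, max_new_tokens=1024, temperature=0.6, do_sample=True)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```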

License: This repository and the model weights are released under the MIT License. The base model, DeepSeek-R1-Distill-Qwen-1.5B, is also under the MIT License, while its parent models, the Qwen-2.5 series, are licensed under the Apache 2.0 License.

Citation:

```bibtex
@article{gao2025concise,
  title={Concise Reasoning in the Lens of Lagrangian Optimization},
  author={Gao, Chengqian and Li, Haonan and Killian, Taylor W and She, Jianshu and Wang, Renxi and Ma, Liqun and Cheng, Zhoujun and Hao, Shibo and Xu, Zhiqiang},
  journal={arXiv preprint arXiv:2510.10168},
  year={2025}
}
```