arXiv:2510.15414

MARS: Reinforcing Multi-Agent Reasoning of LLMs through Self-Play in Strategic Games

Published on Oct 17, 2025

AI-generated summary

MARS, an end-to-end reinforcement learning framework, enhances multi-agent reasoning in Large Language Models (LLMs) through self-play, improving performance in both games and reasoning benchmarks.

Abstract

Developing Large Language Models (LLMs) that cooperate and compete effectively within multi-agent systems is a critical step towards more advanced intelligence. While reinforcement learning (RL) has proven effective for enhancing reasoning in single-agent tasks, its extension to multi-turn, multi-agent scenarios remains underexplored due to the challenges of long-horizon credit assignment and agent-specific advantage estimation. To address these challenges, we introduce MARS, an end-to-end RL framework that incentivizes Multi-Agent Reasoning of LLMs through Self-play in both cooperative and competitive games. MARS features a turn-level advantage estimator that aligns learning signals with each interaction for credit assignment, and agent-specific advantage normalization to stabilize multi-agent training. By learning through self-play across cooperative and competitive games, the MARS agent trained from Qwen3-4B develops strong strategic abilities that generalize to held-out games, with performance improvements of up to 28.7%. More importantly, the capability acquired through self-play generalizes beyond games, yielding consistent performance gains for multi-agent systems on reasoning benchmarks. When integrated into leading multi-agent systems, our MARS agent achieves significant gains of 10.0% on AIME and 12.5% on GPQA-Diamond. These results establish end-to-end RL training with self-play in strategic games as a powerful approach for developing generalizable multi-agent reasoning capabilities in LLMs. Our code and models are publicly available at https://github.com/thu-nics/MARS.
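
The abstract names two technical ingredients: a turn-level advantage estimator and agent-specific advantage normalization. The sketch below is a minimal illustration of those two ideas, not the paper's implementation: it assumes each agent receives a scalar reward per turn, uses a simple return-to-go as the turn-level credit signal, and standardizes advantages within each agent's own batch. All function names and the toy episode are hypothetical.

```python
import numpy as np

def turn_level_advantages(turn_rewards, gamma=1.0):
    """Return-to-go credit signal for each turn of one episode.

    `turn_rewards` maps an agent id to the rewards that agent received
    at each of its turns. Crediting a turn with everything that follows
    it aligns the learning signal with individual interactions, in the
    spirit of a turn-level advantage estimator. (Hypothetical helper;
    the paper's exact estimator may differ.)
    """
    advantages = {}
    for agent, rewards in turn_rewards.items():
        returns = np.zeros(len(rewards))
        running = 0.0
        for t in reversed(range(len(rewards))):
            running = rewards[t] + gamma * running
            returns[t] = running
        advantages[agent] = returns
    return advantages

def normalize_per_agent(batch_advantages, eps=1e-8):
    """Standardize advantages separately for each agent.

    Pooling all agents together would let one agent's reward scale
    dominate the gradient; normalizing per agent keeps multi-agent
    training stable, which is what agent-specific advantage
    normalization is meant to address.
    """
    normalized = {}
    for agent, adv in batch_advantages.items():
        adv = np.asarray(adv, dtype=np.float64)
        normalized[agent] = (adv - adv.mean()) / (adv.std() + eps)
    return normalized

# Toy usage: two agents alternating turns in one episode.
episode = {"agent_0": [0.0, 0.5, 1.0], "agent_1": [0.0, -0.5, -1.0]}
advs = turn_level_advantages(episode)
print(normalize_per_agent(advs))
```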
