Papers
arxiv:2511.09780

Hail to the Thief: Exploring Attacks and Defenses in Decentralised GRPO

Published on Nov 12
· Submitted by Oguzhan Ersoy on Nov 14
Authors:
,
,

Abstract

The study identifies and defends against adversarial attacks in decentralized Group Relative Policy Optimization (GRPO) for Large Language Models (LLMs), demonstrating attack success rates of up to 100% and proposing effective defense mechanisms.

AI-generated summary

Group Relative Policy Optimization (GRPO) has demonstrated great utilization in post-training of Large Language Models (LLMs). In GRPO, prompts are answered by the model and, through reinforcement learning, preferred completions are learnt. Owing to the small communication volume, GRPO is inherently suitable for decentralised training as the prompts can be concurrently answered by multiple nodes and then exchanged in the forms of strings. In this work, we present the first adversarial attack in decentralised GRPO. We demonstrate that malicious parties can poison such systems by injecting arbitrary malicious tokens in benign models in both out-of-context and in-context attacks. Using empirical examples of math and coding tasks, we show that adversarial attacks can easily poison the benign nodes, polluting their local LLM post-training, achieving attack success rates up to 100% in as few as 50 iterations. We propose two ways to defend against these attacks, depending on whether all users train the same model or different models. We show that these defenses can achieve stop rates of up to 100%, making the attack impossible.

Community

Paper submitter

"Hail to the Thief: Exploring Attacks and Defenses in Decentralized GRPO" is the first systematic study exploring both attack vectors and defense strategies in decentralised reinforcement learning for Large Language Models (LLMs). We demonstrate how adversarial completions can corrupt RL training, causing honest models to produce arbitrary tokens during inference in as few as 20 iterations. We then propose effective, lightweight defenses that make these systems robust and trustworthy.

Our work provides the first blueprint for achieving robust decentralised reinforcement learning for LLMs.

Why It Matters

Reinforcement learning (RL) has become the key to aligning LLMs with human intent, reasoning, and formatting. Due to the low communication required by GRPO (only completions, instead of the large gradients typically needed in training), it is particularly well-suited for decentralised reinforcement learning (dRL). Since dRL involves collaborative learning among participants who may not be known or trusted, attack risks emerge. A malicious participant can compromise others' models, embedding hidden behaviors or spreading subtle biases that others unknowingly learn. Mitigating these risks requires robust dRL mechanisms.

This is incredibly important research for the decentralized AI ecosystem!
Huge respect to the authors for uncovering how decentralized GRPO can be exploited — and more importantly, how it can be defended.

As decentralized training scales up, work like this helps the entire community understand real attack surfaces and practical mitigation strategies. The fact that the paper not only demonstrates the vulnerabilities but also provides defenses capable of fully stopping the attacks is a massive contribution.

Research like this strengthens trust, improves system robustness, and pushes decentralized LLM training closer to real-world readiness. 🙌🔥

Kudos to the authors and to Gensyn for continuously pushing the boundaries of security in distributed LLM training. Excited to see what comes next!

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any Paper on Hugging Face checkout this Space

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend

Sign up or log in to comment

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2511.09780 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2511.09780 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2511.09780 in a Space README.md to link it from this page.

Collections including this paper 1