---
language:
- en
datasets:
- qingfei1/R-Search_datasets
license: apache-2.0
---

# R-Search: Empowering LLM Reasoning with Search via Multi-Reward Reinforcement Learning

🤗 [R-Search Datasets] • 💻 [Github Repo]

**R-Search** is a novel reinforcement learning framework for reasoning–search integration. It enables LLMs to autonomously perform multi-step reasoning with deep search interaction and to learn optimal reasoning–search trajectories via multi-reward signals, substantially improving performance on complex logic- and knowledge-intensive tasks.

## Trained Models

We have open-sourced the following models, all trained only on the 2WikiMultiHopQA training set:

|Model|Hugging Face Repo|Description|
|---|---|---|
|**R-Search-7b-grpo**| [🤗 Hugging Face Repo](https://huggingface.co/qingfei1/R-Search-7b-grpo) | **Qwen2.5-7B-Instruct** trained with the GRPO algorithm |
|**R-Search-3b-grpo**| [🤗 Hugging Face Repo](https://huggingface.co/qingfei1/R-Search-3b-grpo) | **Qwen2.5-3B-Instruct** trained with the GRPO algorithm |
|**R-Search-7b-ppo**| [🤗 Hugging Face Repo](https://huggingface.co/qingfei1/R-Search-7b-ppo) | **Qwen2.5-7B-Instruct** trained with the PPO algorithm |
|**R-Search-3b-ppo**| [🤗 Hugging Face Repo](https://huggingface.co/qingfei1/R-Search-3b-ppo) | **Qwen2.5-3B-Instruct** trained with the PPO algorithm |
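
The four checkpoints above follow a regular naming pattern, so a repo ID can be built from the model size and the training algorithm. The sketch below is a minimal illustration, not the authors' official loading code: the helper names `repo_id` and `load_model` are hypothetical, and it assumes the checkpoints load with the standard `transformers` auto classes because they are fine-tuned from Qwen2.5 Instruct models. It does not cover the prompt format or the retrieval setup needed for search interaction; see the Github repo for those.

```python
# Hypothetical helpers for picking one of the four released checkpoints.
MODEL_SIZES = ("3b", "7b")
ALGORITHMS = ("grpo", "ppo")


def repo_id(size: str, algorithm: str) -> str:
    """Build the Hugging Face repo ID for a released R-Search checkpoint."""
    if size not in MODEL_SIZES:
        raise ValueError(f"size must be one of {MODEL_SIZES}, got {size!r}")
    if algorithm not in ALGORITHMS:
        raise ValueError(f"algorithm must be one of {ALGORITHMS}, got {algorithm!r}")
    return f"qingfei1/R-Search-{size}-{algorithm}"


def load_model(size: str = "7b", algorithm: str = "grpo"):
    """Load tokenizer and model (assumes standard transformers auto classes work)."""
    # Imported lazily so repo_id() can be used without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = repo_id(size, algorithm)
    tokenizer = AutoTokenizer.from_pretrained(name)
    # torch_dtype="auto" uses the dtype stored in the checkpoint config.
    model = AutoModelForCausalLM.from_pretrained(name, torch_dtype="auto")
    return tokenizer, model
```

Calling `load_model("3b", "ppo")` would download `qingfei1/R-Search-3b-ppo` on first use, so it needs network access and enough memory for the chosen model size.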