R-Search-3b-ppo / README.md
qingfei1's picture
Upload folder using huggingface_hub
0e27a3c verified
|
raw
history blame
1.54 kB
---
language:
- en
datasets:
- qingfei1/R-Search_datasets
license: apache-2.0
---
# R-Search: Empowering LLM Reasoning with Search via Multi-Reward Reinforcement Learning
<p align="center">
🤗 <a href="https://huggingface.co/datasets/qingfei1/R-Search_datasets" target="_blank">[R-Search Datasets] </a> • 💻 <a href="https://github.com/QingFei1/R-Search" target="_blank">[Github Repo]</a>
</p>
**R-Search** is a novel reinforcement learning framework for reasoning–search integration. It enables LLMs to autonomously perform multi-step reasoning with deep search interaction, and to learn optimal reasoning–search trajectories via multi-reward signals, substantially improving performance on complex logic- and knowledge-intensive tasks.
## Trained Models
We open-sourced the following models trained only on the 2wikimultihopqa training set:
|Model|Huggingface Repo|Description|
|---|---|---|
|**R-Search-7b-grpo**| [🤗 Huggingface Repo](https://huggingface.co/qingfei1/R-Search-7b-grpo) | Trained **Qwen2.5-7B-Instruct** using the GRPO algorithm |
|**R-Search-3b-grpo**| [🤗 Huggingface Repo](https://huggingface.co/qingfei1/R-Search-3b-grpo) | Trained **Qwen2.5-3B-Instruct** using the GRPO algorithm |
|**R-Search-7b-ppo**| [🤗 Huggingface Repo](https://huggingface.co/qingfei1/R-Search-7b-ppo) | Trained **Qwen2.5-7B-Instruct** using the PPO algorithm |
|**R-Search-3b-ppo**| [🤗 Huggingface Repo](https://huggingface.co/qingfei1/R-Search-3b-ppo) | Trained **Qwen2.5-3B-Instruct** using the PPO algorithm |