---
language:
- en
datasets:
- qingfei1/R-Search_datasets
license: apache-2.0
---
# R-Search: Empowering LLM Reasoning with Search via Multi-Reward Reinforcement Learning

<p align="center">
🤗 <a href="https://huggingface.co/datasets/qingfei1/R-Search_datasets" target="_blank">[R-Search Datasets]</a> • 💻 <a href="https://github.com/QingFei1/R-Search" target="_blank">[GitHub Repo]</a>
</p>

R-Search is a reinforcement learning framework that integrates reasoning with search. It enables LLMs to autonomously perform multi-step reasoning with deep search interaction and to learn optimal reasoning–search trajectories from multi-reward signals, substantially improving performance on complex logic- and knowledge-intensive tasks.

## Trained Models

We have open-sourced the following models, all trained only on the 2WikiMultiHopQA training set:

| Model | Hugging Face Repo | Description |
|---|---|---|
| R-Search-7b-grpo | [🤗 Hugging Face Repo](https://huggingface.co/qingfei1/R-Search-7b-grpo) | Qwen2.5-7B-Instruct trained with the GRPO algorithm |
| R-Search-3b-grpo | [🤗 Hugging Face Repo](https://huggingface.co/qingfei1/R-Search-3b-grpo) | Qwen2.5-3B-Instruct trained with the GRPO algorithm |
| R-Search-7b-ppo | [🤗 Hugging Face Repo](https://huggingface.co/qingfei1/R-Search-7b-ppo) | Qwen2.5-7B-Instruct trained with the PPO algorithm |
| R-Search-3b-ppo | [🤗 Hugging Face Repo](https://huggingface.co/qingfei1/R-Search-3b-ppo) | Qwen2.5-3B-Instruct trained with the PPO algorithm |
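
The checkpoints in the table can likely be loaded like any other Qwen2.5-based model. The snippet below is a minimal sketch, not the authors' official usage: it assumes each repo holds standard `transformers`-format weights, and the helper names `R_SEARCH_REPOS`, `repo_id`, and `load_checkpoint` are hypothetical. It does not cover the search-interaction prompt format; see the GitHub repo for that.

```python
# Repo IDs taken from the table above, keyed by (size, algorithm).
# The dictionary and helper names are illustrative, not part of R-Search.
R_SEARCH_REPOS = {
    ("7b", "grpo"): "qingfei1/R-Search-7b-grpo",
    ("3b", "grpo"): "qingfei1/R-Search-3b-grpo",
    ("7b", "ppo"): "qingfei1/R-Search-7b-ppo",
    ("3b", "ppo"): "qingfei1/R-Search-3b-ppo",
}


def repo_id(size: str, algorithm: str) -> str:
    """Return the Hub repo ID for a size ("3b"/"7b") and algorithm ("ppo"/"grpo")."""
    key = (size.lower(), algorithm.lower())
    if key not in R_SEARCH_REPOS:
        raise ValueError(
            f"No R-Search checkpoint for size={size!r}, algorithm={algorithm!r}"
        )
    return R_SEARCH_REPOS[key]


def load_checkpoint(size: str = "3b", algorithm: str = "ppo"):
    """Download and load the tokenizer and model for one checkpoint.

    Assumes the repo contains standard transformers-format weights.
    """
    # Imported lazily so repo_id() works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = repo_id(size, algorithm)
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name, torch_dtype="auto")
    return tokenizer, model
```

Calling `load_checkpoint("3b", "ppo")` downloads the weights from the Hub on first use, so it needs network access and enough memory for a 3B-parameter model.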