Large Reasoning Models Learn Better Alignment from Flawed Thinking Paper • 2510.00938 • Published Oct 1 • 57
Hybrid Latent Reasoning via Reinforcement Learning Paper • 2505.18454 • Published May 24 • 6 • 2
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning Paper • 2503.09516 • Published Mar 12 • 36
unsloth/Llama-4-Scout-17B-16E-Instruct-unsloth Image-Text-to-Text • 109B • Updated Apr 12 • 15 • 17
Retrieval Augmented Fact Verification by Synthesizing Contrastive Arguments Paper • 2406.09815 • Published Jun 14, 2024