Article The 1 Billion Token Challenge: Finding the Perfect Pre-training Mix By codelion • 4 days ago • 28
Ferret Collection A framework for training LLM agents via RL with advanced search capability: https://github.com/Tree-Shu-Zhao/ferret • 7 items • Updated 17 days ago • 1
MiniCPM-V 4.5: Cooking Efficient MLLMs via Architecture, Data, and Training Recipe Paper • 2509.18154 • Published Sep 16 • 49
Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning Paper • 2508.20751 • Published Aug 28 • 89