SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models Paper • 2504.11468 • Published Apr 10 • 30
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training Paper • 2501.17161 • Published Jan 28 • 123
Cosmos-Predict2 Collection World Foundation Model for Future Prediction • 13 items • Updated about 24 hours ago • 29
MCP-Bench: Benchmarking Tool-Using LLM Agents with Complex Real-World Tasks via MCP Servers Paper • 2508.20453 • Published Aug 28 • 63
Wan: Open and Advanced Large-Scale Video Generative Models Paper • 2503.20314 • Published Mar 26 • 55
Physical AI Collection Collection of open, commercial-grade datasets for physical AI developers • 23 items • Updated about 24 hours ago • 89
AceReason Collection Math and Code reasoning model trained through reinforcement learning (RL) • 7 items • Updated about 24 hours ago • 18
Reward Models Collection Nemotron reward models. For use in RLHF pipelines and LLM-as-a-Judge • 8 items • Updated about 24 hours ago • 21
Psychoacoustic Challenges Of Speech Enhancement On VoIP Platforms Paper • 2310.07161 • Published Oct 11, 2023 • 1
Qwen2.5-1M Collection The long-context version of Qwen2.5, supporting 1M-token context lengths • 3 items • Updated Jul 21 • 125
OpenReasoning-Nemotron Collection Collection of OpenReasoning-Nemotron models, which are trained on 5M reasoning traces for Math, Code and Science. • 6 items • Updated about 24 hours ago • 44