Training Vision-Language Process Reward Models for Test-Time Scaling in Multimodal Reasoning: Key Insights and Lessons Learned Paper • 2509.23250 • Published Sep 27 • 5
Error Typing for Smarter Rewards: Improving Process Reward Models with Error-Aware Hierarchical Supervision Paper • 2505.19706 • Published May 26 • 3
Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning Paper • 2412.11974 • Published Dec 16, 2024 • 9
Ferret: Faster and Effective Automated Red Teaming with Reward-Based Scoring Technique Paper • 2408.10701 • Published Aug 20, 2024 • 12
The Jumping Reasoning Curve? Tracking the Evolution of Reasoning Performance in GPT-[n] and o-[n] Models on Multimodal Puzzles Paper • 2502.01081 • Published Feb 3 • 14
TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization Paper • 2412.21037 • Published Dec 30, 2024 • 24