Demystifying deep search: a holistic evaluation with hint-free multi-hop questions and factorised metrics Paper • 2510.05137 • Published Oct 1 • 4 • 2
Training Vision-Language Process Reward Models for Test-Time Scaling in Multimodal Reasoning: Key Insights and Lessons Learned Paper • 2509.23250 • Published Sep 27 • 5 • 2
OffTopicEval: When Large Language Models Enter the Wrong Chat, Almost Always! Paper • 2509.26495 • Published Sep 30 • 10 • 2
JAM: A Tiny Flow-based Song Generator with Fine-grained Controllability and Aesthetic Alignment Paper • 2507.20880 • Published Jul 28 • 10 • 2
Error Typing for Smarter Rewards: Improving Process Reward Models with Error-Aware Hierarchical Supervision Paper • 2505.19706 • Published May 26 • 3 • 2
NORA: A Small Open-Sourced Generalist Vision Language Action Model for Embodied Tasks Paper • 2504.19854 • Published Apr 28 • 7 • 2
TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization Paper • 2412.21037 • Published Dec 30, 2024 • 24 • 4
Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse Paper • 2409.11242 • Published Sep 17, 2024 • 7 • 2
Ferret: Faster and Effective Automated Red Teaming with Reward-Based Scoring Technique Paper • 2408.10701 • Published Aug 20, 2024 • 12 • 2
Improving Text-To-Audio Models with Synthetic Captions Paper • 2406.15487 • Published Jun 18, 2024 • 1