π_RL: Online RL Fine-tuning for Flow-based Vision-Language-Action Models Paper • 2510.25889 • Published 8 days ago • 56
Open CaptchaWorld: A Comprehensive Web-based Platform for Testing and Benchmarking Multimodal LLM Agents Paper • 2505.24878 • Published May 30 • 22
Machine Bullshit: Characterizing the Emergent Disregard for Truth in Large Language Models Paper • 2507.07484 • Published Jul 10 • 17
UniversalRAG: Retrieval-Augmented Generation over Multiple Corpora with Diverse Modalities and Granularities Paper • 2504.20734 • Published Apr 29 • 61
Revisiting Uncertainty Quantification Evaluation in Language Models: Spurious Interactions with Response Length Bias Results Paper • 2504.13677 • Published Apr 18 • 1
VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models Paper • 2504.15279 • Published Apr 21 • 77
RAGulator: Lightweight Out-of-Context Detectors for Grounded Text Generation Paper • 2411.03920 • Published Nov 6, 2024 • 1
ReFeed: Multi-dimensional Summarization Refinement with Reflective Reasoning on Feedback Paper • 2503.21332 • Published Mar 27 • 23
A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond Paper • 2503.21614 • Published Mar 27 • 42
Think Before Recommend: Unleashing the Latent Reasoning Power for Sequential Recommendation Paper • 2503.22675 • Published Mar 28 • 36
Large Language Model Agent: A Survey on Methodology, Applications and Challenges Paper • 2503.21460 • Published Mar 27 • 83
Agent models: Internalizing Chain-of-Action Generation into Reasoning models Paper • 2503.06580 • Published Mar 9 • 19
Noise May Contain Transferable Knowledge: Understanding Semi-supervised Heterogeneous Domain Adaptation from an Empirical Perspective Paper • 2502.13573 • Published Feb 19 • 2
On the Trustworthiness of Generative Foundation Models: Guideline, Assessment, and Perspective Paper • 2502.14296 • Published Feb 20 • 45
Does Time Have Its Place? Temporal Heads: Where Language Models Recall Time-specific Information Paper • 2502.14258 • Published Feb 20 • 26
Discovering highly efficient low-weight quantum error-correcting codes with reinforcement learning Paper • 2502.14372 • Published Feb 20 • 36
MLGym: A New Framework and Benchmark for Advancing AI Research Agents Paper • 2502.14499 • Published Feb 20 • 192