Self-Improvement in Multimodal Large Language Models: A Survey Paper • 2510.02665 • Published Oct 3 • 19
Loong: Synthesize Long Chain-of-Thoughts at Scale through Verifiers Paper • 2509.03059 • Published Sep 3 • 24
TCIA: A Task-Centric Instruction Augmentation Method for Instruction Finetuning Paper • 2508.20374 • Published Aug 28 • 21
MMTok: Multimodal Coverage Maximization for Efficient Inference of VLMs Paper • 2508.18264 • Published Aug 25 • 25
LiveMCP-101: Stress Testing and Diagnosing MCP-enabled Agents on Challenging Queries Paper • 2508.15760 • Published Aug 21 • 46
Hanfu-Bench: A Multimodal Benchmark on Cross-Temporal Cultural Understanding and Transcreation Paper • 2506.01565 • Published Jun 2 • 3
Reinforcement Learning for Reasoning in Large Language Models with One Training Example Paper • 2504.20571 • Published Apr 29 • 96
CBT-Bench: Evaluating Large Language Models on Assisting Cognitive Behavior Therapy Paper • 2410.13218 • Published Oct 17, 2024 • 4
Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale Paper • 2409.08264 • Published Sep 12, 2024 • 48
Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers Paper • 2409.04109 • Published Sep 6, 2024 • 48
Learning to Refuse: Towards Mitigating Privacy Risks in LLMs Paper • 2407.10058 • Published Jul 14, 2024 • 31
UNcommonsense Reasoning: Abductive Reasoning about Uncommon Situations Paper • 2311.08469 • Published Nov 14, 2023 • 11