OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM Paper • 2510.15870 • Published Oct 17 • 88
When Benchmarks Age: Temporal Misalignment through Large Language Model Factuality Evaluation Paper • 2510.07238 • Published Oct 8 • 14
Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play Paper • 2509.25541 • Published Sep 29 • 139
Voice Evaluation of Reasoning Ability: Diagnosing the Modality-Induced Performance Gap Paper • 2509.26542 • Published Sep 30 • 8
Alignment through Meta-Weighted Online Sampling: Bridging the Gap between Data Generation and Preference Optimization Paper • 2509.23371 • Published Sep 27 • 5
Muskits-ESPnet: A Comprehensive Toolkit for Singing Voice Synthesis in New Paradigm Paper • 2409.07226 • Published Sep 11, 2024 • 1
OpenOOD v1.5: Enhanced Benchmark for Out-of-Distribution Detection Paper • 2306.09301 • Published Jun 15, 2023 • 1
CoreMatching: A Co-adaptive Sparse Inference Framework with Token and Neuron Pruning for Comprehensive Acceleration of Vision-Language Models Paper • 2505.19235 • Published May 25 • 3
Generalized Out-of-Distribution Detection and Beyond in Vision Language Model Era: A Survey Paper • 2407.21794 • Published Jul 31, 2024 • 7
Angles Don't Lie: Unlocking Training-Efficient RL Through the Model's Own Signals Paper • 2506.02281 • Published Jun 2 • 4
HippoMM: Hippocampal-inspired Multimodal Memory for Long Audiovisual Event Understanding Paper • 2504.10739 • Published Apr 14 • 2