Article RynnVLA-001: Using Human Demonstrations to Improve Robot Manipulation Aug 11 • 28
Efficient Multi-modal Large Language Models via Progressive Consistency Distillation Paper • 2510.00515 • Published Oct 1 • 39
VLA-RFT: Vision-Language-Action Reinforcement Fine-tuning with Verified Rewards in World Simulators Paper • 2510.00406 • Published Oct 1 • 64
UI-S1: Advancing GUI Automation via Semi-online Reinforcement Learning Paper • 2509.11543 • Published Sep 15 • 47
The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs Paper • 2507.11097 • Published Jul 15 • 64
Shifting AI Efficiency From Model-Centric to Data-Centric Compression Paper • 2505.19147 • Published May 25 • 144 • 5
Compression with Global Guidance: Towards Training-free High-Resolution MLLMs Acceleration Paper • 2501.05179 • Published Jan 9
Qwen2.5-VL Collection Vision-language model series based on Qwen2.5 • 11 items • Updated Jul 21 • 546
Accelerating Diffusion Transformers with Token-wise Feature Caching Paper • 2410.05317 • Published Oct 5, 2024
Rethinking Token Reduction in MLLMs: Towards a Unified Paradigm for Training-Free Acceleration Paper • 2411.17686 • Published Nov 26, 2024 • 20