- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 29
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
  Paper • 2402.03749 • Published • 14
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 44
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 23
Collections
Collections including paper arxiv:2510.05684
- ERA: Transforming VLMs into Embodied Agents via Embodied Prior Learning and Online Reinforcement Learning
  Paper • 2510.12693 • Published • 26
- Dr.LLM: Dynamic Layer Routing in LLMs
  Paper • 2510.12773 • Published • 31
- D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI
  Paper • 2510.05684 • Published • 139
- BEAR: Benchmarking and Enhancing Multimodal Language Models for Atomic Embodied Capabilities
  Paper • 2510.08759 • Published • 46
- LLM Pruning and Distillation in Practice: The Minitron Approach
  Paper • 2408.11796 • Published • 57
- TableBench: A Comprehensive and Complex Benchmark for Table Question Answering
  Paper • 2408.09174 • Published • 52
- To Code, or Not To Code? Exploring Impact of Code in Pre-training
  Paper • 2408.10914 • Published • 44
- Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications
  Paper • 2408.11878 • Published • 63
- A Survey of Data Agents: Emerging Paradigm or Overstated Hype?
  Paper • 2510.23587 • Published • 65
- D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI
  Paper • 2510.05684 • Published • 139
- Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation
  Paper • 2510.08673 • Published • 122
- Lumine: An Open Recipe for Building Generalist Agents in 3D Open Worlds
  Paper • 2511.08892 • Published • 186
- HoloScene: Simulation-Ready Interactive 3D Worlds from a Single Video
  Paper • 2510.05560 • Published • 7
- TaTToo: Tool-Grounded Thinking PRM for Test-Time Scaling in Tabular Reasoning
  Paper • 2510.06217 • Published • 63
- Less is More: Recursive Reasoning with Tiny Networks
  Paper • 2510.04871 • Published • 488
- Fast-dLLM v2: Efficient Block-Diffusion LLM
  Paper • 2509.26328 • Published • 53