SAM2S: Segment Anything in Surgical Videos via Semantic Long-term Tracking Paper • 2511.16618 • Published 2 days ago • 5
Experience is the Best Teacher: Grounding VLMs for Robotics through Self-Generated Memory Paper • 2507.16713 • Published Jul 22 • 21
ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning Paper • 2507.16815 • Published Jul 22 • 39
Pixels, Patterns, but No Poetry: To See The World like Humans Paper • 2507.16863 • Published Jul 21 • 68
GaLore: Advancing Large Model Training on Consumer-grade Hardware Article • Published Mar 20, 2024 • 32
FLM-101B: An Open LLM and How to Train It with $100K Budget Paper • 2309.03852 • Published Sep 7, 2023 • 44
CityDreamer: Compositional Generative Model of Unbounded 3D Cities Paper • 2309.00610 • Published Sep 1, 2023 • 20
Large Content And Behavior Models To Understand, Simulate, And Optimize Content And Behavior Paper • 2309.00359 • Published Sep 1, 2023 • 22
RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback Paper • 2309.00267 • Published Sep 1, 2023 • 51
ModelScope-Agent: Building Your Customizable Agent System with Open-source Large Language Models Paper • 2309.00986 • Published Sep 2, 2023 • 21
IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models Paper • 2308.06721 • Published Aug 13, 2023 • 33
The Hydra Effect: Emergent Self-repair in Language Model Computations Paper • 2307.15771 • Published Jul 28, 2023 • 19