Revisiting Multimodal Positional Encoding in Vision-Language Models Paper • 2510.23095 • Published 22 days ago • 20
RefHCM: A Unified Model for Referring Perceptions in Human-Centric Scenarios Paper • 2412.14643 • Published Dec 19, 2024