Track the Answer: Extending TextVQA from Image to Video with Spatio-Temporal Clues Paper • 2412.12502 • Published Dec 17, 2024
An Empirical Study on Configuring In-Context Learning Demonstrations for Unleashing MLLMs' Sentimental Perception Capability Paper • 2505.16193 • Published May 22, 2025
Customizing Visual Emotion Evaluation for MLLMs: An Open-vocabulary, Multifaceted, and Scalable Approach Paper • 2509.21950 • Published Sep 26, 2025