TEMPLE: Temporal Preference Learning of Video LLMs via Difficulty Scheduling and Pre-SFT Alignment Paper • 2503.16929 • Published Mar 21
VideoReasonBench: Can MLLMs Perform Vision-Centric Complex Video Reasoning? Paper • 2505.23359 • Published May 29 • 39
TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos Paper • 2504.17343 • Published Apr 24 • 13
PunchBench: Benchmarking MLLMs in Multimodal Punchline Comprehension Paper • 2412.11906 • Published Dec 16, 2024
Multi-source Semantic Graph-based Multimodal Sarcasm Explanation Generation Paper • 2306.16650 • Published Jun 29, 2023