Localizing Moments in Long Video Via Multimodal Guidance Paper • 2302.13372 • Published Feb 26, 2023 • 1
MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions Paper • 2112.00431 • Published Dec 1, 2021
OpenTAD: A Unified Framework and Comprehensive Study of Temporal Action Detection Paper • 2502.20361 • Published Feb 27 • 1