Video Reality Test: Can AI-Generated ASMR Videos fool VLMs and Humans? Paper • 2512.13281 • Published 13 days ago • 63
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation Paper • 2408.12528 • Published Aug 22, 2024 • 51
UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction Paper • 2503.15661 • Published Mar 19 • 2
Computer-Use Agents as Judges for Generative User Interface Paper • 2511.15567 • Published Nov 19 • 52
InteracSPARQL: An Interactive System for SPARQL Query Refinement Using Natural Language Explanations Paper • 2511.02002 • Published Nov 3 • 1
VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual Representation Paper • 2511.02778 • Published Nov 4 • 101
Rethinking Spectral Augmentation for Contrast-based Graph Self-Supervised Learning Paper • 2405.19600 • Published May 30, 2024
DREAM: Improving Video-Text Retrieval Through Relevance-Based Augmentation Using Large Foundation Models Paper • 2404.05083 • Published Apr 7, 2024
The Underappreciated Power of Vision Models for Graph Structural Understanding Paper • 2510.24788 • Published Oct 27 • 35
BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks Paper • 2412.04626 • Published Dec 5, 2024 • 14
AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Understanding Paper • 2502.01341 • Published Feb 3 • 39
Communication-Efficient Decentralized Online Continuous DR-Submodular Maximization Paper • 2208.08681 • Published Aug 18, 2022
Roughness Index for Loss Landscapes of Neural Network Models of Partial Differential Equations Paper • 2103.11069 • Published Mar 20, 2021