Misaligned Roles, Misplaced Images: Structural Input Perturbations Expose Multimodal Alignment Blind Spots Paper • 2504.03735 • Published Apr 1 • 1
Just Do It!? Computer-Use Agents Exhibit Blind Goal-Directedness Paper • 2510.01670 • Published Oct 2 • 6
Article ScreenSuite - The most comprehensive evaluation suite for GUI Agents! Jun 6 • 55
Why Are Web AI Agents More Vulnerable Than Standalone LLMs? A Security Analysis Paper • 2502.20383 • Published Feb 27 • 3
Unfair Alignment: Examining Safety Alignment Across Vision Encoder Layers in Vision-Language Models Paper • 2411.04291 • Published Nov 6, 2024
R1-Zero's "Aha Moment" in Visual Reasoning on a 2B Non-SFT Model Paper • 2503.05132 • Published Mar 7 • 57
microsoft/Phi-3.5-vision-instruct Image-Text-to-Text • 4B • Updated Sep 26, 2024 • 445k • 715
LLaVA-o1: Let Vision Language Models Reason Step-by-Step Paper • 2411.10440 • Published Nov 15, 2024 • 131