Adaptive Attacks on Trusted Monitors Subvert AI Control Protocols Paper • 2510.09462 • Published 29 days ago • 5
Strategic Dishonesty Can Undermine AI Safety Evaluations of Frontier LLM Paper • 2509.18058 • Published Sep 22 • 12
The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs Paper • 2509.09677 • Published Sep 11 • 34
OS-Harm: A Benchmark for Measuring Safety of Computer Use Agents Paper • 2506.14866 • Published Jun 17 • 5
Open LLM Leaderboard 🏆 Track, rank and evaluate open LLMs and chatbots • Running on CPU Upgrade • 13.7k