Strategic Dishonesty Can Undermine AI Safety Evaluations of Frontier LLMs Paper • 2509.18058 • Published Sep 22, 2025
JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models Paper • 2404.01318 • Published Mar 28, 2024
A Modern Look at the Relationship between Sharpness and Generalization Paper • 2302.07011 • Published Feb 14, 2023