IS-Bench: Evaluating Interactive Safety of VLM-Driven Embodied Agents in Daily Household Tasks Paper • 2506.16402 • Published Jun 19 • 1
SafeWork-R1: Coevolving Safety and Intelligence under the AI-45$^{\circ}$ Law Paper • 2507.18576 • Published Jul 24 • 6
Frontier AI Risk Management Framework in Practice: A Risk Analysis Technical Report Paper • 2507.16534 • Published Jul 22 • 7
Taming Masked Diffusion Language Models via Consistency Trajectory Reinforcement Learning with Fewer Decoding Step Paper • 2509.23924 • Published Sep 28 • 7
LLMs Learn to Deceive Unintentionally: Emergent Misalignment in Dishonesty from Misaligned Samples to Biased Human-AI Interactions Paper • 2510.08211 • Published 28 days ago • 22
T2ISafety: Benchmark for Assessing Fairness, Toxicity, and Privacy in Image Generation Paper • 2501.12612 • Published Jan 22
SALAD-Bench: A Hierarchical and Comprehensive Safety Benchmark for Large Language Models Paper • 2402.05044 • Published Feb 7, 2024 • 2