LLMs Learn to Deceive Unintentionally: Emergent Misalignment in Dishonesty from Misaligned Samples to Biased Human-AI Interactions Paper • 2510.08211 • Published 28 days ago • 22 • 2
VLSBench: Unveiling Visual Leakage in Multimodal Safety Paper • 2411.19939 • Published Nov 29, 2024 • 10 • 2