StyleRemix: Interpretable Authorship Obfuscation via Distillation and Perturbation of Style Elements Paper • 2408.15666 • Published Aug 28, 2024
WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models Paper • 2406.18510 • Published Jun 26, 2024
Localized Symbolic Knowledge Distillation for Visual Commonsense Models Paper • 2312.04837 • Published Dec 8, 2023
The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning Paper • 2312.01552 • Published Dec 4, 2023
Tailoring Self-Rationalizers with Multi-Reward Distillation Paper • 2311.02805 • Published Nov 6, 2023
The Generative AI Paradox: "What It Can Create, It May Not Understand" Paper • 2311.00059 • Published Oct 31, 2023