Layer-Level Self-Exposure and Patch: Affirmative Token Mitigation for Jailbreak Attack Defense • Paper • 2501.02629 • Published Jan 5
Min-K%++: Improved Baseline for Detecting Pre-Training Data from Large Language Models • Paper • 2404.02936 • Published Apr 3, 2024