SecCodePLT: A Unified Platform for Evaluating the Security of Code GenAI Paper • 2410.11096 • Published Oct 14, 2024 • 13
OWL: Optimized Workforce Learning for General Multi-Agent Assistance in Real-World Task Automation Paper • 2505.23885 • Published May 29
AgentVigil: Generic Black-Box Red-teaming for Indirect Prompt Injection against LLM Agents Paper • 2505.05849 • Published May 9
Co-PatcheR Collection • Co-PatcheR: Collaborative Software Patching with Component(s)-specific Small Reasoning Models • 3 items • Updated May 29
PromptBench: Towards Evaluating the Robustness of Large Language Models on Adversarial Prompts Paper • 2306.04528 • Published Jun 7, 2023 • 3
Improving Generalization of Adversarial Training via Robust Critical Fine-Tuning Paper • 2308.02533 • Published Aug 1, 2023
Large Language Models Understand and Can be Enhanced by Emotional Stimuli Paper • 2307.11760 • Published Jul 14, 2023 • 1
DyVal: Dynamic Evaluation of Large Language Models for Reasoning Tasks Paper • 2309.17167 • Published Sep 29, 2023 • 1
PromptBench: A Unified Library for Evaluation of Large Language Models Paper • 2312.07910 • Published Dec 13, 2023 • 18