SafetyBench: Evaluating the Safety of Large Language Models with Multiple Choice Questions Paper • 2309.07045 • Published Sep 13, 2023
AISafetyLab: A Comprehensive Framework for AI Safety Evaluation and Improvement Paper • 2502.16776 • Published Feb 24, 2025
SocialEval: Evaluating Social Intelligence of Large Language Models Paper • 2506.00900 • Published Jun 1, 2025
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning Paper • 2507.01006 • Published Jul 1, 2025