RMCBench: Benchmarking Large Language Models' Resistance to Malicious Code Paper • 2409.15154 • Published Sep 23, 2024
Hunyuan-TurboS: Advancing Large Language Models through Mamba-Transformer Synergy and Adaptive Chain-of-Thought Paper • 2505.15431 • Published May 21
ArtifactsBench: Bridging the Visual-Interactive Gap in LLM Code Generation Evaluation Paper • 2507.04952 • Published Jul 7
Adaptive Termination for Multi-round Parallel Reasoning: An Universal Semantic Entropy-Guided Framework Paper • 2507.06829 • Published Jul 9