-
Forget What You Know about LLMs Evaluations - LLMs are Like a Chameleon
Paper • 2502.07445 • Published • 11 -
ARR: Question Answering with Large Language Models via Analyzing, Retrieving, and Reasoning
Paper • 2502.04689 • Published • 8 -
Analyze Feature Flow to Enhance Interpretation and Steering in Language Models
Paper • 2502.03032 • Published • 60 -
Preference Leakage: A Contamination Problem in LLM-as-a-judge
Paper • 2502.01534 • Published • 40
Collections
Discover the best community collections!
Collections including paper arxiv:2509.01396
-
UCFE: A User-Centric Financial Expertise Benchmark for Large Language Models
Paper • 2410.14059 • Published • 62 -
Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching
Paper • 2503.05179 • Published • 46 -
Token-Efficient Long Video Understanding for Multimodal LLMs
Paper • 2503.04130 • Published • 96 -
GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing
Paper • 2503.10639 • Published • 53
-
PhD Knowledge Not Required: A Reasoning Challenge for Large Language Models
Paper • 2502.01584 • Published • 9 -
CODESIM: Multi-Agent Code Generation and Problem Solving through Simulation-Driven Planning and Debugging
Paper • 2502.05664 • Published • 24 -
Craw4LLM: Efficient Web Crawling for LLM Pretraining
Paper • 2502.13347 • Published • 30 -
Can Large Language Models Help Multimodal Language Analysis? MMLA: A Comprehensive Benchmark
Paper • 2504.16427 • Published • 18
-
Forget What You Know about LLMs Evaluations - LLMs are Like a Chameleon
Paper • 2502.07445 • Published • 11 -
ARR: Question Answering with Large Language Models via Analyzing, Retrieving, and Reasoning
Paper • 2502.04689 • Published • 8 -
Analyze Feature Flow to Enhance Interpretation and Steering in Language Models
Paper • 2502.03032 • Published • 60 -
Preference Leakage: A Contamination Problem in LLM-as-a-judge
Paper • 2502.01534 • Published • 40
-
PhD Knowledge Not Required: A Reasoning Challenge for Large Language Models
Paper • 2502.01584 • Published • 9 -
CODESIM: Multi-Agent Code Generation and Problem Solving through Simulation-Driven Planning and Debugging
Paper • 2502.05664 • Published • 24 -
Craw4LLM: Efficient Web Crawling for LLM Pretraining
Paper • 2502.13347 • Published • 30 -
Can Large Language Models Help Multimodal Language Analysis? MMLA: A Comprehensive Benchmark
Paper • 2504.16427 • Published • 18
-
UCFE: A User-Centric Financial Expertise Benchmark for Large Language Models
Paper • 2410.14059 • Published • 62 -
Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching
Paper • 2503.05179 • Published • 46 -
Token-Efficient Long Video Understanding for Multimodal LLMs
Paper • 2503.04130 • Published • 96 -
GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing
Paper • 2503.10639 • Published • 53