
Collections


Collections including paper arxiv:2505.04364

Benchmarking LLMs' Swarm intelligence

Paper • 2505.04364 • Published May 7, 2025 • 20
Apriel-1.5-15b-Thinker

Paper • 2510.01141 • Published Oct 1, 2025 • 116

CoRAG: Collaborative Retrieval-Augmented Generation

Paper • 2504.01883 • Published Apr 2, 2025 • 9
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning

Paper • 2504.08837 • Published Apr 10, 2025 • 43
Mavors: Multi-granularity Video Representation for Multimodal Large Language Model

Paper • 2504.10068 • Published Apr 14, 2025 • 30
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations

Paper • 2504.10481 • Published Apr 14, 2025 • 85

Benchmark and Evaluation

Humanity's Last Exam

Paper • 2501.14249 • Published Jan 24, 2025 • 76
Benchmarking LLMs for Political Science: A United Nations Perspective

Paper • 2502.14122 • Published Feb 19, 2025 • 2
IFIR: A Comprehensive Benchmark for Evaluating Instruction-Following in Expert-Domain Information Retrieval

Paper • 2503.04644 • Published Mar 6, 2025 • 21
ExpertGenQA: Open-ended QA generation in Specialized Domains

Paper • 2503.02948 • Published Mar 4, 2025

Agents Thinking Fast and Slow: A Talker-Reasoner Architecture

Paper • 2410.08328 • Published Oct 10, 2024
SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks

Paper • 2305.17390 • Published May 27, 2023 • 3
SRMT: Shared Memory for Multi-agent Lifelong Pathfinding

Paper • 2501.13200 • Published Jan 22, 2025 • 69
Talk Structurally, Act Hierarchically: A Collaborative Framework for LLM Multi-Agent Systems

Paper • 2502.11098 • Published Feb 16, 2025 • 13

Large Language Model (LLM) and NLP related papers.

LoRA+: Efficient Low Rank Adaptation of Large Models

Paper • 2402.12354 • Published Feb 19, 2024 • 6
The FinBen: An Holistic Financial Benchmark for Large Language Models

Paper • 2402.12659 • Published Feb 20, 2024 • 23
TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization

Paper • 2402.13249 • Published Feb 20, 2024 • 13
TrustLLM: Trustworthiness in Large Language Models

Paper • 2401.05561 • Published Jan 10, 2024 • 69

Benchmarking LLMs' Swarm intelligence

Paper • 2505.04364 • Published May 7, 2025 • 20
Multi-Agent System for Comprehensive Soccer Understanding

Paper • 2505.03735 • Published May 6, 2025 • 25
LIMO: Less is More for Reasoning

Paper • 2502.03387 • Published Feb 5, 2025 • 62

Agentic Knowledgeable Self-awareness

Paper • 2504.03553 • Published Apr 4, 2025 • 27
Benchmarking LLMs' Swarm intelligence

Paper • 2505.04364 • Published May 7, 2025 • 20
Multi-Agent System for Comprehensive Soccer Understanding

Paper • 2505.03735 • Published May 6, 2025 • 25
LIMI: Less is More for Agency

Paper • 2509.17567 • Published Sep 22, 2025 • 100

WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning

Paper • 2411.02337 • Published Nov 4, 2024 • 36
Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models

Paper • 2411.04996 • Published Nov 7, 2024 • 51
Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level

Paper • 2411.03562 • Published Nov 5, 2024 • 68
StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization

Paper • 2410.08815 • Published Oct 11, 2024 • 47

Papers-Benchmarks

CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery

Paper • 2406.08587 • Published Jun 12, 2024 • 16
Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning

Paper • 2406.09170 • Published Jun 13, 2024 • 27
AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents

Paper • 2407.18901 • Published Jul 26, 2024 • 35
Benchmarking Agentic Workflow Generation

Paper • 2410.07869 • Published Oct 10, 2024 • 29
