Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2510.05592

Agentic Reinforced Policy Optimization

Paper • 2507.19849 • Published Jul 26 • 156
In-the-Flow Agentic System Optimization for Effective Planning and Tool Use

Paper • 2510.05592 • Published Oct 7 • 102

Seedream 4.0: Toward Next-generation Multimodal Image Generation

Paper • 2509.20427 • Published Sep 24 • 79
Tree Search for LLM Agent Reinforcement Learning

Paper • 2509.21240 • Published Sep 25 • 87
SHANKS: Simultaneous Hearing and Thinking for Spoken Language Models

Paper • 2510.06917 • Published Oct 8 • 34
Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models

Paper • 2510.04618 • Published Oct 6 • 121

GR00T N1: An Open Foundation Model for Generalist Humanoid Robots

Paper • 2503.14734 • Published Mar 18 • 5
Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation

Paper • 2401.02117 • Published Jan 4, 2024 • 33
SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics

Paper • 2506.01844 • Published Jun 2 • 142
Vision-Guided Chunking Is All You Need: Enhancing RAG with Multimodal Document Understanding

Paper • 2506.16035 • Published Jun 19 • 88

CoRAG: Collaborative Retrieval-Augmented Generation

Paper • 2504.01883 • Published Apr 2 • 9
SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning

Paper • 2504.08600 • Published Apr 11 • 32
Reasoning-SQL: Reinforcement Learning with SQL Tailored Partial Rewards for Reasoning-Enhanced Text-to-SQL

Paper • 2503.23157 • Published Mar 29 • 10
AI Agents: Evolution, Architecture, and Real-World Applications

Paper • 2503.12687 • Published Mar 16 • 2

GenEx: Generating an Explorable World

Paper • 2412.09624 • Published Dec 12, 2024 • 97
Segmenting Text and Learning Their Rewards for Improved RLHF in Language Model

Paper • 2501.02790 • Published Jan 6 • 9
Who's Your Judge? On the Detectability of LLM-Generated Judgments

Paper • 2509.25154 • Published Sep 29 • 29
TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning

Paper • 2509.25760 • Published Sep 30 • 55

Scaling Agents via Continual Pre-training

Paper • 2509.13310 • Published Sep 16 • 116
WebSailor-V2: Bridging the Chasm to Proprietary Agents via Synthetic Data and Scalable Reinforcement Learning

Paper • 2509.13305 • Published Sep 16 • 90
In-the-Flow Agentic System Optimization for Effective Planning and Tool Use

Paper • 2510.05592 • Published Oct 7 • 102
Explore to Evolve: Scaling Evolved Aggregation Logic via Proactive Online Exploration for Deep Research Agents

Paper • 2510.14438 • Published Oct 16 • 13

Open Data Synthesis For Deep Research

Paper • 2509.00375 • Published Aug 30 • 69
Beyond Correctness: Harmonizing Process and Outcome Rewards through RL Training

Paper • 2509.03403 • Published Sep 3 • 22
LMEnt: A Suite for Analyzing Knowledge in Language Models from Pretraining Data to Representations

Paper • 2509.03405 • Published Sep 3 • 23
SATQuest: A Verifier for Logical Reasoning Evaluation and Reinforcement Fine-Tuning of LLMs

Paper • 2509.00930 • Published Aug 31 • 4

Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning

Paper • 2505.01441 • Published Apr 28 • 39
LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities

Paper • 2504.16078 • Published Apr 22 • 21
Emergent Agentic Transformer from Chain of Hindsight Experience

Paper • 2305.16554 • Published May 26, 2023
DiaTool-DPO: Multi-Turn Direct Preference Optimization for Tool-Augmented Large Language Models

Paper • 2504.02882 • Published Apr 2 • 7

Edify 3D: Scalable High-Quality 3D Asset Generation

Paper • 2411.07135 • Published Nov 11, 2024 • 1
LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations

Paper • 2410.02707 • Published Oct 3, 2024 • 47
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach

Paper • 2502.05171 • Published Feb 7 • 151
LongCat-Video Technical Report

Paper • 2510.22200 • Published Oct 25 • 25

Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level

Paper • 2411.03562 • Published Nov 5, 2024 • 68
Training Language Models for Social Deduction with Multi-Agent Reinforcement Learning

Paper • 2502.06060 • Published Feb 9 • 38
MLGym: A New Framework and Benchmark for Advancing AI Research Agents

Paper • 2502.14499 • Published Feb 20 • 192
SurveyX: Academic Survey Automation via Large Language Models

Paper • 2502.14776 • Published Feb 20 • 100

Agentic Reinforced Policy Optimization

Paper • 2507.19849 • Published Jul 26 • 156
In-the-Flow Agentic System Optimization for Effective Planning and Tool Use

Paper • 2510.05592 • Published Oct 7 • 102

Scaling Agents via Continual Pre-training

Paper • 2509.13310 • Published Sep 16 • 116
WebSailor-V2: Bridging the Chasm to Proprietary Agents via Synthetic Data and Scalable Reinforcement Learning

Paper • 2509.13305 • Published Sep 16 • 90
In-the-Flow Agentic System Optimization for Effective Planning and Tool Use

Paper • 2510.05592 • Published Oct 7 • 102
Explore to Evolve: Scaling Evolved Aggregation Logic via Proactive Online Exploration for Deep Research Agents

Paper • 2510.14438 • Published Oct 16 • 13

Seedream 4.0: Toward Next-generation Multimodal Image Generation

Paper • 2509.20427 • Published Sep 24 • 79
Tree Search for LLM Agent Reinforcement Learning

Paper • 2509.21240 • Published Sep 25 • 87
SHANKS: Simultaneous Hearing and Thinking for Spoken Language Models

Paper • 2510.06917 • Published Oct 8 • 34
Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models

Paper • 2510.04618 • Published Oct 6 • 121

Open Data Synthesis For Deep Research

Paper • 2509.00375 • Published Aug 30 • 69
Beyond Correctness: Harmonizing Process and Outcome Rewards through RL Training

Paper • 2509.03403 • Published Sep 3 • 22
LMEnt: A Suite for Analyzing Knowledge in Language Models from Pretraining Data to Representations

Paper • 2509.03405 • Published Sep 3 • 23
SATQuest: A Verifier for Logical Reasoning Evaluation and Reinforcement Fine-Tuning of LLMs

Paper • 2509.00930 • Published Aug 31 • 4

GR00T N1: An Open Foundation Model for Generalist Humanoid Robots

Paper • 2503.14734 • Published Mar 18 • 5
Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation

Paper • 2401.02117 • Published Jan 4, 2024 • 33
SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics

Paper • 2506.01844 • Published Jun 2 • 142
Vision-Guided Chunking Is All You Need: Enhancing RAG with Multimodal Document Understanding

Paper • 2506.16035 • Published Jun 19 • 88

Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning

Paper • 2505.01441 • Published Apr 28 • 39
LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities

Paper • 2504.16078 • Published Apr 22 • 21
Emergent Agentic Transformer from Chain of Hindsight Experience

Paper • 2305.16554 • Published May 26, 2023
DiaTool-DPO: Multi-Turn Direct Preference Optimization for Tool-Augmented Large Language Models

Paper • 2504.02882 • Published Apr 2 • 7

CoRAG: Collaborative Retrieval-Augmented Generation

Paper • 2504.01883 • Published Apr 2 • 9
SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning

Paper • 2504.08600 • Published Apr 11 • 32
Reasoning-SQL: Reinforcement Learning with SQL Tailored Partial Rewards for Reasoning-Enhanced Text-to-SQL

Paper • 2503.23157 • Published Mar 29 • 10
AI Agents: Evolution, Architecture, and Real-World Applications

Paper • 2503.12687 • Published Mar 16 • 2

Edify 3D: Scalable High-Quality 3D Asset Generation

Paper • 2411.07135 • Published Nov 11, 2024 • 1
LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations

Paper • 2410.02707 • Published Oct 3, 2024 • 47
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach

Paper • 2502.05171 • Published Feb 7 • 151
LongCat-Video Technical Report

Paper • 2510.22200 • Published Oct 25 • 25

GenEx: Generating an Explorable World

Paper • 2412.09624 • Published Dec 12, 2024 • 97
Segmenting Text and Learning Their Rewards for Improved RLHF in Language Model

Paper • 2501.02790 • Published Jan 6 • 9
Who's Your Judge? On the Detectability of LLM-Generated Judgments

Paper • 2509.25154 • Published Sep 29 • 29
TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning

Paper • 2509.25760 • Published Sep 30 • 55

Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level

Paper • 2411.03562 • Published Nov 5, 2024 • 68
Training Language Models for Social Deduction with Multi-Agent Reinforcement Learning

Paper • 2502.06060 • Published Feb 9 • 38
MLGym: A New Framework and Benchmark for Advancing AI Research Agents

Paper • 2502.14499 • Published Feb 20 • 192
SurveyX: Academic Survey Automation via Large Language Models

Paper • 2502.14776 • Published Feb 20 • 100

Previous
1
2
Next

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs