Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2508.08221

about 1 month ago

Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning

Paper • 2508.08221 • Published Aug 11 • 48

How to inject knowledge efficiently? Knowledge Infusion Scaling Law for Pre-training Large Language Models

Paper • 2509.19371 • Published Sep 19
Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free

Paper • 2505.06708 • Published May 10 • 4
Selective Attention: Enhancing Transformer through Principled Context Control

Paper • 2411.12892 • Published Nov 19, 2024
A Survey of Reinforcement Learning for Large Reasoning Models

Paper • 2509.08827 • Published Sep 10 • 188

Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning

Paper • 2508.08221 • Published Aug 11 • 48

Additional for LLMs

Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning

Paper • 2508.08221 • Published Aug 11 • 48
Don't Overthink It: A Survey of Efficient R1-style Large Reasoning Models

Paper • 2508.02120 • Published Aug 4 • 19
Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers

Paper • 2506.23918 • Published Jun 30 • 88
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey

Paper • 2509.02547 • Published Sep 2 • 224

PotentialApplication

Let LLMs Break Free from Overthinking via Self-Braking Tuning

Paper • 2505.14604 • Published May 20 • 23
AGENTIF: Benchmarking Instruction Following of Large Language Models in Agentic Scenarios

Paper • 2505.16944 • Published May 22 • 8
Training Step-Level Reasoning Verifiers with Formal Verification Tools

Paper • 2505.15960 • Published May 21 • 7
The Unreasonable Effectiveness of Entropy Minimization in LLM Reasoning

Paper • 2505.15134 • Published May 21 • 6

about 24 hours ago

Large Reasoning Models Learn Better Alignment from Flawed Thinking

Paper • 2510.00938 • Published Oct 1 • 58
What Characterizes Effective Reasoning? Revisiting Length, Review, and Structure of CoT

Paper • 2509.19284 • Published Sep 23 • 22
Learning to Reason as Action Abstractions with Scalable Mid-Training RL

Paper • 2509.25810 • Published Sep 30 • 5
Agent Learning via Early Experience

Paper • 2510.08558 • Published Oct 9 • 262

Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning

Paper • 2508.08221 • Published Aug 11 • 48
What matters when building vision-language models?

Paper • 2405.02246 • Published May 3, 2024 • 103

Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning

Paper • 2508.08221 • Published Aug 11 • 48

lusxvr/nanoVLM-222M

Image-Text-to-Text • 0.2B • Updated May 8 • 243 • 98
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning

Paper • 2503.09516 • Published Mar 12 • 36
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time

Paper • 2505.24863 • Published May 30 • 97
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning

Paper • 2505.17667 • Published May 23 • 88

microsoft/bitnet-b1.58-2B-4T

Text Generation • 0.8B • Updated May 1 • 5.62k • 1.22k
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models

Paper • 2504.10449 • Published Apr 14 • 15
nvidia/Llama-3.1-Nemotron-8B-UltraLong-2M-Instruct

Text Generation • 8B • Updated Apr 17 • 110 • 15
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs

Paper • 2504.11536 • Published Apr 15 • 63

about 1 month ago

Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning

Paper • 2508.08221 • Published Aug 11 • 48

about 24 hours ago

Large Reasoning Models Learn Better Alignment from Flawed Thinking

Paper • 2510.00938 • Published Oct 1 • 58
What Characterizes Effective Reasoning? Revisiting Length, Review, and Structure of CoT

Paper • 2509.19284 • Published Sep 23 • 22
Learning to Reason as Action Abstractions with Scalable Mid-Training RL

Paper • 2509.25810 • Published Sep 30 • 5
Agent Learning via Early Experience

Paper • 2510.08558 • Published Oct 9 • 262

How to inject knowledge efficiently? Knowledge Infusion Scaling Law for Pre-training Large Language Models

Paper • 2509.19371 • Published Sep 19
Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free

Paper • 2505.06708 • Published May 10 • 4
Selective Attention: Enhancing Transformer through Principled Context Control

Paper • 2411.12892 • Published Nov 19, 2024
A Survey of Reinforcement Learning for Large Reasoning Models

Paper • 2509.08827 • Published Sep 10 • 188

Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning

Paper • 2508.08221 • Published Aug 11 • 48
What matters when building vision-language models?

Paper • 2405.02246 • Published May 3, 2024 • 103

Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning

Paper • 2508.08221 • Published Aug 11 • 48

Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning

Paper • 2508.08221 • Published Aug 11 • 48

Additional for LLMs

Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning

Paper • 2508.08221 • Published Aug 11 • 48
Don't Overthink It: A Survey of Efficient R1-style Large Reasoning Models

Paper • 2508.02120 • Published Aug 4 • 19
Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers

Paper • 2506.23918 • Published Jun 30 • 88
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey

Paper • 2509.02547 • Published Sep 2 • 224

lusxvr/nanoVLM-222M

Image-Text-to-Text • 0.2B • Updated May 8 • 243 • 98
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning

Paper • 2503.09516 • Published Mar 12 • 36
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time

Paper • 2505.24863 • Published May 30 • 97
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning

Paper • 2505.17667 • Published May 23 • 88

PotentialApplication

Let LLMs Break Free from Overthinking via Self-Braking Tuning

Paper • 2505.14604 • Published May 20 • 23
AGENTIF: Benchmarking Instruction Following of Large Language Models in Agentic Scenarios

Paper • 2505.16944 • Published May 22 • 8
Training Step-Level Reasoning Verifiers with Formal Verification Tools

Paper • 2505.15960 • Published May 21 • 7
The Unreasonable Effectiveness of Entropy Minimization in LLM Reasoning

Paper • 2505.15134 • Published May 21 • 6

microsoft/bitnet-b1.58-2B-4T

Text Generation • 0.8B • Updated May 1 • 5.62k • 1.22k
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models

Paper • 2504.10449 • Published Apr 14 • 15
nvidia/Llama-3.1-Nemotron-8B-UltraLong-2M-Instruct

Text Generation • 8B • Updated Apr 17 • 110 • 15
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs

Paper • 2504.11536 • Published Apr 15 • 63

Previous
1
2
Next

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs