Collections

Discover the best community collections!

Collections including paper arxiv:2501.12948

bigcode/the-stack-v2

Viewer • Updated Apr 23, 2024 • 5.45B • 6.53k • 429
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Paper • 2501.12948 • Published Jan 22 • 429
Qwen/Qwen3-Coder-480B-A35B-Instruct

Text Generation • 480B • Updated Aug 21 • 196k • 1.25k

journal-menarik

Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention

Paper • 2502.11089 • Published Feb 16 • 165
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Paper • 2402.03300 • Published Feb 5, 2024 • 136
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Paper • 2501.12948 • Published Jan 22 • 429
DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning

Paper • 2504.07128 • Published Apr 2 • 86

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Paper • 2501.12948 • Published Jan 22 • 429

DeepSeek LLM: Scaling Open-Source Language Models with Longtermism

Paper • 2401.02954 • Published Jan 5, 2024 • 49
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models

Paper • 2401.06066 • Published Jan 11, 2024 • 57
DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence

Paper • 2401.14196 • Published Jan 25, 2024 • 67
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Paper • 2402.03300 • Published Feb 5, 2024 • 136

A very long nameA very long nameA very long nameA very long

A very long nameA very long nameA very long nameA very long nameA very long nameA very long nameA very long nameA very long nameA very long nameA very

nvidia/OpenMathReasoning

Viewer • Updated May 27 • 5.68M • 14.1k • 365
zwhe99/DeepMath-103K

Viewer • Updated May 29 • 103k • 8.89k • 275
microsoft/bitnet-b1.58-2B-4T

Text Generation • 0.8B • Updated May 1 • 7.17k • 1.22k
Qwen3 Demo 📊

Running • Featured • 808 • Generate responses to text prompts in a chat interface

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Paper • 2402.17764 • Published Feb 27, 2024 • 627
nanonets/Nanonets-OCR-s

Image-Text-to-Text • 4B • Updated Jun 20 • 104k • 1.56k
black-forest-labs/FLUX.1-Kontext-dev

Image-to-Image • Updated Jun 27 • 326k • 2.45k
DeepSite v3 🐳

Running • Featured • 16k • Generate any application by Vibe Coding

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Paper • 2501.12948 • Published Jan 22 • 429
Training Language Models to Self-Correct via Reinforcement Learning

Paper • 2409.12917 • Published Sep 19, 2024 • 141
StoryMaker: Towards Holistic Consistent Characters in Text-to-image Generation

Paper • 2409.12576 • Published Sep 19, 2024 • 16
Transformer Explainer: Interactive Learning of Text-Generative Models

Paper • 2408.04619 • Published Aug 8, 2024 • 173

Rewnozom/agent-zero-v1-a-01

Text Generation • 4B • Updated Jan 18 • 3 • 1
TheBloke/MythoMax-L2-13B-GGUF

13B • Updated Sep 27, 2023 • 110k • 207
DavidAU/Llama-3.2-8X3B-MOE-Dark-Champion-Instruct-uncensored-abliterated-18.4B-GGUF

Text Generation • 18B • Updated 3 days ago • 48.7k • 421
QuantFactory/DarkIdol-Llama-3.1-8B-Instruct-1.2-Uncensored-GGUF

Text Generation • 8B • Updated Jul 29, 2024 • 14.6k • 125

DeepSite v3 🐳

Running • Featured • 16k • Generate any application by Vibe Coding
deepseek-ai/DeepSeek-R1-0528

Text Generation • 685B • Updated May 29 • 377k • 2.39k
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Paper • 2501.12948 • Published Jan 22 • 429
open-r1/Mixture-of-Thoughts

Viewer • Updated May 26 • 699k • 4.27k • 291

ibm-granite/granite-3.2-8b-instruct

Text Generation • 8B • Updated Apr 17 • 6.33k • 87
deepseek-ai/DeepSeek-V3-0324

Text Generation • 685B • Updated Mar 27 • 154k • 3.08k
Qwen/Qwen2.5-Omni-7B

Any-to-Any • 11B • Updated Apr 30 • 131k • 1.82k
nvidia/Llama-Nemotron-Post-Training-Dataset

Viewer • Updated May 8 • 3.91M • 6.65k • 610