Collections
Collections including paper arxiv:2501.12948
-
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
Paper • 2502.11089 • Published • 165 -
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Paper • 2402.03300 • Published • 136 -
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper • 2501.12948 • Published • 429 -
DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning
Paper • 2504.07128 • Published • 86
-
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
Paper • 2401.02954 • Published • 49 -
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
Paper • 2401.06066 • Published • 57 -
DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence
Paper • 2401.14196 • Published • 67 -
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Paper • 2402.03300 • Published • 136
-
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Paper • 2402.17764 • Published • 627 -
nanonets/Nanonets-OCR-s
Image-Text-to-Text • 4B • Updated • 104k • 1.56k -
black-forest-labs/FLUX.1-Kontext-dev
Image-to-Image • Updated • 326k • 2.45k -
DeepSite v3
🐳 16k • Generate any application by Vibe Coding
-
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper • 2501.12948 • Published • 429 -
Training Language Models to Self-Correct via Reinforcement Learning
Paper • 2409.12917 • Published • 141 -
StoryMaker: Towards Holistic Consistent Characters in Text-to-image Generation
Paper • 2409.12576 • Published • 16 -
Transformer Explainer: Interactive Learning of Text-Generative Models
Paper • 2408.04619 • Published • 173
-
Rewnozom/agent-zero-v1-a-01
Text Generation • 4B • Updated • 3 • 1 -
TheBloke/MythoMax-L2-13B-GGUF
13B • Updated • 110k • 207 -
DavidAU/Llama-3.2-8X3B-MOE-Dark-Champion-Instruct-uncensored-abliterated-18.4B-GGUF
Text Generation • 18B • Updated • 48.7k • 421 -
QuantFactory/DarkIdol-Llama-3.1-8B-Instruct-1.2-Uncensored-GGUF
Text Generation • 8B • Updated • 14.6k • 125
-
DeepSite v3
🐳 16k • Generate any application by Vibe Coding
-
deepseek-ai/DeepSeek-R1-0528
Text Generation • 685B • Updated • 377k • 2.39k -
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper • 2501.12948 • Published • 429 -
open-r1/Mixture-of-Thoughts
Viewer • Updated • 699k • 4.27k • 291
-
ibm-granite/granite-3.2-8b-instruct
Text Generation • 8B • Updated • 6.33k • 87 -
deepseek-ai/DeepSeek-V3-0324
Text Generation • 685B • Updated • 154k • 3.08k -
Qwen/Qwen2.5-Omni-7B
Any-to-Any • 11B • Updated • 131k • 1.82k -
nvidia/Llama-Nemotron-Post-Training-Dataset
Viewer • Updated • 3.91M • 6.65k • 610