Collections

Collections including paper arxiv:2506.20920

Training optimization

The Curse of Depth in Large Language Models

Paper • 2502.05795 • Published Feb 9 • 40
Transformers without Normalization

Paper • 2503.10622 • Published Mar 13 • 171
Parallel Scaling Law for Language Models

Paper • 2505.10475 • Published May 15 • 83
Learning to Skip the Middle Layers of Transformers

Paper • 2506.21103 • Published Jun 26 • 18

Reasoning Introduces New Poisoning Attacks Yet Makes Them More Complicated

Paper • 2509.05739 • Published Sep 6 • 2
Loong: Synthesize Long Chain-of-Thoughts at Scale through Verifiers

Paper • 2509.03059 • Published Sep 3 • 24
Universal Deep Research: Bring Your Own Model and Strategy

Paper • 2509.00244 • Published Aug 29 • 13
<think> So let's replace this phrase with insult... </think> Lessons learned from generation of toxic texts with LLMs

Paper • 2509.08358 • Published Sep 10 • 13

FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language

Paper • 2506.20920 • Published Jun 26 • 75

June 2025 - Top Papers

Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning

Paper • 2506.07044 • Published Jun 8 • 113
ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning

Paper • 2506.09513 • Published Jun 11 • 99
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time

Paper • 2505.24863 • Published May 30 • 97
Seedance 1.0: Exploring the Boundaries of Video Generation Models

Paper • 2506.09113 • Published Jun 10 • 102

Hugging Face Science team papers

FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language

Paper • 2506.20920 • Published Jun 26 • 75
SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published Apr 7 • 200
YourBench: Easy Custom Evaluation Sets for Everyone

Paper • 2504.01833 • Published Apr 2 • 22
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Paper • 2502.02737 • Published Feb 4 • 249

FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language

Paper • 2506.20920 • Published Jun 26 • 75
HuggingFaceFW/fineweb-2

Viewer • Updated 24 days ago • 4.48B • 86.9k • 690
Scaling FineWeb to 1000+ languages: Step 1: finding signal in 100s of evaluation tasks 📝

Running • 81 • Evaluate multilingual models using FineTasks

FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language

Paper • 2506.20920 • Published Jun 26 • 75
SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published Apr 7 • 200
The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset

Paper • 2303.03915 • Published Mar 7, 2023 • 7
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Paper • 2502.02737 • Published Feb 4 • 249

ShareGPT-4o-Image: Aligning Multimodal Models with GPT-4o-Level Image Generation

Paper • 2506.18095 • Published Jun 22 • 66
FreedomIntelligence/ShareGPT-4o-Image

Viewer • Updated Jul 1 • 92.3k • 14.3k • 91
FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language

Paper • 2506.20920 • Published Jun 26 • 75

MUVERA: Multi-Vector Retrieval via Fixed Dimensional Encodings

Paper • 2405.19504 • Published May 29, 2024 • 3
HiWave: Training-Free High-Resolution Image Generation via Wavelet-Based Diffusion Sampling

Paper • 2506.20452 • Published Jun 25 • 19
FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language

Paper • 2506.20920 • Published Jun 26 • 75
The Geometry of LLM Quantization: GPTQ as Babai's Nearest Plane Algorithm

Paper • 2507.18553 • Published Jul 24 • 40

ahmedheakl/resume-atlas

Viewer • Updated Jul 1, 2024 • 13.4k • 207 • 10
FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language

Paper • 2506.20920 • Published Jun 26 • 75
Infinite Dataset Hub ♾

Running • 279 • Search and save datasets generated with a LLM in real time
IntrEx: A Dataset for Modeling Engagement in Educational Conversations

Paper • 2509.06652 • Published Sep 8 • 24
