Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2506.20920

FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language

Paper • 2506.20920 • Published Jun 26 • 75
HuggingFaceFW/finewiki

Viewer • Updated Oct 22 • 61.6M • 19.6k • 263
nhagar/fineweb_urls

Viewer • Updated May 15 • 24.5B • 239 • 1
PleIAs/common_corpus

Viewer • Updated Jun 10 • 470M • 42.7k • 320

FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language

Paper • 2506.20920 • Published Jun 26 • 75
SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published Apr 7 • 200
The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset

Paper • 2303.03915 • Published Mar 7, 2023 • 7
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Paper • 2502.02737 • Published Feb 4 • 250

ShareGPT-4o-Image: Aligning Multimodal Models with GPT-4o-Level Image Generation

Paper • 2506.18095 • Published Jun 22 • 66
FreedomIntelligence/ShareGPT-4o-Image

Viewer • Updated Jul 1 • 92.3k • 22.2k • 92
FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language

Paper • 2506.20920 • Published Jun 26 • 75

MUVERA: Multi-Vector Retrieval via Fixed Dimensional Encodings

Paper • 2405.19504 • Published May 29, 2024 • 3
HiWave: Training-Free High-Resolution Image Generation via Wavelet-Based Diffusion Sampling

Paper • 2506.20452 • Published Jun 25 • 19
FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language

Paper • 2506.20920 • Published Jun 26 • 75
The Geometry of LLM Quantization: GPTQ as Babai's Nearest Plane Algorithm

Paper • 2507.18553 • Published Jul 24 • 40

GR00T N1: An Open Foundation Model for Generalist Humanoid Robots

Paper • 2503.14734 • Published Mar 18 • 5
Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation

Paper • 2401.02117 • Published Jan 4, 2024 • 33
SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics

Paper • 2506.01844 • Published Jun 2 • 142
Vision-Guided Chunking Is All You Need: Enhancing RAG with Multimodal Document Understanding

Paper • 2506.16035 • Published Jun 19 • 88

Reasoning Introduces New Poisoning Attacks Yet Makes Them More Complicated

Paper • 2509.05739 • Published Sep 6 • 2
Loong: Synthesize Long Chain-of-Thoughts at Scale through Verifiers

Paper • 2509.03059 • Published Sep 3 • 24
Universal Deep Research: Bring Your Own Model and Strategy

Paper • 2509.00244 • Published Aug 29 • 13
<think> So let's replace this phrase with insult... </think> Lessons learned from generation of toxic texts with LLMs

Paper • 2509.08358 • Published Sep 10 • 13

FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language

Paper • 2506.20920 • Published Jun 26 • 75

June 2025 - Top Papers

Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning

Paper • 2506.07044 • Published Jun 8 • 113
ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning

Paper • 2506.09513 • Published Jun 11 • 99
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time

Paper • 2505.24863 • Published May 30 • 97
Seedance 1.0: Exploring the Boundaries of Video Generation Models

Paper • 2506.09113 • Published Jun 10 • 102

Hugging Face Science team papers

FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language

Paper • 2506.20920 • Published Jun 26 • 75
SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published Apr 7 • 200
YourBench: Easy Custom Evaluation Sets for Everyone

Paper • 2504.01833 • Published Apr 2 • 22
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Paper • 2502.02737 • Published Feb 4 • 250

Training optimization

The Curse of Depth in Large Language Models

Paper • 2502.05795 • Published Feb 9 • 40
Transformers without Normalization

Paper • 2503.10622 • Published Mar 13 • 172
Parallel Scaling Law for Language Models

Paper • 2505.10475 • Published May 15 • 83
Learning to Skip the Middle Layers of Transformers

Paper • 2506.21103 • Published Jun 26 • 18

FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language

Paper • 2506.20920 • Published Jun 26 • 75
HuggingFaceFW/finewiki

Viewer • Updated Oct 22 • 61.6M • 19.6k • 263
nhagar/fineweb_urls

Viewer • Updated May 15 • 24.5B • 239 • 1
PleIAs/common_corpus

Viewer • Updated Jun 10 • 470M • 42.7k • 320

Reasoning Introduces New Poisoning Attacks Yet Makes Them More Complicated

Paper • 2509.05739 • Published Sep 6 • 2
Loong: Synthesize Long Chain-of-Thoughts at Scale through Verifiers

Paper • 2509.03059 • Published Sep 3 • 24
Universal Deep Research: Bring Your Own Model and Strategy

Paper • 2509.00244 • Published Aug 29 • 13
<think> So let's replace this phrase with insult... </think> Lessons learned from generation of toxic texts with LLMs

Paper • 2509.08358 • Published Sep 10 • 13

FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language

Paper • 2506.20920 • Published Jun 26 • 75
SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published Apr 7 • 200
The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset

Paper • 2303.03915 • Published Mar 7, 2023 • 7
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Paper • 2502.02737 • Published Feb 4 • 250

FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language

Paper • 2506.20920 • Published Jun 26 • 75

ShareGPT-4o-Image: Aligning Multimodal Models with GPT-4o-Level Image Generation

Paper • 2506.18095 • Published Jun 22 • 66
FreedomIntelligence/ShareGPT-4o-Image

Viewer • Updated Jul 1 • 92.3k • 22.2k • 92
FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language

Paper • 2506.20920 • Published Jun 26 • 75

June 2025 - Top Papers

Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning

Paper • 2506.07044 • Published Jun 8 • 113
ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning

Paper • 2506.09513 • Published Jun 11 • 99
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time

Paper • 2505.24863 • Published May 30 • 97
Seedance 1.0: Exploring the Boundaries of Video Generation Models

Paper • 2506.09113 • Published Jun 10 • 102

MUVERA: Multi-Vector Retrieval via Fixed Dimensional Encodings

Paper • 2405.19504 • Published May 29, 2024 • 3
HiWave: Training-Free High-Resolution Image Generation via Wavelet-Based Diffusion Sampling

Paper • 2506.20452 • Published Jun 25 • 19
FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language

Paper • 2506.20920 • Published Jun 26 • 75
The Geometry of LLM Quantization: GPTQ as Babai's Nearest Plane Algorithm

Paper • 2507.18553 • Published Jul 24 • 40

Hugging Face Science team papers

FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language

Paper • 2506.20920 • Published Jun 26 • 75
SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published Apr 7 • 200
YourBench: Easy Custom Evaluation Sets for Everyone

Paper • 2504.01833 • Published Apr 2 • 22
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Paper • 2502.02737 • Published Feb 4 • 250

GR00T N1: An Open Foundation Model for Generalist Humanoid Robots

Paper • 2503.14734 • Published Mar 18 • 5
Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation

Paper • 2401.02117 • Published Jan 4, 2024 • 33
SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics

Paper • 2506.01844 • Published Jun 2 • 142
Vision-Guided Chunking Is All You Need: Enhancing RAG with Multimodal Document Understanding

Paper • 2506.16035 • Published Jun 19 • 88

Training optimization

The Curse of Depth in Large Language Models

Paper • 2502.05795 • Published Feb 9 • 40
Transformers without Normalization

Paper • 2503.10622 • Published Mar 13 • 172
Parallel Scaling Law for Language Models

Paper • 2505.10475 • Published May 15 • 83
Learning to Skip the Middle Layers of Transformers

Paper • 2506.21103 • Published Jun 26 • 18

Previous
1
2
Next

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs