FineData

Team

community

AI & ML interests

We release large pre-training datasets to accelerate open LLM development. We are part of the Hugging Face Science team (hf.co/science).

Recent Activity

thomwolf authored a paper 26 days ago

Robot Learning: A Tutorial

lvwerra authored a paper 28 days ago

BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution

guipenedo updated a dataset 28 days ago

HuggingFaceFW/finewiki


Papers

FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language

The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale


HuggingFaceFW's collections (7)