chenhany's Collections

pretrain-data

updated Oct 22, 2024

  • cerebras/SlimPajama-627B

    Preview • Updated Jul 7, 2023 • 45.2k • 506

  • JeanKaddour/minipile

    Viewer • Updated Jun 20, 2023 • 1.01M • 3.29k • 132

  • nampdn-ai/tiny-textbooks

    Viewer • Updated Jul 3, 2024 • 420k • 442 • 160

  • open-phi/textbooks

    Viewer • Updated Oct 8, 2023 • 1.8k • 560 • 90

  • DataComp-LM: In search of the next generation of training sets for language models

    Paper • 2406.11794 • Published Jun 17, 2024 • 54

  • HuggingFaceFW/fineweb-edu

    Viewer • Updated Jul 11 • 3.5B • 237k • 819

  • nampdn-ai/mini-fineweb

    Viewer • Updated Mar 4 • 291M • 527 • 25

  • allenai/dolma

    Updated Apr 17, 2024 • 1.52k • 960

  • H-D-T/Buzz-V1.2

    Viewer • Updated Oct 30, 2024 • 3.14M • 256 • 12

  • Zyphra/Zyda-2

    Preview • Updated Aug 6 • 183k • 85