Datasets
common-pile/arxiv_abstracts_filtered • Updated 3 days ago • 2.5M • 142 • 5
common-pile/youtube_filtered • Updated Jun 6 • 986k • 183 • 4
common-pile/wikiteam_filtered • Updated Jun 6 • 10.2M • 524
common-pile/wikimedia_filtered • Updated Jun 6 • 12.9M • 249 • 5
Papers
The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text • Paper 2506.05209 • Published Jun 5 • 56