jsulz
posted an update Jun 26
It's been a bit since I took a step back and looked at xet-team progress to migrate Hugging Face from Git LFS to Xet, but every time I do, it boggles the mind.

A month ago there were 5,500 users/orgs on Xet with 150K repos and 4PB. Today?
🤗 700,000 users/orgs
📈 350,000 repos
🚀 15PB

Meanwhile, our migrations have pushed throughput to numbers that are bonkers. In June, we hit upload speeds of 577Gb/s (crossing 500Gb/s for the first time).

These are hard numbers to put into context, but let's try:

The latest run of the Common Crawl from commoncrawl was 471 TB.

We now have ~32 crawls stored in Xet. At peak upload speed we could move the latest crawl into Xet in about two hours.
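
For anyone who wants to check the "about two hours" figure, here is a rough back-of-the-envelope sketch. It assumes decimal units (1 TB = 10^12 bytes, 1 Gb = 10^9 bits) and that the 577Gb/s peak is sustained for the whole transfer, which a real upload would not guarantee. The function name is just for illustration.

```python
# Back-of-the-envelope transfer time at peak upload speed.
# Assumptions (not from the post): decimal units, and the peak rate
# is held constant for the entire transfer.

PEAK_UPLOAD_GBPS = 577   # gigabits per second, from the post
CRAWL_SIZE_TB = 471      # terabytes, latest Common Crawl run


def transfer_hours(size_tb: float, rate_gbps: float) -> float:
    """Return hours needed to move size_tb terabytes at rate_gbps gigabits/s."""
    size_bits = size_tb * 10**12 * 8       # TB -> bytes -> bits
    rate_bits_per_s = rate_gbps * 10**9    # Gb/s -> bits/s
    return size_bits / rate_bits_per_s / 3600


hours = transfer_hours(CRAWL_SIZE_TB, PEAK_UPLOAD_GBPS)
print(f"{hours:.2f} hours")  # 1.81 hours
```

So roughly 1 hour 49 minutes under these assumptions, which rounds to the two hours quoted above.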

We're moving to a new phase in the process, so stay tuned.

This shift in gears means it's also time to roll up our sleeves and look at all the bytes we have and the value we're adding to the community.

I already have some homework from @RichardErkhov to look at the dedupe across their uploads, and I'll be doing the same for other early adopters, big models/datasets, and frequent uploaders (looking at you @bartowski 👀)

Let me know if there's anything you're interested in; happy to dig in!

Any plans to migrate all repositories to Xet?

@nyuuzyou we definitely want everyone to be on Xet in the future, so yup!

Great job fellas keep it up!

let's gooo!!

Silly newbie Off Topic question: Are "creative writing" the keywords to use when searching for a fine-tuned LLM that can help me write like a top-tier journalist? Or would there be better search keywords? You've done quite a few fine-tunes, but TBH, I wouldn't know the best one even if I stumbled upon it.