All HF Hub posts

MonsterMMORPG 
posted an update 1 day ago
Qwen Image Models Training - 0 to Hero Level Tutorial - LoRA & Fine Tuning - Base & Edit Model - https://youtu.be/DPX3eBTuO_Y

This is a comprehensive, step-by-step tutorial on how to train Qwen Image models. It covers both LoRA training and full Fine-Tuning / DreamBooth training, on both the Qwen Image base model and the Qwen Image Edit Plus 2509 model. The tutorial is the product of 21 days of R&D and over $800 in cloud services spent finding the best training configurations. We have also developed an ultra-easy-to-use Gradio app for the legendary Kohya Musubi Tuner trainer. With it, you can train locally on your Windows computer with as little as 6 GB of VRAM, for both LoRA and Fine-Tuning. Finally, I show how to train a character (a person), a product (a perfume), and a style (GTA5 artworks).
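For readers who want a feel for what LoRA training on a model like this involves, here is a minimal sketch of attaching a LoRA adapter to the Qwen Image transformer with diffusers and peft. This is not the tutorial's Gradio / Musubi Tuner setup, and the target module names are assumptions based on diffusers' usual attention naming; verify them against your installed version.

```python
# Minimal sketch (not the tutorial's Musubi Tuner config): attach a LoRA
# adapter to the Qwen Image transformer via diffusers' peft integration.
import torch
from diffusers import DiffusionPipeline
from peft import LoraConfig

pipe = DiffusionPipeline.from_pretrained(
    "Qwen/Qwen-Image", torch_dtype=torch.bfloat16
)

lora_config = LoraConfig(
    r=16,                     # LoRA rank: higher = more capacity, more VRAM
    lora_alpha=16,
    init_lora_weights="gaussian",
    # Assumed attention projection names; check your diffusers version.
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],
)
pipe.transformer.add_adapter(lora_config)

# Only the injected LoRA weights are trainable, which is what makes
# low-VRAM training feasible.
trainable = sum(p.numel() for p in pipe.transformer.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable:,}")
```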

Tutorial Link: https://youtu.be/DPX3eBTuO_Y
codelion 
posted an update 2 days ago
On this day in 2019, OpenAI released the final GPT-2 model as part of their staged release. I still remember that November well - so much was happening, but GPT-2's release felt like a watershed moment for the field. It showed us what was possible with carefully trained language models.

To recreate some of that GPT-2 magic, I recently tackled an interesting challenge: can you pretrain a language model with just 1 billion tokens - roughly 1/10th of what GPT-2 used - and still get comparable performance? After 50+ systematic experiments testing different dataset mixtures, the answer is yes.

The result is codelion/gpt-2-70m, which achieves over 90% of GPT-2's benchmark performance despite being trained on 10x less data. The key was finding the optimal dataset composition: 50% high-quality textbook PDFs, 30% filtered web content, and 20% educational resources. It even beats GPT-2 on TruthfulQA (47.31% vs 40.69%).
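As a rough illustration, a 50/30/20 mixture like this can be sampled with the datasets library's interleave_datasets. The repo names below are placeholders, not the corpora used in the article.

```python
# Rough sketch of a 50/30/20 dataset mixture with Hugging Face datasets.
# The dataset names are placeholders, not the article's actual corpora.
from datasets import load_dataset, interleave_datasets

textbooks = load_dataset("placeholder/textbook-pdfs", split="train", streaming=True)
web = load_dataset("placeholder/filtered-web", split="train", streaming=True)
edu = load_dataset("placeholder/edu-resources", split="train", streaming=True)

# Draw samples with probabilities matching the reported optimal mixture.
mixed = interleave_datasets(
    [textbooks, web, edu],
    probabilities=[0.5, 0.3, 0.2],
    seed=42,
    stopping_strategy="all_exhausted",  # keep sampling until every source is used
)
```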

If you're interested in the full story of how we discovered this optimal mixture and why curriculum learning catastrophically failed, check out the complete article: https://huggingface.co/blog/codelion/optimal-dataset-mixing

Sometimes less really is more - when you mix it right.
mike-ravkine 
posted an update about 17 hours ago
There is no anxiety quite like powering up 2 kW of basement compute after rewiring it all. I hit a small bit of trouble with the horizontal 3090 because I misread my motherboard manual, but otherwise so far so good. Next we see if I've built up enough cooling to hit my target TDP, especially on those 3-slot NVLinked cards. The 4-slot bridges are much easier to work with, but their prices went bananas and I couldn't acquire a second, so I've gotta get a little creative with intakes.
AdinaY 
posted an update about 24 hours ago
Kimi K2 Thinking is now live on the hub 🔥

moonshotai/Kimi-K2-Thinking

✨ 1T MoE for deep reasoning & tool use
✨ Native INT4 quantization = 2× faster inference
✨ 256K context window
✨ Modified MIT license
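A minimal loading sketch, assuming your transformers version supports the repo's custom code and its INT4 quantization format (a 1T-parameter MoE will, of course, need serious multi-GPU hardware):

```python
# Minimal sketch of loading the checkpoint; illustrative only, since the
# 1T MoE requires substantial multi-GPU hardware.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "moonshotai/Kimi-K2-Thinking"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    device_map="auto",   # shard across all available devices
    torch_dtype="auto",  # follow the checkpoint's native (INT4-quantized) dtypes
)
```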
atasoglu 
posted an update 2 days ago
Introducing ToolsGen 🛠️

I built a tool to solve a problem I kept running into: creating quality datasets for training LLMs to use tools.

ToolsGen takes your JSON tool definitions and automatically generates realistic user requests, corresponding tool calls, and evaluates them using an LLM-as-a-judge pipeline. It outputs datasets ready to use with Hugging Face.

What makes it useful:
- Generates realistic user requests + tool calls from JSON definitions
- LLM-as-a-judge quality scoring with multi-dimensional rubrics
- Multiple sampling strategies (random, parameter-aware, semantic)
- OpenAI-compatible API support
- Outputs JSONL with train/val splits
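To make that concrete, here is a hypothetical input/output pair. The field names follow the OpenAI tool-calling schema and are my assumptions, not ToolsGen's exact record format.

```python
# Hypothetical illustration of the flow: a JSON tool definition in, a
# generated request + tool call + judge score out. Field names are assumed,
# not ToolsGen's exact output schema.
import json

tool_definition = {
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

generated_record = {
    "user_request": "Is it raining in Istanbul right now?",
    "tool_call": {"name": "get_weather", "arguments": {"city": "Istanbul"}},
    "judge_score": 4.5,  # LLM-as-a-judge rating against the rubric
}

print(json.dumps(generated_record))  # one JSONL line in the output dataset
```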

Still early days (API isn't stable yet), but it's already helping me generate tool-calling datasets much faster.

Check it out: https://github.com/atasoglu/toolsgen

Happy to hear feedback or ideas!
branikita 
posted an update about 23 hours ago
Load test conducted on the Feetech STS3250 servo motor.

With a 2 kg load on a 100 mm arm, the motor operated near its limit. At higher acceleration settings, lifting performance decreased noticeably. The temperature increased from 40 °C to 70 °C within 8 minutes.
The test highlights the torque and thermal constraints under sustained load conditions.
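For context, a quick static-torque check of what this load demands (my arithmetic, using only the figures from the post):

```python
# Static torque demanded by the test load (gravity only; the arm's own
# mass and dynamic/acceleration loads are ignored).
m = 2.0    # load mass, kg
g = 9.81   # gravitational acceleration, m/s^2
r = 0.100  # arm length, m

torque_nm = m * g * r       # ~1.96 N*m
torque_kg_cm = m * r * 100  # ~20 kg*cm, the unit servo specs usually quote
print(f"{torque_nm:.2f} N*m = {torque_kg_cm:.0f} kg*cm")
```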

#Robotics #Engineering #ServoMotor #Testing #Feetech #Automation #Mechatronics #Hardware
narugo1992 
posted an update 1 day ago
Org Rate Limits = Free DDoS Invitation? 🤡
One serious question: Is there any way to actually ban clowns abusing this system?
Right now all it takes is one bored script kiddie with a grudge (or too much caffeine) to lawnmower an entire org's API endpoints into the stone age. They get to bathe in 429s while we're sitting here like 🤡 "Gee I wonder whose IP is carpet-bombing us today!"
The kicker? Zero accountability. Zero fingerprints. Just vibes™ and chaos. It’s basically a public invitation to hold entire communities hostage while wearing pajamas.
"Come for the open-source collaboration, stay for the unhinged DDoS piñata party!" 🎉
Fix when?
abidlabs 
posted an update 4 days ago
Why I think local, open-source models will eventually win.

The most useful AI applications are moving toward multi-turn agentic behavior: systems that take hundreds or even thousands of iterative steps to complete a task, e.g. Claude Code, computer-control agents that click, type, and test repeatedly.

In these cases, the power of a model lies not in how smart it is per token, but in how quickly it can interact with its environment and tools across many steps. In that regime, model quality becomes secondary to latency.

An open-source model that can call tools quickly, check that the right thing was clicked, or verify that a code change actually passes tests can easily outperform a slightly “smarter” closed model that has to make remote API calls for every move.
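A back-of-envelope illustration of that argument, with made-up but plausible numbers:

```python
# Back-of-envelope latency math; the numbers are assumptions for
# illustration, not measurements.
steps = 1000               # iterative tool calls in one agentic task
remote_overhead_s = 0.300  # per-call network round-trip to a hosted API
local_overhead_s = 0.020   # per-call dispatch to a local model

print(f"remote: {steps * remote_overhead_s:.0f} s of pure call overhead")  # 300 s
print(f"local:  {steps * local_overhead_s:.0f} s")                         # 20 s
```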

Eventually, the balance tips: it becomes impractical for an agent to rely on remote inference for every micro-action. Just as no one would tolerate a keyboard that required a network request per keystroke, users won’t accept agent workflows bottlenecked by latency. All devices will ship with local, open-source models that are “good enough” and the expectation will shift toward everything running locally. It’ll happen sooner than most people think.