Ultra-Long Sequence Parallelism: Ulysses + Ring-Attention Technical Principles and Implementation • Article • By exploding-gradients • Sep 16 • 11
An efficient probabilistic hardware architecture for diffusion-like models • Paper • 2510.23972 • Published 12 days ago • 3
Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning • Paper • 2510.25992 • Published 10 days ago • 40
The Smol Training Playbook: The Secrets to Building World-Class LLMs 📝 • Running on CPU Upgrade • 1.72k
3+ Years of ML & Society at Hugging Face 🤗🤝🧑🤝🧑 • Article • By yjernite and 3 others • 10 days ago • 13
huggingface_hub v1.0: Five Years of Building the Foundation of Open Machine Learning • Article • 13 days ago • 61
Aligning to What? Rethinking Agent Generalization in MiniMax M2 • Article • By MiniMax-AI • 9 days ago • 21
gpt-oss-safeguard • Collection • gpt-oss-safeguard-120b and gpt-oss-safeguard-20b are safety reasoning models built upon gpt-oss • 2 items • Updated 10 days ago • 56
Towards Cross-Tokenizer Distillation: the Universal Logit Distillation Loss for LLMs • Paper • 2402.12030 • Published Feb 19, 2024 • 3