view article Article Saving Memory Using Padding-Free Transformer Layers during Finetuning Jun 11, 2024 • 21
view article Article Nemotron 3 Nano \- A new Standard for Efficient, Open, and Intelligent Agentic Models 11 days ago • 99
view article Article Tensor Parallelism (TP) in Transformers: 5 Minutes to Understand 22 days ago • 62
Apertus LLM Collection Democratizing Open and Compliant LLMs for Global Language Environments: 8B and 70B open-data open-weights models, multilingual in >1000 languages • 4 items • Updated Oct 1 • 315
gpt-oss Collection Open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases. • 2 items • Updated Aug 7 • 393
view article Article OpenReasoning-Nemotron: A Family of State-of-the-Art Distilled Reasoning Models Jul 18 • 50
Comment on The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity Paper • 2506.09250 • Published Jun 10 • 27
Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures Paper • 2505.09343 • Published May 14 • 72