Article Transformers v5: Simple model definitions powering the AI ecosystem +2 30 days ago • 259
How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM? Paper • 2502.14502 • Published Feb 20 • 91
The Ultra-Scale Playbook 🌌: The ultimate guide to training LLM on large GPU Clusters • Running • 3.61k
Article PaliGemma 2 Mix - New Instruction Vision Language Models by Google +1 Feb 19 • 74
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model Paper • 2409.01704 • Published Sep 3, 2024 • 83
SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models Paper • 2407.15841 • Published Jul 22, 2024 • 40
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases Paper • 2402.14905 • Published Feb 22, 2024 • 134 • 13