You can now run GLM-4.7, the new 355B-parameter SOTA model, on your local device (128GB RAM). ✨ The model achieves SOTA performance on coding, agentic, and chat benchmarks.
GGUF: unsloth/GLM-4.7-GGUF
Guide: https://docs.unsloth.ai/models/glm-4.7
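For reference, a minimal sketch of pulling a single quant of a large GGUF repo like this with huggingface_hub; the quant name (UD-Q2_K_XL) is an assumption here, so list the repo files to see which quantizations were actually uploaded:

```python
# Minimal sketch: download one quant of the GLM-4.7 GGUF.
# "UD-Q2_K_XL" is a hypothetical example quant name -- check the repo
# listing for the quants that were actually published.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="unsloth/GLM-4.7-GGUF",       # repo from the post above
    allow_patterns=["*UD-Q2_K_XL*"],      # hypothetical quant; adjust as needed
    local_dir="GLM-4.7-GGUF",
)
```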
Google releases FunctionGemma, a new 270M-parameter model that runs on just 0.5 GB RAM. ✨ Built for tool calling, it runs locally on your phone at 50+ tokens/s, or you can fine-tune it with Unsloth and deploy it to your phone.
GGUF: unsloth/functiongemma-270m-it-GGUF
Docs + Notebook: https://docs.unsloth.ai/models/functiongemma
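A minimal sketch of the fine-tuning setup in Unsloth; the base (non-GGUF) model id below is an assumption inferred from the GGUF repo name, so check the docs for the exact id:

```python
# Minimal sketch: load FunctionGemma for LoRA fine-tuning with Unsloth.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/functiongemma-270m-it",  # hypothetical base repo id
    max_seq_length=2048,
    load_in_4bit=False,  # at 270M params, full precision already fits in <1 GB
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,            # LoRA rank; a common default in Unsloth notebooks
    lora_alpha=16,
)
```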
NVIDIA releases Nemotron 3 Nano, a new 30B hybrid reasoning model! 🔥 It has a 1M context window and best-in-class performance on SWE-Bench, reasoning, and chat. Run the MoE model locally with 24GB RAM.
GGUF: unsloth/Nemotron-3-Nano-30B-A3B-GGUF
💚 Step-by-step guide: https://docs.unsloth.ai/models/nemotron-3
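A minimal sketch of running it with llama-cpp-python; the quant filename pattern is an assumption, and the context is deliberately set far below the advertised 1M window, since KV-cache memory (not weights) dominates at long context:

```python
# Minimal sketch: run the Nemotron 3 Nano GGUF locally.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="unsloth/Nemotron-3-Nano-30B-A3B-GGUF",
    filename="*Q4_K_M*",  # hypothetical quant; list the repo to confirm
    n_ctx=32768,          # raise toward 1M only if your RAM allows it
)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Give one-line summaries of three classic sorting algorithms."}]
)
print(out["choices"][0]["message"]["content"])
```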
Mistral's new SOTA coding models, Devstral 2, can now be run locally! (25GB RAM) 🐱 We fixed the chat template, so performance should be much better now!
24B: unsloth/Devstral-Small-2-24B-Instruct-2512-GGUF
123B: unsloth/Devstral-2-123B-Instruct-2512-GGUF
🧡 Step-by-step guide: https://docs.unsloth.ai/models/devstral-2
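Since the chat template fix only applies to re-downloaded files, a minimal sketch of inspecting the template your local GGUF actually carries ("tokenizer.chat_template" is standard GGUF metadata; the quant filename is an assumption):

```python
# Minimal sketch: check the chat template embedded in a Devstral 2 GGUF.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="unsloth/Devstral-Small-2-24B-Instruct-2512-GGUF",
    filename="*Q4_K_M*",  # hypothetical quant choice
)
# Print the first few hundred characters of the embedded template.
print(llm.metadata.get("tokenizer.chat_template", "<no template>")[:300])
```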
Mistral's new Ministral 3 models can now be run and fine-tuned locally! (16GB RAM) The Ministral 3 models have vision support and best-in-class performance for their sizes.
14B Instruct GGUF: unsloth/Ministral-3-14B-Instruct-2512-GGUF
14B Reasoning GGUF: unsloth/Ministral-3-14B-Reasoning-2512-GGUF
🐱 Step-by-step guide: https://docs.unsloth.ai/new/ministral-3
All GGUF, BnB, FP8, etc. uploads: https://huggingface.co/collections/unsloth/ministral-3
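Because Ministral 3 is vision-capable, fine-tuning goes through Unsloth's vision loader rather than the text-only one; a minimal sketch, where the base model id is an assumption inferred from the GGUF names above:

```python
# Minimal sketch: load the vision-capable Ministral 3 for fine-tuning.
from unsloth import FastVisionModel

model, tokenizer = FastVisionModel.from_pretrained(
    "unsloth/Ministral-3-14B-Instruct-2512",  # hypothetical base repo id
    load_in_4bit=True,                        # fits the 16GB figure in the post
)
```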
Qwen3-Next can now be run locally! (30GB RAM) The models come in Thinking and Instruct versions and use a new architecture that gives them roughly 10x faster inference than Qwen3-32B.
Instruct GGUF: unsloth/Qwen3-Next-80B-A3B-Instruct-GGUF
Thinking GGUF: unsloth/Qwen3-Next-80B-A3B-Thinking-GGUF
💜 Step-by-step guide: https://docs.unsloth.ai/models/qwen3-next
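A minimal sketch of a rough tokens/s check, to sanity-test the speed claim on your own hardware; the quant filename is an assumption sized for the ~30GB RAM figure:

```python
# Minimal sketch: measure generation throughput for Qwen3-Next Instruct.
import time
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="unsloth/Qwen3-Next-80B-A3B-Instruct-GGUF",
    filename="*UD-Q2_K_XL*",  # hypothetical quant; list the repo to confirm
)
t0 = time.time()
out = llm("Explain mixture-of-experts in two sentences.", max_tokens=128)
tokens = out["usage"]["completion_tokens"]
print(f"{tokens / (time.time() - t0):.1f} tokens/s")
```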
You can now run Kimi K2 Thinking locally with our Dynamic 1-bit GGUFs: unsloth/Kimi-K2-Thinking-GGUF
We shrank the 1T-parameter model to 245GB (-62%) and retained ~85% of its accuracy on Aider Polyglot. Run it on >247GB RAM for fast inference. We also collaborated with the Moonshot AI Kimi team on a system prompt fix! 🥰
Guide + fix details: https://docs.unsloth.ai/models/kimi-k2-thinking-how-to-run-locally
Run DeepSeek-V3.1 locally on 170GB RAM with Dynamic 1-bit GGUFs! 🐋
GGUFs: unsloth/DeepSeek-V3.1-GGUF
The 715GB model shrinks to 170GB (a ~76% size reduction) by smartly quantizing layers. The 1-bit GGUF passes all our code tests, and we fixed the chat template for llama.cpp-supported backends.
Guide: https://docs.unsloth.ai/basics/deepseek-v3.1
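Both 1-bit releases (this one and Kimi K2 Thinking above) ship as multi-file split GGUFs. A minimal sketch of loading one with llama-cpp-python, assuming the shards were already downloaded (e.g. with snapshot_download as in the GLM-4.7 sketch) and follow llama.cpp's usual split naming; the "UD-TQ1_0" quant folder name is an assumption:

```python
# Minimal sketch: load a split 1-bit GGUF by pointing at the first shard;
# llama.cpp picks up the remaining "-000NN-of-000NN" shards automatically.
import glob
from llama_cpp import Llama

first_shard = sorted(glob.glob("DeepSeek-V3.1-GGUF/*UD-TQ1_0*-00001-of-*.gguf"))[0]
llm = Llama(model_path=first_shard, n_ctx=4096)
print(llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello!"}]
)["choices"][0]["message"]["content"])
```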
Run OpenAI's new gpt-oss models locally with Unsloth GGUFs! 🔥🦥
20b GGUF: unsloth/gpt-oss-20b-GGUF
120b GGUF: unsloth/gpt-oss-120b-GGUF
The 20b model runs on 14GB RAM and the 120b on 66GB.
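A minimal sketch of running the 20b model with optional GPU offload; the file pattern is an assumption, and n_gpu_layers=-1 asks llama.cpp to offload every layer (drop or lower it on CPU-only machines):

```python
# Minimal sketch: run gpt-oss-20b locally, offloading what fits to a GPU.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="unsloth/gpt-oss-20b-GGUF",
    filename="*F16*",    # hypothetical file choice; list the repo to confirm
    n_gpu_layers=-1,     # -1 = offload all layers; omit for CPU-only inference
)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "What is a mixture-of-experts model?"}]
)
print(out["choices"][0]["message"]["content"])
```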