GRPO helped DeepSeek R1 learn reasoning. Can it also help VLMs perform better on general computer vision tasks?
The answer is YES, and it generalizes better than SFT. We trained Qwen 2.5 VL 3B on RefCOCO (a visual grounding task) and evaluated on RefCOCO Val and RefGTA (an out-of-distribution task).
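As a rough illustration of this kind of setup, here is a minimal sketch of an IoU-based reward function of the sort a GRPO trainer (e.g. TRL's GRPOTrainer) can consume for box grounding. The box text format and the reward shaping are assumptions for illustration, not the exact recipe used in this experiment.

```python
# Minimal sketch: IoU reward for visual grounding with GRPO-style training.
# Assumption: the model emits boxes as "x1,y1,x2,y2" text; adjust parsing to
# your prompt format. Not the exact reward used in the experiment above.
from typing import List, Tuple

def iou(box_a: Tuple[float, float, float, float],
        box_b: Tuple[float, float, float, float]) -> float:
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def grounding_reward(completions: List[str],
                     gt_boxes: List[Tuple[float, float, float, float]]) -> List[float]:
    """One reward per completion: IoU of the predicted box vs. ground truth."""
    rewards = []
    for text, gt in zip(completions, gt_boxes):
        try:
            pred = tuple(float(v) for v in text.strip().split(","))
            rewards.append(iou(pred, gt))  # in [0, 1], higher is better
        except (ValueError, IndexError):
            rewards.append(0.0)  # unparseable output gets no reward
    return rewards
```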
✨ Unified 3D generation & text understanding.
✨ 3D meshes as plain text for seamless LLM integration.
✨ High-quality 3D outputs rivaling specialized models.
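To make the "meshes as plain text" idea concrete, here is a small sketch of serializing a mesh into OBJ-style text that an LLM can read or emit. The integer quantization is an assumption for illustration; the model's actual serialization may differ.

```python
# Sketch: representing a triangle mesh as plain text (OBJ-style lines).
# Assumption: coordinates are quantized to integers so the text stays compact;
# the actual tokenization used by the model may differ.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
faces = [(1, 2, 3), (1, 2, 4), (1, 3, 4), (2, 3, 4)]  # 1-indexed, as in OBJ

def mesh_to_text(vertices, faces, scale=64):
    lines = [f"v {round(x * scale)} {round(y * scale)} {round(z * scale)}"
             for x, y, z in vertices]
    lines += [f"f {a} {b} {c}" for a, b, c in faces]
    return "\n".join(lines)

# The resulting string can sit directly in a prompt or be generated
# token-by-token like any other text.
print(mesh_to_text(vertices, faces))
```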
LLaMA-O1: Open Large Reasoning Model Frameworks For Training, Inference and Evaluation With PyTorch and HuggingFace
Large Reasoning Models powered by Monte Carlo Tree Search (MCTS), Self-Play Reinforcement Learning, PPO, AlphaGo Zero's dual policy paradigm, and Large Language Models! https://github.com/SimpleBerry/LLaMA-O1/
What will happen when you compound MCTS ❤ LLM ❤ Self-Play ❤ RLHF? Just a little bite of strawberry! 🍓
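For readers unfamiliar with the MCTS ingredient, here is a generic sketch of the UCT selection rule that MCTS-based reasoning frameworks typically build on. This is a textbook illustration under my own node structure, not code from the LLaMA-O1 repository.

```python
# Generic UCT (Upper Confidence bound for Trees) selection step, the core of
# MCTS. Illustrative only; not taken from the LLaMA-O1 codebase.
import math
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    visits: int = 0
    total_value: float = 0.0   # sum of rollout/evaluation scores
    children: List["Node"] = field(default_factory=list)

def uct_score(child: Node, parent_visits: int, c: float = 1.41) -> float:
    if child.visits == 0:
        return float("inf")  # always try unvisited children first
    exploit = child.total_value / child.visits          # average value so far
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore

def select_child(parent: Node) -> Node:
    """Pick the child balancing exploitation and exploration."""
    return max(parent.children, key=lambda ch: uct_score(ch, parent.visits))
```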
🚀 I'm excited to announce that huggingface_hub's InferenceClient now supports OpenAI's Python client syntax! For developers integrating AI into their codebases, this means you can switch to open-source models with just three lines of code. Here's a quick example of how easy it is.
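A minimal sketch of what the switch can look like (the model ID below is only an example, and your setup may also pass an API key or base URL):

```python
# Sketch: OpenAI-style chat completion with huggingface_hub's InferenceClient.
# The model ID is only an example; any chat model served on the Hub works.
from huggingface_hub import InferenceClient

client = InferenceClient()  # takes the place of openai.OpenAI()

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    max_tokens=100,
)
print(response.choices[0].message.content)
```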
Why use the InferenceClient?
🔄 Seamless transition: keep your existing code structure while leveraging LLMs hosted on the Hugging Face Hub.
🤗 Direct integration: easily launch a model to run inference using our Inference Endpoints service.
🚀 Stay updated: always be in sync with the latest Text Generation Inference (TGI) updates.
Wow, this is amazing! 🤯
Samba is a powerful hybrid model with an unlimited context length, combining Mamba, MLP, and Sliding Window Attention, stacked layer-wise. Samba's largest version, Samba-3.8B, trained on 3.2 trillion tokens, excels on benchmarks like MMLU, GSM8K, and HumanEval, and shines in long-context tasks with minimal tuning.

Official implementation of "Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling"
GitHub: https://github.com/microsoft/Samba
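As a small illustration of the sliding-window-attention ingredient (the part of the hybrid that keeps attention cost bounded), here is a sketch of building a banded causal mask in PyTorch. The window size is arbitrary and this is not code from the Samba repository.

```python
# Sketch: a causal sliding-window attention mask, the component Samba pairs
# with Mamba and MLP layers. Window size is arbitrary; not Samba's code.
import torch

def sliding_window_causal_mask(seq_len: int, window: int) -> torch.Tensor:
    """True where query i may attend to key j: j <= i and i - j < window."""
    i = torch.arange(seq_len).unsqueeze(1)   # query positions
    j = torch.arange(seq_len).unsqueeze(0)   # key positions
    return (j <= i) & (i - j < window)

mask = sliding_window_causal_mask(seq_len=8, window=3)
print(mask.int())
# Each token attends to at most `window` previous tokens, so attention cost
# grows linearly with sequence length instead of quadratically.
```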
The kraken has awakened! A Game-Changer in LLM Flexibility and Performance!
Over the past few weeks, VAGO solutions teamed up with Cognitive Computations and HyperSpace to develop a groundbreaking architecture that redefines flexibility in combining different LLMs into one model.
What Can It Do? 🐙
✅ Versatile Architecture: Kraken allows the seamless combination of LLMs with varying sizes, quantizations, and model architectures. It currently supports 4-bit, 8-bit, and AWQ quantization, with more on the way, and it runs on Hugging Face Transformers 4.40+.
✅ Kraken Router: Utilizing a custom sequence classification model with a context length of 32k tokens, the Kraken Router directs inputs to the most suitable expert based on their characteristics (a routing sketch follows after this list).
✅ Adaptability: Enhanced input formatting supports the model’s adaptability to diverse conversational contexts.
✅ Extreme Versatility: Easily swap experts within Kraken for your specific use cases without retraining the entire model. For example, if you've built a Kraken for coding in Python, you can upgrade your Python expert without retraining the router, or add a C# expert by retraining only the router.
✅ Open Source Pipeline: We're sharing the entire pipeline, including router creation, training, architecture setup, and Kraken inference, in Jupyter notebooks: https://github.com/cognitivecomputations/kraken
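A rough sketch of the routing idea described above: a sequence classification model scores the input and decides which expert LLM handles it. The router checkpoint, expert names, and label order below are hypothetical placeholders, not Kraken's actual configuration (that lives in the linked repository).

```python
# Sketch of classifier-based routing between expert LLMs.
# ROUTER_ID and EXPERTS are hypothetical placeholders for illustration.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

ROUTER_ID = "my-org/kraken-router"          # hypothetical router checkpoint
EXPERTS = ["python-coder", "csharp-coder"]  # hypothetical expert labels

tokenizer = AutoTokenizer.from_pretrained(ROUTER_ID)
router = AutoModelForSequenceClassification.from_pretrained(ROUTER_ID)

def route(prompt: str) -> str:
    """Return the name of the expert the router assigns to this prompt."""
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = router(**inputs).logits
    return EXPERTS[int(logits.argmax(dim=-1))]

print(route("Write a Python function that parses a CSV file."))
# The selected expert's own tokenizer and generate() call would then produce
# the final answer.
```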
Kraken marks the beginning of an exciting new journey in #OpenSource LLMs. Why? Because it empowers the open-source community to catch up faster with proprietary LLMs like #GPT and #Claude 🤩
Since the release of BERT by Google in 2018, the Transformer architecture has taken over machine learning thanks to its 𝗮𝘁𝘁𝗲𝗻𝘁𝗶𝗼𝗻 𝗺𝗲𝗰𝗵𝗮𝗻𝗶𝘀𝗺, which gives it the ability to focus on the important parts of the input. But 𝙖𝙩𝙩𝙚𝙣𝙩𝙞𝙤𝙣 𝙘𝙤𝙢𝙥𝙪𝙩𝙖𝙩𝙞𝙤𝙣 𝙞𝙨 𝙦𝙪𝙖𝙙𝙧𝙖𝙩𝙞𝙘 𝙞𝙣 𝙩𝙝𝙚 𝙞𝙣𝙥𝙪𝙩 𝙡𝙚𝙣𝙜𝙩𝙝.
💫 The Mamba paper, published in December 2023, announced the return of RNNs: it has no attention, but integrates a selection mechanism, which should be able to reproduce the “focus” ability of attention, in an architecture whose compute requirements 𝗴𝗿𝗼𝘄 𝗼𝗻𝗹𝘆 𝗹𝗶𝗻𝗲𝗮𝗿𝗹𝘆 𝗶𝗻 𝗶𝗻𝗽𝘂𝘁 𝗹𝗲𝗻𝗴𝘁𝗵!
🤔 Would this work? We had yet to see a large Mamba model matching the performance of attention-based Transformers.
💥 But now it's done! A (Mamba + Transformers) hybrid just beat Transformers!
The AI21 Labs team just released Jamba. They insert a few Transformer layers to inject some attention in a big pile of Mamba layers, thus getting the best of both worlds.
𝙏𝙇;𝘿𝙍:
🏗️ 𝗡𝗲𝘄 𝗠𝗼𝗘 𝗮𝗿𝗰𝗵𝗶𝘁𝗲𝗰𝘁𝘂𝗿𝗲: 4 Jamba blocks, each consisting of 7 Mamba layers for every 1 Transformer layer.
🏋️ 𝟱𝟮𝗕 𝗽𝗮𝗿𝗮𝗺𝗲𝘁𝗲𝗿𝘀, 𝟭𝟮𝗕 𝗮𝗰𝘁𝗶𝘃𝗲 𝗮𝘁 𝗶𝗻𝗳𝗲𝗿𝗲𝗻𝗰𝗲: this reduction is enabled by Mixture of Experts, similar to Mixtral (47B parameters, 13B active).
🏎️ 𝗦𝗽𝗲𝗲𝗱: 𝘅𝟯 𝘁𝗵𝗿𝗼𝘂𝗴𝗵𝗽𝘂𝘁. Jamba is much faster than similar-sized Transformer models on long contexts.
📏 𝗖𝗼𝗻𝘁𝗲𝘅𝘁 𝗹𝗲𝗻𝗴𝘁𝗵: 𝟭𝟰𝟬𝗞 𝘁𝗼𝗸𝗲𝗻𝘀 on a single 80GB A100!
💪 𝗣𝗲𝗿𝗳𝗼𝗿𝗺𝗮𝗻𝗰𝗲: 𝘀𝘁𝗮𝘁𝗲-𝗼𝗳-𝘁𝗵𝗲-𝗮𝗿𝘁 𝗳𝗼𝗿 𝘁𝗵𝗶𝘀 𝘀𝗶𝘇𝗲. The small injection of attention seems sufficient, since Jamba beats the open-source reference Mixtral-8x7B on many benchmarks!
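If you want to try it, a minimal loading sketch with transformers could look like the following. This assumes a transformers version with Jamba support and an 80GB-class GPU; quantized loading is also possible but not shown here.

```python
# Sketch: loading Jamba with transformers. Assumes a transformers version with
# Jamba support and enough GPU memory; adjust dtype/device_map as needed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ai21labs/Jamba-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

inputs = tokenizer("Mamba and attention walk into a bar", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```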
- GGUF: perfect for inference on CPUs (and LM Studio)
- GPTQ/EXL2: fast inference on GPUs
- AWQ: super fast inference on GPUs with vLLM (https://github.com/vllm-project/vllm)
- HQQ: extreme quantization with decent 2-bit and 3-bit models
Once the model is converted, it is automatically uploaded to the Hugging Face Hub. To quantize a 7B model, GGUF only needs a T4 GPU, while the other methods require an A100 GPU.
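As an illustration of the AWQ + vLLM combination mentioned above, here is a minimal inference sketch; the checkpoint name is just an example of a pre-quantized AWQ model on the Hub.

```python
# Sketch: serving an AWQ-quantized model with vLLM.
# The model ID is only an example of a pre-quantized AWQ checkpoint.
from vllm import LLM, SamplingParams

llm = LLM(model="TheBloke/Mistral-7B-Instruct-v0.2-AWQ", quantization="awq")
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Explain AWQ quantization in one sentence."], params)
print(outputs[0].outputs[0].text)
```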
Now You Can Full Fine-Tune / DreamBooth Stable Diffusion XL (SDXL) with only 10.3 GB VRAM via OneTrainer. Both the U-Net and Text Encoder 1 are trained. Compared: the 14 GB config vs the slower 10.3 GB config.
In each epoch, only 15 of the regularization images are used, to get the DreamBooth training effect.
As the caption, only “ohwx man” is used; for the regularization images, just “man”. You can download the configs and full instructions here: https://www.patreon.com/posts/96028218
Hopefully a full public tutorial is coming within 2 weeks. I will show all configurations as well.
Can we improve the quality of open LLMs for more languages?
Step 1: Evaluate current SOTA.
The Data Is Better Together community has rated more than 10K prompts for quality. We now want to translate a subset of these to help address the language gap in model evals.
The plan is roughly this:
- We started with https://huggingface.co/datasets/DIBT/10k_prompts_ranked and took a subset of 500 high-quality prompts (a selection sketch follows below).
- We're asking the community to translate these prompts into different languages.
- We'll evaluate the extent to which we can use AlpacaEval and similar approaches to rate the outputs of models across these different languages.
- If it works well, we can more easily evaluate open LLMs across different languages by using a judge LLM to rate the quality of outputs from different models.
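Here is a rough sketch of how such a high-quality subset could be pulled with the datasets library. The "avg_rating" column name and the rating threshold are assumptions about the dataset's schema, not the exact selection that was used; check the dataset card.

```python
# Sketch: selecting a high-quality prompt subset from the ranked dataset.
# The "avg_rating" column and the >= 4.0 threshold are assumptions; the
# actual selection criteria may differ.
from datasets import load_dataset

ds = load_dataset("DIBT/10k_prompts_ranked", split="train")
high_quality = ds.filter(lambda row: row["avg_rating"] is not None and row["avg_rating"] >= 4.0)
subset = high_quality.shuffle(seed=42).select(range(500))
print(subset)
```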