GRPO helped DeepSeek R1 learn reasoning. Can it also help VLMs perform better on general computer vision tasks?
The answer is YES, and it generalizes better than SFT. We trained Qwen 2.5 VL 3B on RefCOCO (a visual grounding task) and evaluated on RefCOCO Val and RefGTA (an out-of-distribution task).
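As a rough illustration of this kind of setup, here is a minimal sketch of an IoU-based reward function of the sort a GRPO trainer (e.g. TRL's GRPOTrainer) can consume for box grounding. The box text format and the reward shaping are assumptions for illustration, not the exact recipe used in this experiment.

```python
# Minimal sketch: IoU reward for visual grounding with GRPO-style training.
# Assumption: the model emits boxes as "x1,y1,x2,y2" text; adjust parsing to
# your prompt format. Not the exact reward used in the experiment above.
from typing import List, Tuple

def iou(box_a: Tuple[float, float, float, float],
        box_b: Tuple[float, float, float, float]) -> float:
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def grounding_reward(completions: List[str],
                     gt_boxes: List[Tuple[float, float, float, float]]) -> List[float]:
    """One reward per completion: IoU of the predicted box vs. ground truth."""
    rewards = []
    for text, gt in zip(completions, gt_boxes):
        try:
            pred = tuple(float(v) for v in text.strip().split(","))
            rewards.append(iou(pred, gt))  # in [0, 1], higher is better
        except (ValueError, IndexError):
            rewards.append(0.0)  # unparseable output gets no reward
    return rewards
```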
✨ Unified 3D generation & text understanding.
✨ 3D meshes as plain text for seamless LLM integration.
✨ High-quality 3D outputs rivaling specialized models.
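To make the "meshes as plain text" idea concrete, here is a small sketch of serializing a mesh into OBJ-style text that an LLM can read or emit. The integer quantization is an assumption for illustration; the model's actual serialization may differ.

```python
# Sketch: representing a triangle mesh as plain text (OBJ-style lines).
# Assumption: coordinates are quantized to integers so the text stays compact;
# the actual tokenization used by the model may differ.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
faces = [(1, 2, 3), (1, 2, 4), (1, 3, 4), (2, 3, 4)]  # 1-indexed, as in OBJ

def mesh_to_text(vertices, faces, scale=64):
    lines = [f"v {round(x * scale)} {round(y * scale)} {round(z * scale)}"
             for x, y, z in vertices]
    lines += [f"f {a} {b} {c}" for a, b, c in faces]
    return "\n".join(lines)

# The resulting string can sit directly in a prompt or be generated
# token-by-token like any other text.
print(mesh_to_text(vertices, faces))
```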
LLaMA-O1: Open Large Reasoning Model Frameworks For Training, Inference and Evaluation With PyTorch and HuggingFace
Large Reasoning Models powered by Monte Carlo Tree Search (MCTS), Self-Play Reinforcement Learning, PPO, AlphaGo Zero's dual policy paradigm, and Large Language Models! https://github.com/SimpleBerry/LLaMA-O1/
What will happen when you compound MCTS ❤ LLM ❤ Self-Play ❤ RLHF? Just a little bite of strawberry! 🍓
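For readers unfamiliar with the MCTS ingredient, here is a generic sketch of the UCT selection rule that MCTS-based reasoning frameworks typically build on. This is a textbook illustration under my own node structure, not code from the LLaMA-O1 repository.

```python
# Generic UCT (Upper Confidence bound for Trees) selection step, the core of
# MCTS. Illustrative only; not taken from the LLaMA-O1 codebase.
import math
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    visits: int = 0
    total_value: float = 0.0   # sum of rollout/evaluation scores
    children: List["Node"] = field(default_factory=list)

def uct_score(child: Node, parent_visits: int, c: float = 1.41) -> float:
    if child.visits == 0:
        return float("inf")  # always try unvisited children first
    exploit = child.total_value / child.visits          # average value so far
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore

def select_child(parent: Node) -> Node:
    """Pick the child balancing exploitation and exploration."""
    return max(parent.children, key=lambda ch: uct_score(ch, parent.visits))
```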
🚀 I'm excited to announce that huggingface_hub's InferenceClient now supports OpenAI's Python client syntax! For developers integrating AI into their codebases, this means you can switch to open-source models with just three lines of code. Here's a quick example of how easy it is.
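A minimal sketch of what the switch can look like (the model ID below is only an example, and your setup may also pass an API key or base URL):

```python
# Sketch: OpenAI-style chat completion with huggingface_hub's InferenceClient.
# The model ID is only an example; any chat model served on the Hub works.
from huggingface_hub import InferenceClient

client = InferenceClient()  # takes the place of openai.OpenAI()

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    max_tokens=100,
)
print(response.choices[0].message.content)
```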
Why use the InferenceClient?
🔄 Seamless transition: keep your existing code structure while leveraging LLMs hosted on the Hugging Face Hub.
🤗 Direct integration: easily launch a model to run inference using our Inference Endpoints service.
🚀 Stay updated: always be in sync with the latest Text Generation Inference (TGI) updates.
Wow, this is amazing! 🤯
Samba is a powerful hybrid model with an unlimited context length, combining Mamba, MLP, and Sliding Window Attention, stacked layer-wise. Samba's largest version, Samba-3.8B, trained on 3.2 trillion tokens, excels on benchmarks like MMLU, GSM8K, and HumanEval, and shines in long-context tasks with minimal tuning.

Official implementation of "Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling"
GitHub: https://github.com/microsoft/Samba
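As a small illustration of the sliding-window-attention ingredient (the part of the hybrid that keeps attention cost bounded), here is a sketch of building a banded causal mask in PyTorch. The window size is arbitrary and this is not code from the Samba repository.

```python
# Sketch: a causal sliding-window attention mask, the component Samba pairs
# with Mamba and MLP layers. Window size is arbitrary; not Samba's code.
import torch

def sliding_window_causal_mask(seq_len: int, window: int) -> torch.Tensor:
    """True where query i may attend to key j: j <= i and i - j < window."""
    i = torch.arange(seq_len).unsqueeze(1)   # query positions
    j = torch.arange(seq_len).unsqueeze(0)   # key positions
    return (j <= i) & (i - j < window)

mask = sliding_window_causal_mask(seq_len=8, window=3)
print(mask.int())
# Each token attends to at most `window` previous tokens, so attention cost
# grows linearly with sequence length instead of quadratically.
```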
The kraken has awakened! A Game-Changer in LLM Flexibility and Performance!
Over the past few weeks, VAGO solutions teamed up with Cognitive Computations and HyperSpace to develop a groundbreaking architecture that redefines flexibility in combining different LLMs into one model.
What Can It Do? 🐙
✅ Versatile Architecture: Kraken allows the seamless combination of LLMs with varying sizes, quantizations, and model architectures. It currently supports 4-bit, 8-bit, and AWQ quantization, with more on the way, and it runs on Hugging Face Transformers 4.40+.
✅ Kraken Router: Utilizing a custom sequence classification model with a context length of 32k tokens, the Kraken Router directs inputs to the most suitable expert based on their characteristics (a routing sketch follows after this list).
✅ Adaptability: Enhanced input formatting supports the model’s adaptability to diverse conversational contexts.
✅ Extreme Versatility: Easily swap experts within Kraken for your specific use cases without retraining the entire model. For example, if you've built a Kraken for coding in Python, you can upgrade your Python expert without retraining the router, or add a C# expert by retraining only the router.
✅ Open Source Pipeline: We're sharing the entire pipeline, including router creation, training, architecture setup, and Kraken inference, in Jupyter notebooks: https://github.com/cognitivecomputations/kraken
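A rough sketch of the routing idea described above: a sequence classification model scores the input and decides which expert LLM handles it. The router checkpoint, expert names, and label order below are hypothetical placeholders, not Kraken's actual configuration (that lives in the linked repository).

```python
# Sketch of classifier-based routing between expert LLMs.
# ROUTER_ID and EXPERTS are hypothetical placeholders for illustration.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

ROUTER_ID = "my-org/kraken-router"          # hypothetical router checkpoint
EXPERTS = ["python-coder", "csharp-coder"]  # hypothetical expert labels

tokenizer = AutoTokenizer.from_pretrained(ROUTER_ID)
router = AutoModelForSequenceClassification.from_pretrained(ROUTER_ID)

def route(prompt: str) -> str:
    """Return the name of the expert the router assigns to this prompt."""
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = router(**inputs).logits
    return EXPERTS[int(logits.argmax(dim=-1))]

print(route("Write a Python function that parses a CSV file."))
# The selected expert's own tokenizer and generate() call would then produce
# the final answer.
```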
Kraken marks the beginning of an exciting new journey in #OpenSource LLMs. Why? Because it empowers the open-source community to catch up faster with proprietary LLMs like #GPT and #Claude 🤩
Since the release of BERT by Google in 2018, the Transformer architecture has taken over machine learning thanks to its 𝗮𝘁𝘁𝗲𝗻𝘁𝗶𝗼𝗻 𝗺𝗲𝗰𝗵𝗮𝗻𝗶𝘀𝗺, which gives it the ability to focus on the important parts of the input. But 𝙖𝙩𝙩𝙚𝙣𝙩𝙞𝙤𝙣 𝙘𝙤𝙢𝙥𝙪𝙩𝙖𝙩𝙞𝙤𝙣 𝙞𝙨 𝙦𝙪𝙖𝙙𝙧𝙖𝙩𝙞𝙘 𝙞𝙣 𝙩𝙝𝙚 𝙞𝙣𝙥𝙪𝙩 𝙡𝙚𝙣𝙜𝙩𝙝.
💫 The Mamba paper, published in December 2023, announced the return of RNNs: it has no attention, but integrates a selection mechanism, which should be able to reproduce the “focus” ability of attention, in an architecture whose compute requirements 𝗴𝗿𝗼𝘄 𝗼𝗻𝗹𝘆 𝗹𝗶𝗻𝗲𝗮𝗿𝗹𝘆 𝗶𝗻 𝗶𝗻𝗽𝘂𝘁 𝗹𝗲𝗻𝗴𝘁𝗵!
🤔 Would this work? We had yet to see a large Mamba model matching the performance of attention-based Transformers.
💥 But now it's done! A (Mamba + Transformers) hybrid just beat Transformers!
The AI21 Labs team just released Jamba. They insert a few Transformer layers to inject some attention in a big pile of Mamba layers, thus getting the best of both worlds.
𝙏𝙇;𝘿𝙍:
🏗️ 𝗡𝗲𝘄 𝗠𝗼𝗘 𝗮𝗿𝗰𝗵𝗶𝘁𝗲𝗰𝘁𝘂𝗿𝗲: 4 Jamba blocks, each consisting of 7 Mamba layers for every 1 Transformer layer.
🏋️ 𝟱𝟮𝗕 𝗽𝗮𝗿𝗮𝗺𝗲𝘁𝗲𝗿𝘀, 𝟭𝟮𝗕 𝗮𝗰𝘁𝗶𝘃𝗲 𝗮𝘁 𝗶𝗻𝗳𝗲𝗿𝗲𝗻𝗰𝗲: this reduction is enabled by Mixture of Experts, similar to Mixtral (47B parameters, 13B active).
🏎️ 𝗦𝗽𝗲𝗲𝗱: 𝘅𝟯 𝘁𝗵𝗿𝗼𝘂𝗴𝗵𝗽𝘂𝘁. Jamba is much faster than similar-sized Transformer models on long contexts.
📏 𝗖𝗼𝗻𝘁𝗲𝘅𝘁 𝗹𝗲𝗻𝗴𝘁𝗵: 𝟭𝟰𝟬𝗞 𝘁𝗼𝗸𝗲𝗻𝘀 on a single 80GB A100!
💪 𝗣𝗲𝗿𝗳𝗼𝗿𝗺𝗮𝗻𝗰𝗲: 𝘀𝘁𝗮𝘁𝗲-𝗼𝗳-𝘁𝗵𝗲-𝗮𝗿𝘁 𝗳𝗼𝗿 𝘁𝗵𝗶𝘀 𝘀𝗶𝘇𝗲. The small injection of attention seems sufficient, since Jamba beats the open-source reference Mixtral-8x7B on many benchmarks!
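If you want to try it, a minimal loading sketch with transformers could look like the following. This assumes a transformers version with Jamba support and an 80GB-class GPU; quantized loading is also possible but not shown here.

```python
# Sketch: loading Jamba with transformers. Assumes a transformers version with
# Jamba support and enough GPU memory; adjust dtype/device_map as needed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ai21labs/Jamba-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

inputs = tokenizer("Mamba and attention walk into a bar", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```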
- GGUF: perfect for inference on CPUs (and LM Studio)
- GPTQ/EXL2: fast inference on GPUs
- AWQ: super fast inference on GPUs with vLLM (https://github.com/vllm-project/vllm)
- HQQ: extreme quantization with decent 2-bit and 3-bit models
Once the model is converted, it is automatically uploaded to the Hugging Face Hub. To quantize a 7B model, GGUF only needs a T4 GPU, while the other methods require an A100 GPU.
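As an illustration of the AWQ + vLLM combination mentioned above, here is a minimal inference sketch; the checkpoint name is just an example of a pre-quantized AWQ model on the Hub.

```python
# Sketch: serving an AWQ-quantized model with vLLM.
# The model ID is only an example of a pre-quantized AWQ checkpoint.
from vllm import LLM, SamplingParams

llm = LLM(model="TheBloke/Mistral-7B-Instruct-v0.2-AWQ", quantization="awq")
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Explain AWQ quantization in one sentence."], params)
print(outputs[0].outputs[0].text)
```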
Now You Can Full Fine-Tune / DreamBooth Stable Diffusion XL (SDXL) with only 10.3 GB VRAM via OneTrainer. Both the U-Net and Text Encoder 1 are trained. Compared: the 14 GB config vs the slower 10.3 GB config.
In each epoch, only 15 of the regularization images are used, to get the DreamBooth training effect.
As the caption, only “ohwx man” is used; for the regularization images, just “man”. You can download the configs and full instructions here: https://www.patreon.com/posts/96028218
Hopefully a full public tutorial is coming within 2 weeks. I will show all configurations as well.
Can we improve the quality of open LLMs for more languages?
Step 1: Evaluate current SOTA.
The Data Is Better Together community has rated more than 10K prompts for quality. We now want to translate a subset of these to help address the language gap in model evals.
The plan is roughly this:
- We started with https://huggingface.co/datasets/DIBT/10k_prompts_ranked and took a subset of 500 high-quality prompts (a selection sketch follows below).
- We're asking the community to translate these prompts into different languages.
- We'll evaluate the extent to which we can use AlpacaEval and similar approaches to rate the outputs of models across these different languages.
- If it works well, we can more easily evaluate open LLMs across different languages by using a judge LLM to rate the quality of outputs from different models.
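Here is a rough sketch of how such a high-quality subset could be pulled with the datasets library. The "avg_rating" column name and the rating threshold are assumptions about the dataset's schema, not the exact selection that was used; check the dataset card.

```python
# Sketch: selecting a high-quality prompt subset from the ranked dataset.
# The "avg_rating" column and the >= 4.0 threshold are assumptions; the
# actual selection criteria may differ.
from datasets import load_dataset

ds = load_dataset("DIBT/10k_prompts_ranked", split="train")
high_quality = ds.filter(lambda row: row["avg_rating"] is not None and row["avg_rating"] >= 4.0)
subset = high_quality.shuffle(seed=42).select(range(500))
print(subset)
```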