optimum (Hugging Face Optimum)

🚀 Big news for AI builders!

We’re thrilled to announce that the Qwen3-VL family of vision-language models is now available on Azure AI Foundry, thanks to our collaboration with Microsoft.

We bring open-source innovation to enterprise-grade AI infrastructure, making it easier than ever for enterprise to deploy and scale the latest and greatest from models from hugging Face securely within Azure.

🔍 Highlights:

- Deploy Qwen3-VL instantly via managed endpoints
- Built-in governance, telemetry, and lifecycle management
- True multimodal reasoning — vision, language, and code understanding
- State-of-the-art performance, outperforming closed-source models like Gemini 2.5 Pro and GPT-5
- Available in both *Instruct* and *Thinking* modes, across 24 model sizes

👉 Get started today: search for Qwen3-VL in the Hugging Face Collection on Azure AI Foundry.

1 reply

·

pagezyhf

posted an update about 2 months ago

Post

836

What’s your biggest headache deploying Hugging Face models to the cloud—and how can we fix it for you?

8 replies

·

pagezyhf

posted an update about 2 months ago

Post

477

Qwen3 Next models are available in Azure AI Foundry 🚀

Qwen/qwen3-next-68c25fd6838e585db8eeea9d

pagezyhf

posted an update about 2 months ago

Post

3891

🤝 Collaborating with AMD to ensure Hugging Face Transformers runs smoothly on AMD GPUs!

We run daily CI on AMD MI325 to track the health of the most important model architectures and we’ve just made our internal dashboard public.

By making this easily accessible, we hope to spark community contributions and improve support for everyone!

2 replies

·

badaoui

posted an update 2 months ago

Post

409

🚀 Optimum libraries keep growing, and Optimum v2 is just around the corner!

I recently added ONNX export support for a bunch of new models in the optimum-onnx library, including: DeepSeek-V3, Cohere, Nemotron, Arcee, StableLM … and more!

⚡ With ONNX export, you can run your favorite models faster and more efficiently across different hardware backends, making deployment and experimentation much smoother.

💡 Have a model you’d love to see supported? Contributions are super welcome — let’s make Optimum even better together!

#ONNX #Optimum #HuggingFace #OpenSource #AI

jeffboudier

posted an update 2 months ago

Post

2994

Quick 30s demo of the new Hub > Azure AI integration to deploy HF models in your own Azure account. Now with Py and CLI!

GG @alvarobartt @kramp @pagezyhf

pagezyhf

posted an update 3 months ago

Post

3212

We've improved the Deploy button on Hugging Face model pages for Microsoft Azure

1/ no more long waits before seeing model support status

2/ ready-to-use CLI and Python snippets

3/ redirection to Azure AI Foundry rather than Azure ML

✋ if you see any bugs or have feedback, open an issue on our repo:
https://github.com/huggingface/Microsoft-Azure

badaoui

posted an update 3 months ago

Post

3186

Is there a "one-size-fits-all" recipe for quantizing Large Language Models? 🤔

As part of my ongoing work in mixed-precision quantization, I've been exploring this question by measuring layer-by-layer sensitivity. The goal is to see if we can find universal rules for which layers can be quantized aggressively without impacting performance.The results are fascinating and reveal two key insights:

1️⃣ Sensitivity profiles are like architectural "fingerprints." Models from the same family share strikingly similar sensitivity patterns. As you can see in the charts below for the Gemma and SmolLM families, the ranking and relative sensitivity of the layers remain remarkably consistent. This suggests that the underlying architecture is a primary driver of a model's quantization behavior.

2️⃣ A "universal" mixed-precision quantization strategy is challenging. While models within a family are similar, these "fingerprints" change dramatically when comparing different architectures like LLaMA, Qwen, and StableLM. This highlights the difficulty in creating a generalized mixed-precision configuration that works optimally across all model families.

However, there is one near-universal truth we uncovered: the mlp.down_proj layer consistently emerges as one of the most sensitive components across all models studied.
This finding strongly resonates with the work in "The Super Weight in Large Language Models" (by Mengxia Yu et al.). The paper identifies that functionally critical parameters, or "super weights," are concentrated in these down_proj layers. Our empirical results provide clear validation for this theory, showing these layers are highly intolerant to precision loss.

In short, while every architecture has a unique sensitivity profile, a fingerprint shaped not only by its core design but also by its specific training dataset and optimization approach, some components remain universally critical!
What are your thoughts?

4 replies

·

pagezyhf

posted an update 3 months ago

Post

2187

Deploy GPT OSS models with Hugging Face on Azure AI!

We’re thrilled to enable OpenAI GPT OSS models on Azure AI Model Catalog for Azure users to try the model securely the day of its release.

In our official launch blogpost, there’s a section on how to deploy the model to your Azure AI Hub. Get started today!

https://huggingface.co/blog/welcome-openai-gpt-oss#azure

pagezyhf

posted an update 3 months ago

Post

275

We now have the newest Open AI models available on the Dell Enterprise Hub!

We built the Dell Enterprise Hub to provide access to the latest and greatest model from the Hugging Face community to our on-prem customers. We’re happy to give secure access to this amazing contribution from Open AI on the day of its launch!

/static-proxy?url=https%3A%2F%2Fdell.huggingface.co%2F%3C%2Fa%3E

IlyasMoutawwakil

posted an update 3 months ago

Post

3450

🚀 Optimum: The Last v1 Release 🚀
Optimum v1.27 marks the final major release in the v1 series. As we close this chapter, we're laying the groundwork for a more modular and community-driven future:
- Optimum v2: A lightweight core package for porting Transformers, Diffusers, or Sentence-Transformers to specialized AI hardware/software/accelerators..
- Optimum‑ONNX: A dedicated package where the ONNX/ONNX Runtime ecosystem lives and evolves, faster-moving and decoupled from the Optimum core.

🎯 Why this matters:
- A clearer governance path for ONNX, fostering stronger community collaboration and improved developer experience..
- Enable innovation at a faster pace in a more modular, open-source environment.

💡 What this means:
- More transparency, broader participation, and faster development driven by the community and key actors in the ONNX ecosystem (PyTorch, Microsoft, Joshua Lochner 👀, ...)
- A cleaner, more maintainable core Optimum, focused on extending HF libraries to special AI hardware/software/accelerators tooling and used by our partners (Intel Corporation, Amazon Web Services (AWS), AMD, NVIDIA, FuriosaAI, ...)

🛠️ Major updates I worked on in this release:
✅ Added support for Transformers v4.53 and SmolLM3 in ONNX/ONNXRuntime.
✅ Solved batched inference/generation for all supported decoder model architectures (LLMs).

✨ Big shoutout to @echarlaix for leading the refactoring work that cleanly separated ONNX exporter logic and enabled the creation of Optimum‑ONNX.

📝 Release Notes: https://lnkd.in/gXtE_qji
📦 Optimum : https://lnkd.in/ecAezNT6
🎁 Optimum-ONNX: https://lnkd.in/gzjyAjSi
#Optimum #ONNX #OpenSource #HuggingFace #Transformers #Diffusers

pagezyhf

posted an update 4 months ago

Post

357

🟪 Qwen/Qwen3‑235B‑A22B‑Instruct‑2507‑FP8 is now available in Microsoft Azure for one‑click deployment! 🚀

Check out their blogpost: https://qwenlm.github.io/blog/qwen3/

You can now find it in the Hugging Face Collection in Azure ML or Azure AI Foundry, along with 10k other Hugging Face models 🤗🤗
Qwen/Qwen3-235B-A22B-Instruct-2507-FP8

Bear with us for the non‑quantized version.

pagezyhf

posted an update 4 months ago

Post

1564

In our recent push to make more models available on Azure, we recently added SmolLM v3 in the catalog! 🚀

@juanjucm wrote a really detailed guide on how to deploy on Azure AI 🤗

https://huggingface.co/docs/microsoft-azure/azure-ai/examples/deploy-smollm3

If you want to see other models, please let us know

1 reply

·

pagezyhf

posted an update 4 months ago

Post

212

🎉 New in Azure Model Catalog: NVIDIA Parakeet TDT 0.6B V2

We're excited to welcome Parakeet TDT 0.6B V2—a state-of-the-art English speech-to-text model—to the Azure Foundry Model Catalog.

What is it?

A powerful ASR model built on the FastConformer-TDT architecture, offering:
🕒 Word-level timestamps
✍️ Automatic punctuation & capitalization
🔊 Strong performance across noisy and real-world audio

It runs with NeMo, NVIDIA’s optimized inference engine.

Want to give it a try? 🎧 You can test it with your own audio (up to 3 hours) on Hugging Face Spaces before deploying.If it fits your need, deploy easily from the Hugging Face Hub or Azure ML Studio with secure, scalable infrastructure!

📘 Learn more by following this guide written by @alvarobartt

https://huggingface.co/docs/microsoft-azure/azure-ai/examples/deploy-nvidia-parakeet-asr

pagezyhf

posted an update 4 months ago

Post

1271

If you want to dive into how the HF team worked with @seungrokj at @AMD
to optimize kernels on MI300, you should give a read to our latest blog!

Such a great educational material for anyone curious about the world of optimizing low level ML.

https://huggingface.co/blog/mi300kernels

Hugging Face Optimum

AI & ML interests

Recent Activity

Neuron Exporter

Neuron Exporter

feat: add progress indicator while compiling is running

feat: add progress indicator while compiling is running

optimum/bge-base-en-v1.5-neuronx

AI & ML interests

Recent Activity

Team members 13

optimum's activity

Neuron Exporter

Neuron Exporter

feat: add progress indicator while compiling is running

feat: add progress indicator while compiling is running