helenai/Qwen2.5-VL-7B-Instruct-ov-nf4-npu


helenai committed on Sep 4

Commit 30a382d · verified · 1 Parent(s): d3e8b78

Create README.md

Files changed (1)

README.md +87 -0


	@@ -0,0 +1,87 @@

+---
+base_model:
+- Qwen/Qwen2.5-VL-7B-Instruct
+---
+This is the [Qwen/Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) model converted to OpenVINO, with nf4 weights for the language model and int8 weights for the other models.
+The nf4 weights are compressed with symmetric, channel-wise quantization. The model runs on Intel NPU. See below for the model export command and properties.
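To make "symmetric, channel-wise nf4" concrete, here is a minimal NumPy sketch. It is an illustration only, not the NNCF implementation used for the export: the function names are hypothetical, the codebook is the 16-level NF4 table published with QLoRA, and it assumes one scale per row of a 2-D weight matrix (the actual channel axis and scale computation in NNCF may differ).

```python
import numpy as np

# The 16 NF4 levels (normalized to [-1, 1]) as published with QLoRA.
NF4_LEVELS = np.array([
    -1.0, -0.6961928009986877, -0.5250730514526367, -0.39491748809814453,
    -0.28444138169288635, -0.18477343022823334, -0.09105003625154495, 0.0,
    0.07958029955625534, 0.16093020141124725, 0.24611230194568634,
    0.33791524171829224, 0.44070982933044434, 0.5626170039176941,
    0.7229568362236023, 1.0,
], dtype=np.float32)


def quantize_nf4_per_channel(weights):
    """Hypothetical helper: map each weight to a 4-bit NF4 index, one scale per row."""
    # Symmetric: a single scale per channel and no zero point.
    scales = np.abs(weights).max(axis=1, keepdims=True)
    scales = np.where(scales == 0, 1.0, scales)  # avoid division by zero
    normalized = weights / scales  # every value now lies in [-1, 1]
    # Pick the nearest codebook entry for every weight.
    indices = np.abs(normalized[..., None] - NF4_LEVELS).argmin(axis=-1)
    return indices.astype(np.uint8), scales.astype(np.float32)


def dequantize_nf4(indices, scales):
    """Hypothetical helper: look up the NF4 level and rescale per channel."""
    return NF4_LEVELS[indices] * scales


rng = np.random.default_rng(0)
w = rng.normal(size=(4, 64)).astype(np.float32)
idx, s = quantize_nf4_per_channel(w)
w_hat = dequantize_nf4(idx, s)
print(idx.shape, s.shape)
```

With a group size of -1 there is exactly one scale per channel, so the only per-weight storage is the 4-bit index.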
+## Download Model
+To download the model, run `pip install huggingface-hub[cli]` and then:
+```
+huggingface-cli download helenai/Qwen2.5-VL-7B-Instruct-ov-nf4-npu --local-dir Qwen2.5-VL-7B-Instruct-ov-nf4-npu
+```
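The same download can also be done from Python. This is a sketch that assumes the `huggingface_hub` package is installed; `download_model` is a hypothetical helper name, while `snapshot_download` is the library function that fetches a whole repository.

```python
REPO_ID = "helenai/Qwen2.5-VL-7B-Instruct-ov-nf4-npu"
LOCAL_DIR = "Qwen2.5-VL-7B-Instruct-ov-nf4-npu"


def download_model(repo_id: str = REPO_ID, local_dir: str = LOCAL_DIR) -> str:
    """Download every file in the repository and return the local path."""
    # Imported lazily so that defining the helper does not require the package.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, local_dir=local_dir)
```

Calling `download_model()` places the files in the same directory that the CLI command above uses.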
+## Run inference with OpenVINO GenAI
+Use OpenVINO GenAI to run inference on this model. This model works with OpenVINO GenAI 2025.3 and later. Make sure to use the latest NPU driver ([Windows](https://www.intel.com/content/www/us/en/download/794734/intel-npu-driver-windows.html), [Linux](https://github.com/intel/linux-npu-driver)).
+- Install OpenVINO GenAI and pillow:
+```
+pip install --upgrade openvino-genai pillow
+```
+- Download a test image: `curl -O "https://storage.openvinotoolkit.org/test_data/images/dog.jpg"`
+- Run inference:
+```python
+import numpy as np
+import openvino as ov
+import openvino_genai
+from PIL import Image
+# CACHE_DIR caches the model the first time, so subsequent model loading will be faster
+pipeline_config = {"CACHE_DIR": "model_cache"}
+pipe = openvino_genai.VLMPipeline("Qwen2.5-VL-7B-Instruct-ov-nf4-npu", "NPU", **pipeline_config)
+image = Image.open("dog.jpg").convert("RGB")  # ensure 3 channels for the reshape below
+# optional: resizing to a smaller size (depending on image and prompt) is often useful to speed up inference.
+image = image.resize((128, 128))
+image_data = np.array(image.getdata()).reshape(1, image.size[1], image.size[0], 3).astype(np.uint8)
+image_data = ov.Tensor(image_data)
+prompt = "Can you describe the image?"
+result = pipe.generate(prompt, image=image_data, max_new_tokens=100)
+print(result.texts[0])
+```
+See the [OpenVINO GenAI repository](https://github.com/openvinotoolkit/openvino.genai?tab=readme-ov-file#performing-visual-language-text-generation) for more on visual language text generation.
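The reshape in the inference snippet produces a batch of one image with shape `(1, height, width, 3)` and dtype `uint8`. The NumPy-only sketch below shows that layout. It uses a synthetic array in place of a decoded image (an assumption, so that no image file is needed), and `to_batched_uint8` is a hypothetical helper name.

```python
import numpy as np


def to_batched_uint8(pixels: np.ndarray) -> np.ndarray:
    """Turn an (H, W, 3) array into a contiguous (1, H, W, 3) uint8 array."""
    if pixels.ndim != 3 or pixels.shape[2] != 3:
        raise ValueError(f"expected shape (H, W, 3), got {pixels.shape}")
    # Add the leading batch axis and make sure the dtype is uint8.
    return np.ascontiguousarray(pixels[np.newaxis, ...], dtype=np.uint8)


# Synthetic stand-in for a 128x128 RGB image.
fake_image = np.zeros((128, 128, 3), dtype=np.uint8)
batched = to_batched_uint8(fake_image)
print(batched.shape, batched.dtype)
```

An image with an alpha channel or a grayscale image would fail the shape check, which is why converting to RGB before building the tensor is a safe default.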
+## Model export properties
+Model export command:
+```
+optimum-cli export openvino -m Qwen/Qwen2.5-VL-7B-Instruct --weight-format nf4 --group-size -1 --sym Qwen2.5-VL-7B-Instruct-ov-nf4-npu
+```
+### Framework versions
+```
+openvino         : 2025.3.0-19807-44526285f24-releases/2025/3
+nncf             : 2.18.0
+optimum_intel    : 1.26.0.dev0+bc13ae5
+optimum          : 1.27.0
+pytorch          : 2.7.1
+transformers     : 4.51.3
+```
+### LLM export properties
+```
+all_layers               : False
+awq                      : False
+backup_mode              : int8_asym
+compression_format       : dequantize
+gptq                     : False
+group_size               : -1
+ignored_scope            : []
+lora_correction          : False
+mode                     : nf4
+ratio                    : 1.0
+scale_estimation         : False
+sensitivity_metric       : weight_quantization_error
+```