---
base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
---

This is the [Qwen/Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) model, converted to OpenVINO, with nf4 weights for the language model and int8 weights for the other models. The nf4 weights are compressed with symmetric, channel-wise quantization. The model works on Intel NPU. See below for the model export command and properties.

## Download Model

To download the model, run `pip install huggingface-hub[cli]` and then:

```
huggingface-cli download helenai/Qwen2.5-VL-7B-Instruct-ov-nf4-npu --local-dir Qwen2.5-VL-7B-Instruct-ov-nf4-npu
```

## Run inference with OpenVINO GenAI

Use OpenVINO GenAI to run inference on this model. This model works with OpenVINO GenAI 2025.3 and later. Make sure to use the latest NPU driver ([Windows](https://www.intel.com/content/www/us/en/download/794734/intel-npu-driver-windows.html), [Linux](https://github.com/intel/linux-npu-driver)).

- Install OpenVINO GenAI and Pillow:

```
pip install --upgrade openvino-genai pillow
```

- Download a test image: `curl -O "https://storage.openvinotoolkit.org/test_data/images/dog.jpg"`
- Run inference:

```python
import numpy as np
import openvino as ov
import openvino_genai
from PIL import Image

# CACHE_DIR caches the compiled model the first time, so subsequent model loading will be faster
pipeline_config = {"CACHE_DIR": "model_cache"}
pipe = openvino_genai.VLMPipeline("Qwen2.5-VL-7B-Instruct-ov-nf4-npu", "NPU", **pipeline_config)

image = Image.open("dog.jpg")
# Optional: resizing to a smaller size (depending on image and prompt) is often useful to speed up inference
image = image.resize((128, 128))
image_data = np.array(image.getdata()).reshape(1, image.size[1], image.size[0], 3).astype(np.uint8)
image_data = ov.Tensor(image_data)

prompt = "Can you describe the image?"
result = pipe.generate(prompt, image=image_data, max_new_tokens=100)
print(result.texts[0])
```

See the [OpenVINO GenAI repository](https://github.com/openvinotoolkit/openvino.genai?tab=readme-ov-file#performing-visual-language-text-generation) for more information.

## Model export properties

Model export command:

```
optimum-cli export openvino -m Qwen/Qwen2.5-VL-7B-Instruct --weight-format nf4 --group-size -1 --sym Qwen2.5-VL-7B-Instruct-ov-nf4-npu
```

### Framework versions

```
openvino      : 2025.3.0-19807-44526285f24-releases/2025/3
nncf          : 2.18.0
optimum_intel : 1.26.0.dev0+bc13ae5
optimum       : 1.27.0
pytorch       : 2.7.1
transformers  : 4.51.3
```

### LLM export properties

```
all_layers         : False
awq                : False
backup_mode        : int8_asym
compression_format : dequantize
gptq               : False
group_size         : -1
ignored_scope      : []
lora_correction    : False
mode               : nf4
ratio               : 1.0
scale_estimation   : False
sensitivity_metric : weight_quantization_error
```
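
For reference, the LLM export properties above are NNCF weight-compression options. The sketch below shows roughly how those settings would map onto a direct `nncf.compress_weights()` call on the exported language model. It is illustrative only: the released weights were produced by the `optimum-cli` command above (which calls NNCF internally), and the model path used here is an assumption.

```python
import nncf
import openvino as ov

# Illustrative sketch only: roughly how the LLM export properties above map onto
# nncf.compress_weights(). The released weights were produced by optimum-cli;
# the model path below is an assumption.
core = ov.Core()
model = core.read_model("Qwen2.5-VL-7B-Instruct-ov/openvino_language_model.xml")

compressed = nncf.compress_weights(
    model,
    mode=nncf.CompressWeightsMode.NF4,      # nf4 weights for the language model
    ratio=1.0,                              # compress all eligible weight layers
    group_size=-1,                          # channel-wise quantization (no grouping)
    backup_mode=nncf.BackupMode.INT8_ASYM,  # int8_asym fallback for remaining weights
)
ov.save_model(compressed, "openvino_language_model_nf4.xml")
```

The remaining properties listed above (`awq`, `gptq`, `scale_estimation`, `lora_correction`, `all_layers`) were left at their default value of `False`.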