Requesting README.md update on how to run the model on vLLM with tool-calling support

#11 opened by douglasrfaisal-gl

Requesting an update to the README.md so that users can run the model on vLLM with tool-calling support.

I ran into the following problems when trying to use the model on vLLM with tool-calling:

  1. Currently there is no guide on which flags to use (perhaps you can add a link to https://qwen.readthedocs.io/en/latest/deployment/vllm.html#parsing-tool-calls); the flags that guide suggests are sketched after this list
  2. Once I tried the flags from the above link, vLLM still failed to yield a successful tool call, with the following error message:
    ValueError: As of transformers v4.44, default chat template is no longer allowed, so you must provide a chat template if the tokenizer does not define one.

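For reference, this is the kind of command the linked guide describes. It is a sketch, not something from this model card: the hermes tool-call parser is what the Qwen guide uses for other Qwen models, and I am assuming it applies to this one as well.

```bash
# Tool-calling flags per the linked Qwen vLLM guide (assumption: the
# hermes parser used for other Qwen models also applies to this model).
vllm serve Qwen/Qwen3-Omni-30B-A3B-Instruct \
  --enable-auto-tool-choice \
  --tool-call-parser hermes
```

On its own, this produced the ValueError above for me; the chat template had to be supplied explicitly, as described next.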
The following workaround worked for me:

  1. Clone the repository and extract the Jinja content from chat_template.json (a clone-free alternative is sketched after this list)
    git clone https://huggingface.co/Qwen/Qwen3-Omni-30B-A3B-Instruct
    python3 -c "import json; print(json.load(open('./Qwen3-Omni-30B-A3B-Instruct/chat_template.json')).get('chat_template',''))" > chat_template.jinja
    
  2. Run vllm serve with the chat template flag --chat_template ./chat_template.jinja (full example command below)
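If you would rather not clone the whole repository for a single file, here is a minimal sketch using huggingface-cli (from the huggingface_hub package; I am assuming it is installed) to fetch just chat_template.json, followed by the same extraction one-liner as above:

```bash
# Download only chat_template.json instead of cloning the full repo,
# then extract the Jinja template exactly as the one-liner above does.
huggingface-cli download Qwen/Qwen3-Omni-30B-A3B-Instruct chat_template.json --local-dir .
python3 -c "import json; print(json.load(open('chat_template.json')).get('chat_template',''))" > chat_template.jinja
```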

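Putting the two steps together, the full command I ended up running looks like this (tool-calling flags as in the Qwen guide, chat template from step 1):

```bash
# Full serve command: tool-calling flags from the Qwen guide plus the
# explicit chat template extracted in step 1.
vllm serve Qwen/Qwen3-Omni-30B-A3B-Instruct \
  --enable-auto-tool-choice \
  --tool-call-parser hermes \
  --chat_template ./chat_template.jinja
```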
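To check that tool calling actually works afterwards, here is a minimal request against vLLM's OpenAI-compatible endpoint; the port (8000) and the get_weather tool are illustrative assumptions:

```bash
# Minimal tool-call smoke test against the OpenAI-compatible endpoint.
# The port and the get_weather tool are illustrative assumptions.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-Omni-30B-A3B-Instruct",
    "messages": [{"role": "user", "content": "What is the weather in Jakarta?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    }]
  }'
```

A successful setup returns a message containing tool_calls instead of failing with the ValueError above.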