Requesting README.md update on how to run the model on vLLM with tool-calling support

#11 opened by douglasrfaisal-gl

Requesting an update to the README.md so that users can run the model on vLLM with tool-calling support.

I ran into the following problems when trying to use the model on vLLM with tool-calling:

  1. Currently there is no guide on which flags to use (perhaps you can add a link to https://qwen.readthedocs.io/en/latest/deployment/vllm.html#parsing-tool-calls); the flags that guide suggests are sketched after this list
  2. Once I tried the flags from the above link, vLLM still failed to yield a successful tool call, with the following error message:
    ValueError: As of transformers v4.44, default chat template is no longer allowed, so you must provide a chat template if the tokenizer does not define one.

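For reference, this is the kind of command the linked guide describes. It is a sketch, not something from this model card: the hermes tool-call parser is what the Qwen guide uses for other Qwen models, and I am assuming it applies to this one as well.

```bash
# Tool-calling flags per the linked Qwen vLLM guide (assumption: the
# hermes parser used for other Qwen models also applies to this model).
vllm serve Qwen/Qwen3-Omni-30B-A3B-Instruct \
  --enable-auto-tool-choice \
  --tool-call-parser hermes
```

On its own, this produced the ValueError above for me; the chat template had to be supplied explicitly, as described next.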
The following workaround worked for me:

  1. Clone the repository and extract the Jinja content from chat_template.json (a clone-free alternative is sketched after this list)
    git clone https://huggingface.co/Qwen/Qwen3-Omni-30B-A3B-Instruct
    python3 -c "import json; print(json.load(open('./Qwen3-Omni-30B-A3B-Instruct/chat_template.json')).get('chat_template',''))" > chat_template.jinja
    
  2. Run vllm serve with the chat template flag --chat_template ./chat_template.jinja (full example command below)
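If you would rather not clone the whole repository for a single file, here is a minimal sketch using huggingface-cli (from the huggingface_hub package; I am assuming it is installed) to fetch just chat_template.json, followed by the same extraction one-liner as above:

```bash
# Download only chat_template.json instead of cloning the full repo,
# then extract the Jinja template exactly as the one-liner above does.
huggingface-cli download Qwen/Qwen3-Omni-30B-A3B-Instruct chat_template.json --local-dir .
python3 -c "import json; print(json.load(open('chat_template.json')).get('chat_template',''))" > chat_template.jinja
```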

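Putting the two steps together, the full command I ended up running looks like this (tool-calling flags as in the Qwen guide, chat template from step 1):

```bash
# Full serve command: tool-calling flags from the Qwen guide plus the
# explicit chat template extracted in step 1.
vllm serve Qwen/Qwen3-Omni-30B-A3B-Instruct \
  --enable-auto-tool-choice \
  --tool-call-parser hermes \
  --chat_template ./chat_template.jinja
```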
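To check that tool calling actually works afterwards, here is a minimal request against vLLM's OpenAI-compatible endpoint; the port (8000) and the get_weather tool are illustrative assumptions:

```bash
# Minimal tool-call smoke test against the OpenAI-compatible endpoint.
# The port and the get_weather tool are illustrative assumptions.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-Omni-30B-A3B-Instruct",
    "messages": [{"role": "user", "content": "What is the weather in Jakarta?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    }]
  }'
```

A successful setup returns a message containing tool_calls instead of failing with the ValueError above.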