image_token_id mismatch causes "Image features and image tokens do not match" error in OSS-20B model
Issue Description
When using the OpenGVLab/InternVL3_5-GPT-OSS-20B-A4B-Preview-HF model, inference fails with the following error:
ValueError: Image features and image tokens do not match: tokens: 0, features 3328
Root Cause
The issue stems from a token ID mismatch between the model configuration and the tokenizer:
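- The model's `config.json` specifies `"image_token_id": 151671`.
- However, the OSS-20B tokenizer maps `<IMG_CONTEXT>` to token ID `200021` (as seen in `tokenizer_config.json`).
- The 14B model uses `151671` for `<IMG_CONTEXT>` (in its `added_tokens.json`), which appears to have been carried over to the OSS-20B config.

As a result, the prompt contains `<IMG_CONTEXT>` placeholders with ID `200021`, but the model looks for `151671`, finds none, and reports `tokens: 0` against the 3328 image features.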
Workaround
Users can work around this by manually setting `image_token_id` after loading the model:
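from transformers import AutoModelForImageTextToText
model_name = "OpenGVLab/InternVL3_5-GPT-OSS-20B-A4B-Preview-HF"
model = AutoModelForImageTextToText.from_pretrained(model_name, ...)
model.config.image_token_id = 200021  # Correct token ID for OSS-20B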
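Instead of hard-coding `200021`, one option is to read the ID from the tokenizer shipped with the checkpoint, so the workaround cannot drift from the vocabulary. This is a minimal sketch assuming a Hugging Face-style tokenizer that exposes `convert_tokens_to_ids` and `unk_token_id`; `sync_image_token_id` is a hypothetical helper, and the stub tokenizer and `SimpleNamespace` config only stand in for the real objects so the snippet runs offline.

```python
from types import SimpleNamespace


def sync_image_token_id(config, tokenizer, image_token: str = "<IMG_CONTEXT>") -> int:
    """Set config.image_token_id to the ID the tokenizer assigns to image_token."""
    token_id = tokenizer.convert_tokens_to_ids(image_token)
    # Unknown tokens map to unk_token_id, or to None when there is no unk token.
    if token_id is None or token_id == getattr(tokenizer, "unk_token_id", None):
        raise KeyError(f"{image_token!r} is not in the tokenizer vocabulary")
    if config.image_token_id != token_id:
        print(f"image_token_id: {config.image_token_id} -> {token_id}")
        config.image_token_id = token_id
    return token_id


class StubTokenizer:
    """Offline stand-in holding only the mapping reported for the OSS-20B tokenizer."""

    unk_token_id = None
    _vocab = {"<IMG_CONTEXT>": 200021}

    def convert_tokens_to_ids(self, token):
        return self._vocab.get(token)


config = SimpleNamespace(image_token_id=151671)  # stand-in for model.config
sync_image_token_id(config, StubTokenizer())     # prints: image_token_id: 151671 -> 200021
```

With the real objects, the call would presumably be `sync_image_token_id(model.config, processor.tokenizer)`, with `processor` loaded via `AutoProcessor.from_pretrained(model_name)`.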
Suggested Fix
Update `image_token_id` in the model's `config.json` to `200021` so that it matches the tokenizer configuration.
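Before uploading a corrected `config.json`, the two files can be cross-checked directly. This sketch assumes the standard layout in which `tokenizer_config.json` lists special tokens under `added_tokens_decoder` (token ID as a string key, token text under `"content"`) and `image_token_id` is a top-level key of `config.json`; it falls back to `added_tokens.json` when that file exists. `find_image_token_mismatch` and `tokenizer_image_token_id` are hypothetical helpers, and the demo writes throwaway files to a temporary directory.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional, Tuple


def tokenizer_image_token_id(repo_dir: Path, image_token: str = "<IMG_CONTEXT>") -> Optional[int]:
    """Look up image_token in tokenizer_config.json, falling back to added_tokens.json."""
    tok_cfg = json.loads((repo_dir / "tokenizer_config.json").read_text())
    for token_id, entry in tok_cfg.get("added_tokens_decoder", {}).items():
        if entry.get("content") == image_token:
            return int(token_id)
    added = repo_dir / "added_tokens.json"  # present in the 14B repo, absent in OSS-20B
    if added.exists():
        return json.loads(added.read_text()).get(image_token)
    return None


def find_image_token_mismatch(repo_dir: Path) -> Optional[Tuple[int, int]]:
    """Return (config_id, tokenizer_id) if they disagree, else None."""
    config_id = json.loads((repo_dir / "config.json").read_text()).get("image_token_id")
    tokenizer_id = tokenizer_image_token_id(repo_dir)
    return None if config_id == tokenizer_id else (config_id, tokenizer_id)


# Demo with throwaway files reproducing the reported values.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    (repo / "config.json").write_text(json.dumps({"image_token_id": 151671}))
    (repo / "tokenizer_config.json").write_text(json.dumps(
        {"added_tokens_decoder": {"200021": {"content": "<IMG_CONTEXT>", "special": True}}}
    ))
    print(find_image_token_mismatch(repo))  # (151671, 200021)
```

Running it against a local snapshot of the repository after the fix should return `None`.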
Additional Note
The OSS-20B repository lacks the `added_tokens.json` file present in the 14B repository; this does not appear to cause problems, since the tokens are defined in `tokenizer_config.json`.
🤗 Thank you for your interest and for pointing out the hidden bug as well as providing a detailed analysis. I have already updated the model’s config and verified that it now runs successfully.