Conflicting sampling parameter recommendations?
Hey! Firstly, thanks for the great work with the UD quants, as always. The Q4_K_XL GGUF is running great at 96K context on my 7900 XTX via llama.cpp w/ Vulkan.
However, I've got some questions regarding sampling params, specifically temperature. Your Qwen3-VL guide recommends setting a temperature of 1.0 for 'Thinking' variants of this model series, but the source it links (the Qwen3-VL GitHub repo) seems to recommend a temp of 0.6 instead. Could I get some clarification on which one to follow?
I've been testing both 0.6 and 1.0. Using temp=1.0 has caused a few issues for me: one instance of Chinese characters showing up, and a few odd cases where the model continues its chain-of-thought outside of the <think> tag. For reference, I'm using the other recommended params:
- top_p=0.95
- top_k=20
- repeat_penalty=1.0
To reduce excessive thinking, though, I'm also using presence_penalty=1.5, as was recommended for the non-VL Qwen3-Thinking models. I was also wondering: should min_p be set to 0.0, or kept at 0.01 (the llama.cpp default, as far as I can tell)? I'm setting everything per-request, as in the sketch below.
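For reference, here's roughly how I'm passing these params, as a minimal sketch against llama-server's native /completion endpoint (the port, prompt, and n_predict value are just placeholders from my setup, not anything from your guide):

```python
import requests

# Sampling params I'm currently testing (temperature is the one in question)
payload = {
    "prompt": "Describe this image.",  # placeholder prompt
    "n_predict": 512,                  # placeholder generation limit
    "temperature": 1.0,                # or 0.6, per the Qwen3-VL repo?
    "top_p": 0.95,
    "top_k": 20,
    "min_p": 0.01,                     # or 0.0? part of my question above
    "repeat_penalty": 1.0,
    "presence_penalty": 1.5,           # to curb excessive thinking
}

resp = requests.post("http://localhost:8080/completion", json=payload)
print(resp.json()["content"])
```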
Once again, thanks for all the awesome work you guys do! UD quants are my go-to, and any advice on this would be much appreciated.
Cheers :)