Conflicting sampling parameter recommendations?
Hey! Firstly, thanks for the great work with the UD quants, as always. The Q4_K_XL GGUF is running great at 96K context on my 7900 XTX via llama.cpp w/ Vulkan.
However, I've got some questions regarding sampling params, specifically temperature. Your Qwen3-VL guide recommends setting a temperature of 1.0 for 'Thinking' variants of this model series, but the source it links (the Qwen3-VL GitHub repo) seems to recommend a temp of 0.6 instead. Could I get some clarification on which one to follow?
I've been testing both 0.6 and 1.0. Using temp=1.0 has caused a few issues for me: one instance of Chinese characters showing up, and a few odd cases where the model continues its chain-of-thought outside of the <think> tag. For reference, I'm using the other recommended params:
- top_p=0.95
- top_k=20
- repeat_penalty=1.0
To reduce excessive thinking, though, I'm also using presence_penalty=1.5, as was recommended for the non-VL Qwen3-Thinking models. I was also wondering: should min_p be set to 0.0, or kept at 0.01 (the llama.cpp default, as far as I can tell)? I'm setting everything per-request, as in the sketch below.
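For reference, here's roughly how I'm passing these params, as a minimal sketch against llama-server's native /completion endpoint (the port, prompt, and n_predict value are just placeholders from my setup, not anything from your guide):

```python
import requests

# Sampling params I'm currently testing (temperature is the one in question)
payload = {
    "prompt": "Describe this image.",  # placeholder prompt
    "n_predict": 512,                  # placeholder generation limit
    "temperature": 1.0,                # or 0.6, per the Qwen3-VL repo?
    "top_p": 0.95,
    "top_k": 20,
    "min_p": 0.01,                     # or 0.0? part of my question above
    "repeat_penalty": 1.0,
    "presence_penalty": 1.5,           # to curb excessive thinking
}

resp = requests.post("http://localhost:8080/completion", json=payload)
print(resp.json()["content"])
```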
Once again, thanks for all the awesome work you guys do! UD quants are my go-to, and any advice on this would be much appreciated.
Cheers :)