Repetitions, repetitions everywhere with top-p: 0.95
Hey, thank you for your models.
I actually spent the past 5 days raising issues on vllm and llmcompressor tracker to try to quantize your v1 (to prepare for GLM-4.6-Air drop soon™) and right in the middle of quantization you drop v2.
Anyway, I have noticed a weird behavior on the model if top-p is 0.95, whether temp is 0.8 or 0.9, I get repetition in either thinking or no thinking very easily (first reply!)
This didn't happen in the v1.
This is fixed by unsetting top-p (or setting it to 1). No action needed on your end, this is just informational in case others report the same to you.
This is when running with vllm in Chat Completions mode so I can't do backend reordering like in traditional SillyTavern (in Text Completions mode it's somewhat better).