Repetitions, repetitions everywhere with top-p: 0.95

#2
by mratsim - opened

Hey, thank you for your models.

I actually spent the past 5 days raising issues on vllm and llmcompressor tracker to try to quantize your v1 (to prepare for GLM-4.6-Air drop soon™) and right in the middle of quantization you drop v2.

Anyway, I have noticed a weird behavior on the model if top-p is 0.95, whether temp is 0.8 or 0.9, I get repetition in either thinking or no thinking very easily (first reply!)
image

This didn't happen in the v1.

This is fixed by unsetting top-p (or setting it to 1). No action needed on your end, this is just informational in case others report the same to you.

This is when running with vllm in Chat Completions mode so I can't do backend reordering like in traditional SillyTavern (in Text Completions mode it's somewhat better).

Sign up or log in to comment