Repetitions, repetitions everywhere with top-p: 0.95

by mratsim - opened 5 days ago

5 days ago

•

Hey, thank you for your models.

I actually spent the past 5 days raising issues on vllm and llmcompressor tracker to try to quantize your v1 (to prepare for GLM-4.6-Air drop soon™) and right in the middle of quantization you drop v2.

Anyway, I have noticed a weird behavior on the model if top-p is 0.95, whether temp is 0.8 or 0.9, I get repetition in either thinking or no thinking very easily (first reply!)

This didn't happen in the v1.

This is fixed by unsetting top-p (or setting it to 1). No action needed on your end, this is just informational in case others report the same to you.

This is when running with vllm in Chat Completions mode so I can't do backend reordering like in traditional SillyTavern (in Text Completions mode it's somewhat better).

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment