glm looping on \n

#6
by chgrdj - opened

Hey guys ,

Been spinning up GLM-4.5-air lately and i make him generate some structured output. Sometimes (not constantly) it just gets stuck after one of the field names generating '\n' in loop

For inference parameters i use :

{"extra_body": {'repetition_penalty': 1.05,'length_penalty': 1.05}}

{"temperature": 0.6, "top_p": 0.95,"max_tokens": 16384}
I use vllm

Anyone encountered such issue or has an idea?

Thx!

It seems you're using an incorrect vLLM version. Try using the latest version. Are you using the version with FP8? And what's your TP?

Hey thanks for the answer!
I'm using vllm==0.10.1.1 , will try 10.2
I use fp8 version, yeah I expect quantization do not help in those issues.
About Tensor parallelism i use simply one H200 with 64k of context .
Don't get me wrong in most of cases the model is doing a great job, just its shame sometimes he gets stuck in a loop like that .

If you're using the FP8 version, you should definitely update and try it out, as this version had some bugs that have been fixed in version 0.10.2.

Oh nice i did not get the chance to try yet but will keep you updated !

Sign up or log in to comment