glm looping on \n
Hey guys ,
Been spinning up GLM-4.5-air lately and i make him generate some structured output. Sometimes (not constantly) it just gets stuck after one of the field names generating '\n' in loop
For inference parameters i use :
{"extra_body": {'repetition_penalty': 1.05,'length_penalty': 1.05}}
{"temperature": 0.6, "top_p": 0.95,"max_tokens": 16384}
I use vllm
Anyone encountered such issue or has an idea?
Thx!
It seems you're using an incorrect vLLM version. Try using the latest version. Are you using the version with FP8? And what's your TP?
Hey thanks for the answer!
I'm using vllm==0.10.1.1 , will try 10.2
I use fp8 version, yeah I expect quantization do not help in those issues.
About Tensor parallelism i use simply one H200 with 64k of context .
Don't get me wrong in most of cases the model is doing a great job, just its shame sometimes he gets stuck in a loop like that .
If you're using the FP8 version, you should definitely update and try it out, as this version had some bugs that have been fixed in version 0.10.2.
Oh nice i did not get the chance to try yet but will keep you updated !