Endoftext spam at the end of every request

#2
by rageltman - opened

Running with candle-vllm or vllm.rs at q8_0 we're seeing this effect at the end of every response:

... or include additional safety checks?<|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|>....

On the code-gen side, the model does seem to do a reasonable job producing logical output, but unfortunately it can't do anything turn-based when it never completes its response (it just fills the output window with the tokens shown above).
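
As a stopgap on the client side, the trailing markers can be cut off after the fact. The following is only a minimal sketch, not a fix for the underlying stop condition: the helper names `truncate_at_eos` and `stream_until_eos` are hypothetical, and it assumes the response arrives as plain text (or as an iterable of text chunks when streaming) with the literal string `<|endoftext|>` in it.

```python
from typing import Iterable, Iterator

# Literal marker as it appears in the responses above.
EOS_MARKER = "<|endoftext|>"


def truncate_at_eos(text: str, marker: str = EOS_MARKER) -> str:
    """Return the text before the first occurrence of the marker."""
    head, _, _ = text.partition(marker)
    return head


def stream_until_eos(chunks: Iterable[str], marker: str = EOS_MARKER) -> Iterator[str]:
    """Yield text from streamed chunks and stop at the first marker.

    A short tail is held back on each step so that a marker split
    across two chunks is still detected.
    """
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        idx = buffer.find(marker)
        if idx != -1:
            if idx:
                yield buffer[:idx]
            return  # caller can close the connection at this point
        # Everything except the last len(marker) - 1 characters is safe to emit.
        safe = len(buffer) - (len(marker) - 1)
        if safe > 0:
            yield buffer[:safe]
            buffer = buffer[safe:]
    if buffer:
        yield buffer


if __name__ == "__main__":
    raw = "or include additional safety checks?" + EOS_MARKER * 5
    print(truncate_at_eos(raw))
    print("".join(stream_until_eos(["safety che", "cks?<|endo", "ftext|><|endoftext|>"])))
```

With a non-streaming request this only cleans up the text; the server would presumably still generate until the token limit. With streaming, returning from `stream_until_eos` at least lets the client stop reading early.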

Kwaipilot org

Thank you for your interest!
We’ll be releasing an official quantized version soon — stay tuned!
