Serve with vLLM
#1 opened by faheemraza1
Can this be run with vllm on RTX 3090?
I'm facing the following error on an RTX 3090 with vLLM:
KeyError: 'layers.49.self_attn.qkv_proj.k_scale'
Any help?
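The KeyError above names a `k_scale` parameter. One way to look into it, assuming the checkpoint is stored as `.safetensors` files, is to list the tensor names that end in `.k_scale` or `.v_scale`. The sketch below reads only the file header (an 8-byte little-endian length followed by a JSON object), so it needs neither vLLM nor a GPU. The helper names `tensor_names` and `kv_scale_names` and the shard filename are hypothetical, and whether such tensors are what triggers the error is an assumption that this thread does not confirm.

```python
import json
import struct


def tensor_names(path):
    """Return the tensor names recorded in a .safetensors file header."""
    with open(path, "rb") as f:
        # The first 8 bytes hold the header length as an unsigned
        # little-endian 64-bit integer.
        (header_len,) = struct.unpack("<Q", f.read(8))
        # The header itself is a JSON object keyed by tensor name.
        header = json.loads(f.read(header_len))
    # "__metadata__" is an optional free-form entry, not a tensor.
    return [name for name in header if name != "__metadata__"]


def kv_scale_names(path):
    """Return the tensor names that look like KV-cache scale entries."""
    return [n for n in tensor_names(path) if n.endswith((".k_scale", ".v_scale"))]


if __name__ == "__main__":
    # Hypothetical shard name; replace with a real file from the checkpoint.
    shard = "model-00001-of-00002.safetensors"
    try:
        for name in kv_scale_names(shard):
            print(name)
    except FileNotFoundError:
        print(f"{shard} not found; point this at a real shard")
```

If the list is non-empty, the checkpoint ships scale tensors that the loader has to map onto model parameters, which may help narrow down where the lookup fails.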
When quantizing MoE models with GPTQ, make sure to enable fail_safe=True. However, it seems vLLM has not yet fixed the FP4 MoE issue.
It now works fine with vLLM v0.12.0
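Since the report above is tied to a specific release, here is a minimal sketch for checking whether the installed vLLM is at least v0.12.0 before trying again. It uses only the standard library; the helper names `parse_release` and `vllm_is_at_least` are hypothetical, and the comparison looks only at the leading numeric components of the version string, which is a simplification rather than full version parsing.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_release(v):
    """Turn '0.12.0' or '0.12.0+cu128' into (0, 12, 0).

    Stops at the first component that is not purely numeric, so a
    pre-release such as '0.12.0rc1' becomes (0, 12) and sorts lower.
    """
    parts = []
    for piece in v.split("+")[0].split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def vllm_is_at_least(minimum="0.12.0"):
    """Return True if the installed vllm package is at least `minimum`."""
    try:
        installed = version("vllm")
    except PackageNotFoundError:
        # vllm is not installed in this environment.
        return False
    return parse_release(installed) >= parse_release(minimum)


if __name__ == "__main__":
    print("vLLM >= 0.12.0:", vllm_is_at_least("0.12.0"))
```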