Serve with vLLM

#1
by faheemraza1 - opened

Can this model be run with vLLM on an RTX 3090?


I'm getting the following error on an RTX 3090 with vLLM:

KeyError: 'layers.49.self_attn.qkv_proj.k_scale'

Any help?

When quantizing MoE models with GPTQ, note that you need to enable fail_safe=True. However, it seems vLLM has not yet fixed the fp4-MoE issue.

It now works fine with vLLM v0.12.0
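
For anyone still hitting the KeyError above, it may be worth confirming which vLLM version is actually installed before digging further. Below is a minimal sketch, assuming the package is installed under the distribution name `vllm`. The helpers `parse_release` and `meets_minimum` are hypothetical names written for this example and are not part of vLLM. The comparison is simplified: it only looks at the leading numeric components, so a pre-release such as `0.12.0rc1` is treated as older than `0.12.0`.

```python
from importlib import metadata


def parse_release(version: str) -> tuple:
    """Turn '0.12.0' or 'v0.12.0+cu121' into a tuple of ints.

    Parsing stops at the first component that is not purely numeric,
    so '0.12.0rc1' becomes (0, 12).
    """
    core = version.lstrip("v").split("+")[0]
    parts = []
    for piece in core.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = "0.12.0") -> bool:
    """Return True if `installed` is at least `minimum` (numeric compare)."""
    return parse_release(installed) >= parse_release(minimum)


if __name__ == "__main__":
    try:
        # Reads the installed package metadata; does not import vLLM itself.
        installed = metadata.version("vllm")
    except metadata.PackageNotFoundError:
        print("vLLM is not installed in this environment")
    else:
        status = "OK" if meets_minimum(installed) else "older than v0.12.0"
        print(f"vLLM {installed}: {status}")
```

Run it inside the same environment you use to launch the server. If it reports an older version, upgrading to v0.12.0 or later is the first thing to try, based on the report above.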
