Serve with vLLM

by faheemraza1 - opened Aug 30

Aug 30

Can this model be served with vLLM on an RTX 3090?

faheemraza1 changed discussion title from Serve with vLLLM to Serve with vLLM Aug 30

faheemraza1 changed discussion status to closed Aug 31

Aug 31

This comment has been hidden (marked as Off-Topic)

faheemraza1 changed discussion status to open Aug 31

Aug 31

I'm getting the following error on an RTX 3090 with vLLM:

KeyError: 'layers.49.self_attn.qkv_proj.k_scale'

Any help?

Oct 31

When quantizing MoE models with GPTQ, make sure to enable fail_safe=True. However, it seems vLLM has not yet fixed the fp4-MoE issue.

11 days ago

It now works fine with vLLM v0.12.0
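
For anyone checking their own setup, here is a minimal sketch that compares the installed vLLM version with the v0.12.0 mentioned above. The helper names `parse_version` and `vllm_at_least` are hypothetical, not part of vLLM. The check only reads package metadata, so it says nothing about whether a given GPU or quantization format is supported.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn a string such as 'v0.12.0' or '0.12.0.post1' into a 3-int tuple."""
    match = re.match(r"v?(\d+(?:\.\d+)*)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    parts = [int(part) for part in match.group(1).split(".")][:3]
    # Pad short versions like '0.12' so tuple comparison behaves as expected.
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def vllm_at_least(minimum="0.12.0"):
    """Return True/False for the installed vLLM, or None if it is not installed."""
    try:
        installed = version("vllm")  # reads installed package metadata only
    except PackageNotFoundError:
        return None
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    result = vllm_at_least()
    if result is None:
        print("vLLM is not installed in this environment")
    elif result:
        print("vLLM is at least 0.12.0")
    else:
        print("vLLM is older than 0.12.0; consider upgrading")
```

If the result is `False`, upgrading vLLM before retrying is the first thing to try, based on the report above.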
