Did you use vLLM's llm-compressor or something else?
How do you run this model? I can't get it to work with vLLM or SGLang.
MoE NVFP4 only seems to work with TensorRT-LLM.
Can you share the quantization script for this model, please? 🙏