Did someone get it running on 4 NVIDIA RTX PRO 6000 Blackwell (96 GB) GPUs?

#2
by FabianHeller - opened

I have 4 NVIDIA RTX PRO 6000 Blackwell GPUs. Does the model fit? If so, how much VRAM is left for the KV cache, and what settings did you use? At the moment I am running GLM-4.6-FP8 with SGLang, where I still get a 160k context length with FP8 KV-cache quantization.
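For anyone estimating this themselves: the remaining context budget follows from the standard per-token KV-cache formula (2 tensors, K and V, per layer). A minimal sketch below; the layer/head numbers and the free-VRAM figure are illustrative placeholders, not this model's actual config.

```python
def kv_cache_bytes_per_token(num_layers: int, num_kv_heads: int,
                             head_dim: int, dtype_bytes: int) -> int:
    # One K and one V tensor per layer, hence the factor of 2.
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes

# Hypothetical example values -- substitute the real model config:
per_token = kv_cache_bytes_per_token(num_layers=92, num_kv_heads=8,
                                     head_dim=128, dtype_bytes=1)  # FP8 = 1 byte

free_vram_gib = 30  # assumed VRAM left after weights + activations
max_context_tokens = free_vram_gib * 1024**3 // per_token
print(per_token, max_context_tokens)
```

Plugging in the real config from the model's `config.json` gives a quick sanity check on whether a target context length fits before launching the server.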
