Did someone get it running on 4 NVIDIA RTX PRO 6000 Blackwell (96 GB) GPUs?

#2
by FabianHeller - opened

I have 4 NVIDIA RTX PRO 6000 Blackwell GPUs. Does the model fit? If so, how much VRAM is left for the KV cache, and what settings did you use? At the moment I am running GLM-4.6-FP8 with SGLang, where I still get a 160k context length with FP8 KV-cache quantization.
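For anyone estimating this themselves: the remaining context budget follows from the standard per-token KV-cache formula (2 tensors, K and V, per layer). A minimal sketch below; the layer/head numbers and the free-VRAM figure are illustrative placeholders, not this model's actual config.

```python
def kv_cache_bytes_per_token(num_layers: int, num_kv_heads: int,
                             head_dim: int, dtype_bytes: int) -> int:
    # One K and one V tensor per layer, hence the factor of 2.
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes

# Hypothetical example values -- substitute the real model config:
per_token = kv_cache_bytes_per_token(num_layers=92, num_kv_heads=8,
                                     head_dim=128, dtype_bytes=1)  # FP8 = 1 byte

free_vram_gib = 30  # assumed VRAM left after weights + activations
max_context_tokens = free_vram_gib * 1024**3 // per_token
print(per_token, max_context_tokens)
```

Plugging in the real config from the model's `config.json` gives a quick sanity check on whether a target context length fits before launching the server.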
