WUSH: Near-Optimal Adaptive Transforms for LLM Quantization Paper • 2512.00956 • Published Dec 2025 • 17 upvotes
Bridging the Gap Between Promise and Performance for Microscaling FP4 Quantization Paper • 2509.23202 • Published Sep 27, 2025 • 27 upvotes
The Geometry of LLM Quantization: GPTQ as Babai's Nearest Plane Algorithm Paper • 2507.18553 • Published Jul 24, 2025 • 40 upvotes
Hogwild! Inference: Parallel LLM Generation via Concurrent Attention Paper • 2504.06261 • Published Apr 8, 2025 • 110 upvotes