FP8 LLMs for vLLM Collection • Accurate FP8 quantized models by Neural Magic, ready for use with vLLM! • 44 items • Updated Oct 17, 2024 • 76
LLM in a flash: Efficient Large Language Model Inference with Limited Memory • Paper • 2312.11514 • Published Dec 12, 2023 • 260