Efficient and Economic Large Language Model Inference with Attention Offloading. arXiv:2405.01814, May 3, 2024.
Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving. arXiv:2407.00079, Jun 24, 2024.
RetroInfer: A Vector-Storage Approach for Scalable Long-Context LLM Inference. arXiv:2505.02922, May 5, 2025.