---
language:
- multilingual
license: mit
license_link: https://huggingface.co/moonshotai/Kimi-Dev-72B/blob/main/LICENSE.md
library_name: transformers
pipeline_tag: text-generation
tags:
- GPTQ
- Int8
- vLLM
- code
- swebench
- software
- issue-resolving
base_model:
- moonshotai/Kimi-Dev-72B
base_model_relation: quantized
---

# Kimi-Dev-72B-GPTQ-Int8

Base model: [moonshotai/Kimi-Dev-72B](https://huggingface.co/moonshotai/Kimi-Dev-72B)

Calibrated using the [openassistant_best_replies_eval.jsonl](https://huggingface.co/datasets/timdettmers/openassistant-guanaco/blob/main/openassistant_best_replies_eval.jsonl) dataset.

The quantization configuration is as follows:

```
quant_config = QuantizeConfig(bits=8, group_size=128, desc_act=False)
```

### 【vLLM Startup Command】

```
vllm serve JunHowie/Kimi-Dev-72B-GPTQ-Int8
```

### 【Model Download】

```python
from huggingface_hub import snapshot_download

snapshot_download('JunHowie/Kimi-Dev-72B-GPTQ-Int8', cache_dir="your_local_path")
```

### 【Overview】

Introducing Kimi-Dev: A Strong and Open-source Coding LLM for Issue Resolution

Kimi-Dev Team

📄 Tech Report (Coming soon...) | 📄 GitHub

We introduce Kimi-Dev-72B, our new open-source coding LLM for software engineering tasks. Kimi-Dev-72B achieves a new state-of-the-art on SWE-bench Verified among open-source models.

- Kimi-Dev-72B achieves 60.4% on SWE-bench Verified. It surpasses the runner-up, setting a new state-of-the-art result among open-source models.
- Kimi-Dev-72B is optimized via large-scale reinforcement learning. It autonomously patches real repositories in Docker and gains rewards only when the entire test suite passes. This ensures correct and robust solutions, aligning with real-world development standards.
- Kimi-Dev-72B is available for download and deployment on Hugging Face and GitHub. We welcome developers and researchers to explore its capabilities and contribute to development.
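
The all-or-nothing reward described above can be illustrated with a minimal sketch. This is not the team's training code: the function name `outcome_reward`, the reward values 1.0 and 0.0, and the shape of `test_results` are all assumptions made for illustration. The only detail taken from the text is that a patch earns a reward only when every test in the suite passes.

```python
from typing import Mapping


def outcome_reward(test_results: Mapping[str, bool]) -> float:
    """Return a binary reward for one patched repository.

    `test_results` maps a test name to whether it passed.
    Hypothetical helper: the 1.0 / 0.0 values are assumed, not taken
    from the Kimi-Dev training setup.
    """
    # An empty result set means nothing was verified, so give no reward.
    if not test_results:
        return 0.0
    # Reward only when the entire test suite passes; no partial credit.
    return 1.0 if all(test_results.values()) else 0.0


full_pass = outcome_reward({"test_parse": True, "test_render": True})
one_failure = outcome_reward({"test_parse": True, "test_render": False})
```

Under this scheme a patch that fixes most tests but breaks one scores the same as a patch that fixes nothing, which is what pushes the policy toward complete solutions.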

Performance of Open-source Models on SWE-bench Verified.
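
Once the server from the vLLM startup command above is running, it can be queried over vLLM's OpenAI-compatible HTTP API. The sketch below uses only the Python standard library. The base URL `http://localhost:8000/v1` assumes vLLM's default host and port, and the helper names `build_chat_payload` and `chat` are hypothetical; adjust both to your deployment.

```python
import json
import urllib.request

MODEL_ID = "JunHowie/Kimi-Dev-72B-GPTQ-Int8"
# Assumption: the server runs locally on vLLM's default port.
BASE_URL = "http://localhost:8000/v1"


def build_chat_payload(prompt: str, max_tokens: int = 512) -> dict:
    """Build the JSON body for a chat completion request."""
    return {
        "model": MODEL_ID,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
        "max_tokens": max_tokens,
    }


def chat(prompt: str, base_url: str = BASE_URL) -> str:
    """Send one prompt to the running server and return the reply text."""
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_chat_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server up, `chat("Explain what this traceback means: ...")` returns the model's reply as a string.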

## Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "moonshotai/Kimi-Dev-72B"

# Load the model and tokenizer; device_map="auto" spreads weights across available devices.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]

# Render the chat messages into a single prompt string using the model's chat template.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)

# Drop the prompt tokens so only the newly generated tokens are decoded.
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
```

## Citation

```
@misc{kimi_dev_72b_2025,
  title = {Introducing Kimi-Dev: A Strong and Open-source Coding LLM for Issue Resolution},
  author = {{Kimi-Dev Team}},
  year = {2025},
  month = {June},
  url = {\url{https://www.moonshot.cn/Kimi-Dev}}
}
```