# CDLM-LLaDA LoRA adapter for LLaDA-8B-Instruct
This repository hosts the LoRA adapter for the LLaDA-8B-Instruct diffusion LLM (dLLM), produced with the CDLM (Consistency Diffusion Language Models) method. CDLM integrates consistency modeling and a block-wise causal attention mask so the student model becomes fully KV-cache compatible while retaining the strong local bidirectional modeling within each block. In practice, the adapter enables significantly faster inference with competitive quality.
- GitHub: https://github.com/SqueezeAILab/CDLM
- Paper: CDLM: Consistency Diffusion Language Models for Faster Sampling (https://arxiv.org/abs/2511.19269)
## Model details
- Base model: GSAI-ML/LLaDA-8B-Instruct
- Method: CDLM (consistency distillation + block-wise causal masking for KV-cache compatibility)
- Format: PEFT LoRA adapter (`adapter_model.safetensors`, `adapter_config.json`)
- Intended use: attach this adapter to the base LLaDA-8B-Instruct model for accelerated inference via the CDLM decoding path
## How to use
This is a LoRA adapter, not a full model. You must load the base model and then attach this adapter. For best speedups, use the CDLM inference path in the accompanying codebase.
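Below is a minimal sketch of attaching the adapter with Hugging Face Transformers and PEFT. The adapter repo id is a placeholder (substitute this repository's actual id), and this only loads the weights; the fast block-wise, KV-cached sampling loop is provided by the CDLM codebase linked above, not by a standard `generate` call.

```python
# Minimal sketch: load the base LLaDA-8B-Instruct model and attach the CDLM LoRA adapter.
# NOTE: `adapter_id` is a placeholder, not a confirmed repo id.
import torch
from transformers import AutoModel, AutoTokenizer
from peft import PeftModel

base_id = "GSAI-ML/LLaDA-8B-Instruct"
adapter_id = "path/or/repo-id-of-this-adapter"  # placeholder; use this repository's id

tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
base_model = AutoModel.from_pretrained(
    base_id,
    trust_remote_code=True,      # LLaDA ships custom modeling code
    torch_dtype=torch.bfloat16,
).eval()

# Attach the CDLM LoRA weights on top of the frozen base model.
model = PeftModel.from_pretrained(base_model, adapter_id)

# Optionally merge the adapter into the base weights for standalone deployment.
# model = model.merge_and_unload()
```

After loading, run sampling through the CDLM inference path in the GitHub repository to benefit from the KV-cache-compatible block-wise decoding; loading the adapter alone does not change the sampling procedure.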
## License
This adapter is released under the MIT License. The base model is governed by its own license; please ensure compliance with the base model’s terms.
## Citation
```bibtex
@article{kim2025cdlm,
  title   = {CDLM: Consistency Diffusion Language Models for Faster Sampling},
  author  = {Kim, Minseo and Xu, Chenfeng and Hooper, Coleman and Singh, Harman
             and Athiwaratkun, Ben and Zhang, Ce and Keutzer, Kurt and Gholami, Amir},
  journal = {arXiv preprint arXiv:2511.19269},
  year    = {2025},
  url     = {https://arxiv.org/abs/2511.19269}
}
```