minseo25 committed · Commit b02ffa3 (verified) · 1 Parent(s): 9eb2442

Update: Decoder-cdlm-llada-12epoch

Files changed (3):
  1. README.md +38 -3
  2. adapter_config.json +45 -0
  3. adapter_model.safetensors +3 -0
README.md CHANGED
@@ -1,3 +1,38 @@
- ---
- license: mit
- ---
+ # CDLM-LLaDA LoRA adapter for LLaDA-8B-Instruct
+
+ This repository hosts the LoRA adapter for the LLaDA-8B-Instruct diffusion LLM (dLLM), produced with the CDLM (Consistency Diffusion Language Models) method. CDLM integrates consistency modeling with a block-wise causal attention mask, so the student model becomes fully KV-cache compatible while retaining strong local bidirectional modeling within each block. In practice, the adapter enables significantly faster inference at competitive quality.
+
+ - GitHub: https://github.com/minseo25/CDLM
+ - Paper: TBA
+
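+ To make the attention pattern concrete, here is an explanatory sketch (an illustration of the idea, not the repo's actual code): with block size `B`, token `i` may attend to token `j` whenever `j` lies in the same block as `i` or in an earlier block, so attention is bidirectional within a block and causal across blocks.
+
+ ```python
+ # Illustrative sketch of a block-wise causal attention mask; not CDLM's
+ # actual implementation. True = attention allowed. Within a block the
+ # mask is bidirectional; across blocks it is causal, which is what
+ # makes per-block KV caching possible.
+ import torch
+
+ def block_causal_mask(seq_len: int, block_size: int) -> torch.Tensor:
+     blk = torch.arange(seq_len) // block_size    # block index of each token
+     return blk.unsqueeze(1) >= blk.unsqueeze(0)  # j in same or earlier block
+
+ mask = block_causal_mask(6, 2)  # block indices: [0, 0, 1, 1, 2, 2]
+ # Row 2 (first token of block 1) sees blocks 0 and 1 but not block 2:
+ # mask[2] == [True, True, True, True, False, False]
+ ```
+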
+
+ ## Model details
+
+ - Base model: GSAI-ML/LLaDA-8B-Instruct
+ - Method: CDLM (consistency distillation + block-wise causal masking for KV-cache compatibility)
+ - Format: PEFT LoRA adapter (`adapter_model.safetensors`, `adapter_config.json`); see the inspection sketch below
+ - Intended use: attach this adapter to the base LLaDA-8B-Instruct model for accelerated inference via the CDLM decoding path
+
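+ As a companion to the "Format" bullet above, a minimal sketch of inspecting the adapter's LoRA hyperparameters with PEFT before loading any weights; the repo id is a placeholder assumption, and the printed values simply mirror the `adapter_config.json` shipped in this commit:
+
+ ```python
+ # Sketch: read the LoRA hyperparameters from the adapter repo (only the
+ # config file is fetched, no weights). The repo id is a placeholder.
+ from peft import PeftConfig
+
+ cfg = PeftConfig.from_pretrained("minseo25/CDLM-LLaDA")  # hypothetical id
+ print(cfg.base_model_name_or_path)              # GSAI-ML/LLaDA-8B-Instruct
+ print(cfg.r, cfg.lora_alpha, cfg.lora_dropout)  # 64 64 0.1
+ print(sorted(cfg.target_modules))               # attention + MLP projections
+ ```
+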
+
+ ## How to use
+
+ This is a LoRA adapter, not a full model: load the base model first, then attach this adapter, as in the sketch below. For the best speedups, use the CDLM inference path in the accompanying codebase.
+
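+ A minimal sketch, assuming the standard Transformers + PEFT flow; the adapter repo id is a placeholder, and LLaDA needs `trust_remote_code=True` because it ships custom modeling code:
+
+ ```python
+ # Sketch: attach this LoRA adapter to the base LLaDA-8B-Instruct model.
+ # The adapter repo id below is a placeholder; use this repo's actual id.
+ import torch
+ from peft import PeftModel
+ from transformers import AutoModel, AutoTokenizer
+
+ base_id = "GSAI-ML/LLaDA-8B-Instruct"
+ adapter_id = "minseo25/CDLM-LLaDA"  # hypothetical id
+
+ tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
+ model = AutoModel.from_pretrained(
+     base_id, trust_remote_code=True, torch_dtype=torch.bfloat16
+ ).eval()
+ model = PeftModel.from_pretrained(model, adapter_id)  # attach LoRA weights
+
+ # Sampling itself should go through the CDLM decoding loop from the GitHub
+ # repo above; the adapter alone does not change the diffusion sampler.
+ ```
+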
+ ## License
+
+ This adapter is released under the MIT License. The base model is governed by its own license; please ensure compliance with the base model’s terms.
+
+ ## Citation
+
+ We will update the BibTeX below once the arXiv link is public.
+
+ ```bibtex
+ @article{cdlm2025,
+   title   = {CDLM: Consistency Diffusion Language Models for Faster Sampling},
+   author  = {TBA},
+   journal = {arXiv preprint},
+   year    = {2025},
+   eprint  = {TBA},
+ }
+ ```
adapter_config.json ADDED
@@ -0,0 +1,45 @@
+ {
+   "alpha_pattern": {},
+   "auto_mapping": {
+     "base_model_class": "LLaDAModelLM",
+     "parent_library": "model.modeling_llada"
+   },
+   "base_model_name_or_path": "GSAI-ML/LLaDA-8B-Instruct",
+   "bias": "none",
+   "corda_config": null,
+   "eva_config": null,
+   "exclude_modules": null,
+   "fan_in_fan_out": false,
+   "inference_mode": true,
+   "init_lora_weights": true,
+   "layer_replication": null,
+   "layers_pattern": null,
+   "layers_to_transform": null,
+   "loftq_config": {},
+   "lora_alpha": 64,
+   "lora_bias": false,
+   "lora_dropout": 0.1,
+   "megatron_config": null,
+   "megatron_core": "megatron.core",
+   "modules_to_save": null,
+   "peft_type": "LORA",
+   "qalora_group_size": 16,
+   "r": 64,
+   "rank_pattern": {},
+   "revision": null,
+   "target_modules": [
+     "up_proj",
+     "k_proj",
+     "q_proj",
+     "ff_out",
+     "ff_proj",
+     "attn_out",
+     "v_proj"
+   ],
+   "target_parameters": null,
+   "task_type": null,
+   "trainable_token_indices": null,
+   "use_dora": false,
+   "use_qalora": false,
+   "use_rslora": false
+ }
adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:76d7499b5d7201496e51744c91983e9fd7927f506fee482bf8ea94b4a6909413
+ size 352318936