---
library_name: transformers
license: apache-2.0
datasets:
- wubingheng/Doge_PT_chinese
language:
- zh
pipeline_tag: text-generation
tags:
- pt
- doge
---
# **Doge 20M CN**
Doge uses Dynamic Mask Attention for sequence transformation and can use either a Multi-Layer Perceptron or a Cross Domain Mixture of Experts for state transformation. Dynamic Mask Attention allows the Transformer to use self-attention during training and a state space formulation during inference, and the Cross Domain Mixture of Experts can directly inherit the weights of the Multi-Layer Perceptron for further training. This model was trained by the [SmallDoge](https://huggingface.co/SmallDoge) community. A paper with the detailed algorithm and model architecture is coming soon; all training details and code are available in the [small-doge](https://github.com/SmallDoges/small-doge) repository.
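
To see which sequence- and state-transformation variants this checkpoint uses, you can inspect its configuration. This is a minimal sketch; the exact field names are defined by the custom Doge code loaded via `trust_remote_code` and are not guaranteed here.

```python
from transformers import AutoConfig

# Load the Doge configuration (custom architecture, so trust_remote_code is required)
config = AutoConfig.from_pretrained("wubingheng/Doge-20M-Chinese", trust_remote_code=True)

# Prints hidden size, number of layers, and the attention / MoE-related settings
print(config)
```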
## Uses
```python
>>> from transformers import AutoTokenizer, AutoModelForCausalLM

>>> # Load the tokenizer and the Doge model (custom code requires trust_remote_code)
>>> tokenizer = AutoTokenizer.from_pretrained("wubingheng/Doge-20M-Chinese")
>>> model = AutoModelForCausalLM.from_pretrained("wubingheng/Doge-20M-Chinese", trust_remote_code=True)

>>> # Generate a continuation for a Chinese prompt
>>> inputs = tokenizer("你好", return_tensors="pt")
>>> out = model.generate(**inputs, max_new_tokens=100)
>>> print(tokenizer.batch_decode(out, skip_special_tokens=True))
```
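
For interactive use, you can stream tokens as they are generated. The sketch below uses the standard `TextStreamer` utility from Transformers with the same checkpoint as above; it is not specific to Doge.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer

tokenizer = AutoTokenizer.from_pretrained("wubingheng/Doge-20M-Chinese")
model = AutoModelForCausalLM.from_pretrained("wubingheng/Doge-20M-Chinese", trust_remote_code=True)

inputs = tokenizer("你好", return_tensors="pt")

# Print decoded tokens to stdout as soon as they are produced
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(**inputs, max_new_tokens=100, streamer=streamer)
```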
## Model Details
Training logs are available on [Weights & Biases](https://wandb.ai/loser_cheems/huggingface/runs/gopufefk?nw=nwuserbinghengwu).
**Environment**:
- Image: nvcr.io/nvidia/pytorch:24.12-py3
- Hardware: 1x NVIDIA RTX 4090
- Software: Transformers
## Citation
```bibtex
@misc{smalldoges,
  title={SmallDoges: A Family of Dynamic UltraFast Small Language Models},
  author={Jingze, Shi and Yifan, Wu and Bingheng, Wu and Yuyu, Luo},
  year={2025},
  month={March},
  url={https://github.com/SmallDoges/small-doge}
}
```