---
library_name: transformers
license: apache-2.0
datasets:
- wubingheng/Doge_PT_chinese
language:
- zh
pipeline_tag: text-generation
tags:
- pt
- doge
---

# **Doge 20M CN**

Doge uses Dynamic Mask Attention as its sequence transformation and can use either a Multi-Layer Perceptron or a Cross Domain Mixture of Experts as its state transformation. Dynamic Mask Attention allows the Transformer to use self-attention during training and state space during inference, and the Cross Domain Mixture of Experts can directly inherit the weights of the Multi-Layer Perceptron for further training. This model is trained by the [SmallDoge](https://huggingface.co/SmallDoge) community. A paper detailing the algorithm and model architecture is coming soon; all training details and code are available in the [small-doge](https://github.com/SmallDoges/small-doge) repository.

## Uses

```python
>>> from transformers import AutoTokenizer, AutoModelForCausalLM

>>> # Load the tokenizer and the custom Doge architecture (requires trust_remote_code)
>>> tokenizer = AutoTokenizer.from_pretrained("wubingheng/Doge-20M-Chinese")
>>> model = AutoModelForCausalLM.from_pretrained("wubingheng/Doge-20M-Chinese", trust_remote_code=True)

>>> # Tokenize a Chinese prompt ("你好" = "Hello") and generate up to 100 new tokens
>>> inputs = tokenizer("你好", return_tensors="pt")
>>> out = model.generate(**inputs, max_new_tokens=100)
>>> print(tokenizer.batch_decode(out))
```

A streaming variant of this example is sketched at the end of this card.

## Model Details

[Visualize in Weights & Biases](https://wandb.ai/loser_cheems/huggingface/runs/gopufefk?nw=nwuserbinghengwu)

**Environment**:
- Image: nvcr.io/nvidia/pytorch:24.12-py3
- Hardware: 1x NVIDIA RTX 4090
- Software: Transformers

## Citation

```bibtex
@misc{smalldoges,
  title={SmallDoges: A Family of Dynamic UltraFast Small Language Models},
  author={Jingze, Shi and Yifan, Wu and Bingheng, Wu and Yuyu, Luo},
  year={2025},
  month={March},
  url={https://github.com/SmallDoges/small-doge}
}
```
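As a further usage sketch, the same checkpoint can stream tokens to the console while it generates, using the standard `TextStreamer` utility from `transformers`. The sampling settings below are illustrative assumptions, not values published for this model.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

# Load the model exactly as in the Uses section above
tokenizer = AutoTokenizer.from_pretrained("wubingheng/Doge-20M-Chinese")
model = AutoModelForCausalLM.from_pretrained("wubingheng/Doge-20M-Chinese", trust_remote_code=True)

# Print decoded tokens to stdout as they are generated, skipping the prompt text
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

inputs = tokenizer("你好", return_tensors="pt")

# Sampling hyperparameters here are placeholders for illustration only
model.generate(
    **inputs,
    streamer=streamer,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.8,
    top_p=0.9,
)
```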