Model Card for starmpcc/NoPE_1.5B_FW_EDU_15T
This model is the official checkpoint accompanying the paper Behind RoPE: How Does Causal Mask Encode Positional Information?.
The model is trained without any explicit positional encoding (also known as NoPE).
It is based on the Llama-3 architecture, has 1.5 billion parameters, and was trained on 15 trillion tokens from the FineWeb-Edu dataset.
Model Training
The model follows the Llama-3 architecture with the rotary positional encoding (RoPE) removed. It has 1.5 billion parameters and was trained on 15 trillion tokens from the deduplicated version of the FineWeb-Edu dataset, with a maximum sequence length of 1,024. Further training details are provided in the accompanying paper.
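As a quick sanity check, the released configuration can be inspected with the standard transformers API. This is a minimal sketch, not part of the official training code; config field values beyond those stated above are whatever the checkpoint ships with.

from transformers import AutoConfig

# Inspect the released configuration of the NoPE checkpoint.
config = AutoConfig.from_pretrained("starmpcc/NoPE_1.5B_FW_EDU_15T")
print(config.model_type)               # expected: llama
print(config.max_position_embeddings)  # training used sequences of up to 1024 tokens
print(config.num_hidden_layers, config.hidden_size, config.num_attention_heads)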
Model Sources
Paper: https://arxiv.org/abs/2509.21042
Uses
Because the Hugging Face Llama implementation applies RoPE inside the attention layers, loading this checkpoint requires patching apply_rotary_pos_emb to a no-op before the model is created:

import torch
from transformers import AutoTokenizer, LlamaForCausalLM
import transformers.models.llama.modeling_llama as modeling_llama

# Replace RoPE with a no-op so queries and keys are left unrotated (NoPE).
def noop_apply_rotary_pos_emb(q, k, *args, **kwargs):
    return q, k

modeling_llama.apply_rotary_pos_emb = noop_apply_rotary_pos_emb

model = LlamaForCausalLM.from_pretrained(
    "starmpcc/NoPE_1.5B_FW_EDU_15T",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained("starmpcc/NoPE_1.5B_FW_EDU_15T")
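Once the model and tokenizer are loaded as above, the checkpoint can be used like any causal LM. The prompt and decoding settings below are illustrative only, not recommendations from the paper.

# Verify the ~1.5B parameter count and run a short greedy generation.
print(f"{sum(p.numel() for p in model.parameters()) / 1e9:.2f}B parameters")

inputs = tokenizer(
    "The causal mask can encode positional information because",
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))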
Citation
@misc{kim2025ropedoescausalmask,
  title={Behind RoPE: How Does Causal Mask Encode Positional Information?},
  author={Junu Kim and Xiao Liu and Zhenghao Lin and Lei Ji and Yeyun Gong and Edward Choi},
  year={2025},
  eprint={2509.21042},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2509.21042},
}