Tokenizer Study Collection
Models comparing the effects of tokenizer properties on pre-training compression, and the relationship between compression and downstream performance.
LLaMA 130M (Implementation: https://github.com/lmsdss/LayerNorm-Scaling)
Pre-training data: C4 (~2.054B tokens with BPE, ~2.00B tokens with SentencePiece; token-counting sketch below)
Tokenizer: BPE (Llama 2 7B's tokenizer: meta-llama/Llama-2-7b-hf)
Evaluation metrics: perplexity, bits-per-byte (see the metric sketch after this entry)
Checkpoints (evaluation sketch below):
  Path: /model_10000
  Evals:
    Perplexity: 25.6822
    Bits-per-byte: 0.4409
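The token counts above come from segmenting raw C4 text with each tokenizer. Below is a minimal sketch of counting Llama 2 BPE tokens over a C4 sample with Hugging Face transformers and datasets; the allenai/c4 dataset id, the English split, and the 10,000-document sample size are illustrative assumptions, not the exact setup used for this collection (and the meta-llama/Llama-2-7b-hf tokenizer is gated, so access must be granted first).

```python
# Sketch: count tokens produced by the Llama 2 BPE tokenizer on a C4 sample.
# Assumptions: the "allenai/c4" English split and the 10,000-document sample
# are illustrative; the collection's exact counting setup is not specified.
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
stream = load_dataset("allenai/c4", "en", split="train", streaming=True)

token_count = 0
byte_count = 0
for i, example in enumerate(stream):
    text = example["text"]
    token_count += len(tokenizer(text, add_special_tokens=False)["input_ids"])
    byte_count += len(text.encode("utf-8"))
    if i + 1 >= 10_000:  # small sample; scale up toward the ~2B-token pre-training budget
        break

print(f"tokens: {token_count:,}  bytes: {byte_count:,}  bytes/token: {byte_count / token_count:.2f}")
```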
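Perplexity depends on the tokenizer, while bits-per-byte normalizes by the raw byte count, which makes it more comparable across tokenizers. Here is a minimal sketch of one common conversion from mean per-token cross-entropy (in nats) to both metrics; the exact normalization behind the numbers reported above (for example, which bytes are counted) is not specified here and is treated as an assumption.

```python
import math

def perplexity_and_bpb(mean_nll_nats: float, n_tokens: int, n_bytes: int) -> tuple[float, float]:
    """Convert mean per-token NLL (in nats) into perplexity and bits-per-byte.

    One common definition (the collection's exact convention is assumed, not confirmed):
      perplexity    = exp(mean NLL per token)
      bits-per-byte = total bits / total bytes
                    = (n_tokens * mean NLL / ln 2) / n_bytes
    """
    ppl = math.exp(mean_nll_nats)
    bpb = (n_tokens * mean_nll_nats / math.log(2)) / n_bytes
    return ppl, bpb

# Illustrative usage with made-up counts (not the collection's evaluation data):
ppl, bpb = perplexity_and_bpb(mean_nll_nats=3.25, n_tokens=1_000_000, n_bytes=4_200_000)
print(f"perplexity={ppl:.4f}  bits-per-byte={bpb:.4f}")
```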
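To reproduce checkpoint evaluations like the /model_10000 entry above, the saved model can be scored on held-out text. The sketch below assumes the checkpoint is in Hugging Face format and loadable with AutoModelForCausalLM; the LayerNorm-Scaling repository's own loading and evaluation code may differ, and the text sample here is a stand-in for a real held-out C4 split.

```python
# Sketch: mean per-token NLL and perplexity for a saved causal-LM checkpoint.
# Assumption: the checkpoint directory is in Hugging Face format; the
# LayerNorm-Scaling repo's own loading/eval pipeline may differ.
import math

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint_dir = "model_10000"  # placeholder for the checkpoint path listed above
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
model = AutoModelForCausalLM.from_pretrained(checkpoint_dir).eval()

text = "The quick brown fox jumps over the lazy dog."  # stand-in for held-out C4 text
input_ids = tokenizer(text, return_tensors="pt")["input_ids"]
with torch.no_grad():
    # Passing labels=input_ids makes the model return mean cross-entropy (nats)
    # over the next-token predictions.
    loss = model(input_ids, labels=input_ids).loss.item()

print(f"mean NLL (nats): {loss:.4f}  perplexity: {math.exp(loss):.4f}")
```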