Tokenizer Study | LLaMA 130M | BPE Tokenizer (LLaMA 2)

LLaMA 130M (Implementation: https://github.com/lmsdss/LayerNorm-Scaling)

Pre-Training: C4 [~2.054B tokens with the BPE tokenizer; ~2.00B tokens with SentencePiece]

Tokenizer: BPE (LLaMA 2 7B tokenizer: meta-llama/Llama-2-7b-hf)
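
The tokenizer is the stock LLaMA 2 BPE tokenizer, so it can be loaded straight from the Hub. A minimal sketch, assuming `transformers` is installed and access to the gated meta-llama/Llama-2-7b-hf repo has been granted:

```python
# Load the LLaMA 2 BPE tokenizer used for pre-training and tokenize a sample.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

text = "The quick brown fox jumps over the lazy dog."
ids = tokenizer(text)["input_ids"]
print(len(ids), tokenizer.convert_ids_to_tokens(ids)[:8])
```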

Evals

Perplexity

  • BPE (LLaMA2): 23.04

Bits-per-byte

  • BPE (LLaMA2): 0.426
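
Both metrics derive from the same mean token-level cross-entropy: perplexity is exp(loss in nats), and bits-per-byte rescales the total loss to base 2 and divides by the byte count of the evaluated text. A minimal sketch of the conversion (the exact byte-counting convention behind the numbers above is not stated on this card, so this is an assumption):

```python
import math

def ppl_and_bpb(mean_nll_nats: float, n_tokens: int, n_bytes: int):
    """Convert mean per-token cross-entropy (in nats) into perplexity
    and bits-per-byte. n_bytes is the UTF-8 byte count of the eval text."""
    perplexity = math.exp(mean_nll_nats)
    total_bits = mean_nll_nats * n_tokens / math.log(2)  # nats -> bits
    return perplexity, total_bits / n_bytes

# Example with made-up numbers: 3.0 nats/token over 1M tokens of ~4.3 MB of text.
print(ppl_and_bpb(3.0, 1_000_000, 4_300_000))
```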

Checkpoints:

  • 80K steps (10K local optimizer steps with 8 gradient-accumulation steps each)
    • Path: /model_10000

    • Evals:

      • Perplexity: 25.6822

      • Bits-per-byte: 0.4409
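
The checkpoint can be loaded for inference. A minimal sketch, assuming the weights are published as a standard `transformers` LLaMA checkpoint under this card's repo id; if the LayerNorm-Scaling variant uses a custom architecture, the training repo's own model class would be needed instead:

```python
# Load the 0.1B-parameter BF16 checkpoint and generate a short continuation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "shikhar-srivastava/llama-130m-prenorm-train_c4_2B-tok_llama2"
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("The capital of France is", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```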

Format: Safetensors · Model size: 0.1B params · Tensor type: BF16
