tinychat 271M base

An autoregressive Transformer language model:

  • Embedding size: 1440
  • Layers: 12
  • Attention heads: 16 (8 kv heads)
  • Parameters: 271M
  • Context length: 512
  • Vocabulary size: 32,768
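
For reference, these hyperparameters map onto a configuration like the one below. This is a minimal sketch assuming a GPT-style decoder with grouped-query attention; the class and field names are illustrative, not taken from the released code.

  from dataclasses import dataclass

  @dataclass
  class TinychatConfig:
      # Illustrative config mirroring the card's numbers (names are assumptions).
      vocab_size: int = 32_768
      n_embd: int = 1440       # embedding size
      n_layer: int = 12
      n_head: int = 16         # query heads
      n_kv_head: int = 8       # shared key/value heads (grouped-query attention)
      block_size: int = 512    # context length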

Trained for ~50 hours on a single RTX 3090, processing ~5B tokens from:

  • 5M FineWeb articles
  • 5M FineWeb-Edu articles
  • Simple English Wikipedia
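
Those approximate figures imply an average throughput of roughly 28k tokens per second; a quick back-of-the-envelope check (using only the ~5B-token and ~50-hour numbers quoted above):

  # Rough training-throughput estimate from the approximate figures above.
  tokens = 5e9                  # ~5B tokens processed
  hours = 50                    # ~50 hours on a single RTX 3090
  tokens_per_second = tokens / (hours * 3600)
  print(f"{tokens_per_second:,.0f} tokens/s")   # -> "27,778 tokens/s"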

More details here

Performance (0-shot)

  Benchmark    tinychat   gpt2    gpt2-medium
  swag         47.3%      48.9%   56.3%
  hellaswag    30.7%      29.0%   37.0%
  openbookqa   32.8%      28.0%   30.6%
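
The card does not say which harness produced these numbers. One plausible way to reproduce 0-shot scores on these tasks is EleutherAI's lm-evaluation-harness, assuming the checkpoint loads through the Hugging Face transformers backend (an assumption, not stated here):

  # Sketch of a 0-shot evaluation with lm-evaluation-harness (assumed setup,
  # not confirmed by the card); requires `pip install lm-eval`.
  import lm_eval

  results = lm_eval.simple_evaluate(
      model="hf",
      model_args="pretrained=kali-v/tinychat-271M-base",
      tasks=["swag", "hellaswag", "openbookqa"],
      num_fewshot=0,
  )
  print(results["results"])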