tinychat 271M base

An autoregressive Transformer language model:

  • Embedding size: 1440
  • Layers: 12
  • Attention heads: 16 (8 kv heads)
  • Parameters: 271M
  • Context length: 512
  • Vocabulary size: 32,768
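
For reference, these hyperparameters map onto a configuration like the one below. This is a minimal sketch assuming a GPT-style decoder with grouped-query attention; the class and field names are illustrative, not taken from the released code.

  from dataclasses import dataclass

  @dataclass
  class TinychatConfig:
      # Illustrative config mirroring the card's numbers (names are assumptions).
      vocab_size: int = 32_768
      n_embd: int = 1440       # embedding size
      n_layer: int = 12
      n_head: int = 16         # query heads
      n_kv_head: int = 8       # shared key/value heads (grouped-query attention)
      block_size: int = 512    # context length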

Trained for ~50 hours on a single RTX 3090, processing ~5B tokens from:

  • 5M FineWeb articles
  • 5M FineWeb-Edu articles
  • Simple English Wikipedia
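
Those approximate figures imply an average throughput of roughly 28k tokens per second; a quick back-of-the-envelope check (using only the ~5B-token and ~50-hour numbers quoted above):

  # Rough training-throughput estimate from the approximate figures above.
  tokens = 5e9                  # ~5B tokens processed
  hours = 50                    # ~50 hours on a single RTX 3090
  tokens_per_second = tokens / (hours * 3600)
  print(f"{tokens_per_second:,.0f} tokens/s")   # -> "27,778 tokens/s"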

More details here

Performance (0-shot)

  Benchmark    tinychat   gpt2    gpt2-medium
  swag         47.3%      48.9%   56.3%
  hellaswag    30.7%      29.0%   37.0%
  openbookqa   32.8%      28.0%   30.6%
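
The card does not say which harness produced these numbers. One plausible way to reproduce 0-shot scores on these tasks is EleutherAI's lm-evaluation-harness, assuming the checkpoint loads through the Hugging Face transformers backend (an assumption, not stated here):

  # Sketch of a 0-shot evaluation with lm-evaluation-harness (assumed setup,
  # not confirmed by the card); requires `pip install lm-eval`.
  import lm_eval

  results = lm_eval.simple_evaluate(
      model="hf",
      model_args="pretrained=kali-v/tinychat-271M-base",
      tasks=["swag", "hellaswag", "openbookqa"],
      num_fewshot=0,
  )
  print(results["results"])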