---
license: apache-2.0
datasets:
- mlfoundations/dclm-baseline-1.0-parquet
---
|
|
|
|
|
# Covenant72B |
|
|
|
|
|
**Covenant72B** is the largest language model trained entirely from scratch through permissionless collaboration, at the 72-billion-parameter scale.
|
|
|
|
|
It is being trained by 20+ globally distributed participants, coordinated through decentralized infrastructure on the Bittensor blockchain.
|
|
|
|
|
**Checkpoint-One** marks the first release, corresponding to **210 billion tokens processed**. Model files are available in the [Checkpoint-One branch](https://huggingface.co/tplr/Covenant72B/tree/Checkpoint-One). Future checkpoints will be published here.
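
The checkpoint branch can be loaded with Hugging Face `transformers`. The sketch below is a usage illustration under assumptions not stated in this card: it assumes the branch ships standard LLaMA-style weights compatible with `AutoModelForCausalLM`, and note that a 72B model in bf16 needs on the order of 145 GB of accelerator memory.

```python
# Sketch: load the Checkpoint-One branch of this repo.
# Assumes standard LLaMA-style weights loadable via AutoModelForCausalLM.
repo_id = "tplr/Covenant72B"
revision = "Checkpoint-One"  # branch holding this checkpoint's files

def load_checkpoint(repo_id: str = repo_id, revision: str = revision):
    """Download the tokenizer and sharded weights for the given branch."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id, revision=revision)
    model = AutoModelForCausalLM.from_pretrained(
        repo_id,
        revision=revision,
        torch_dtype="auto",   # keep the checkpoint's native dtype
        device_map="auto",    # shard across available accelerators
    )
    return tokenizer, model
```

Passing `revision="Checkpoint-One"` pins the download to this checkpoint's branch rather than `main`, so future checkpoint pushes will not change what is loaded.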
|
|
|
|
|
|
|
|
|
|
--- |
|
|
|
|
|
## Training Details |
|
|
|
|
|
| Property | Value |
|-----------|--------|
| **Model size** | 72B |
| **Architecture** | LLaMA-style |
| **Target token budget** | 1.2T (210B for current checkpoint) |
| **Compute participants** | 20+ |
| **Minimum compute per participant** | 8×B200 or equivalent |
| **Dataset** | DCLM-baseline |
| **Optimizer** | SparseLoCo (communication-efficient optimizer) |
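
Communication-efficient optimizers of this kind save bandwidth by transmitting only a sparse subset of each update between participants. The toy sketch below illustrates generic top-k sparsification with error feedback; it is an illustration of the general technique, not the actual SparseLoCo algorithm.

```python
import numpy as np

def topk_with_error_feedback(grad, residual, k):
    """Keep the k largest-magnitude entries of grad + residual for
    transmission; everything else stays in the residual and is carried
    into the next step (error feedback)."""
    full = grad + residual
    idx = np.argsort(np.abs(full))[-k:]   # indices of top-k magnitudes
    sparse = np.zeros_like(full)
    sparse[idx] = full[idx]               # values actually transmitted
    new_residual = full - sparse          # untransmitted mass, kept locally
    return sparse, new_residual

grad = np.array([0.1, -2.0, 0.3, 1.5, -0.05])
residual = np.zeros_like(grad)
sparse, residual = topk_with_error_feedback(grad, residual, k=2)
# Only the two largest-magnitude entries (-2.0 and 1.5) are transmitted.
```

Entries below the top-k threshold are not discarded: the residual accumulates them across steps, so small but persistent gradient components are eventually transmitted.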
|
|
|
|
|
--- |
|
|
|
|
|
## Performance on Benchmarks |
|
|
_All results are 0-shot acc-norm (%)_ |
|
|
|
|
|
| Model | Compute Environment / Permissions | Size | Tokens | ARC-C | ARC-E | PIQA | OpenBookQA | HellaSwag | Winogrande | MMLU |
|:------|:----------------------------------|------:|--------:|------:|------:|------:|------------:|-----------:|-------------:|------:|
| **Intellect-1** | Over the internet / Whitelist | 10B | 1T | 44.8 | 71.6 | 77.7 | 43.6 | 70.5 | 63.1 | 32.7 |
| **Psyche Consilience-7Y9** | Over the internet / Whitelist | 40B | 1.2T | 31.1 | 55.8 | 76.1 | 34.8 | 63.7 | 57.0 | 24.2 |
| **Covenant72B – Checkpoint One** | Over the internet / Permissionless | 72B | 210B | 46.2 | 72.6 | 79.2 | 43.0 | 73.5 | 70.3 | 38.0 |
| **K2 Checkpoint 54** | Centralized Cluster | 65B | 210B | 41.8 | 69.5 | 80.1 | 42.4 | 74.9 | 68.9 | 33.7 |
|
|
|
|
|
--- |
|
|
|
|
|
For more details, refer to [Checkpoint One on Templar Research](https://templarresearch.substack.com/p/checkpoint-one). |
|
|
|