## Model Description
This model is a refined version of the original EAGLE-3 model, trained with several key improvements.
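As context for how such a draft model is used: EAGLE-3 checkpoints are typically loaded alongside the target model by a speculative-decoding serving stack. The sketch below shows one possible setup with SGLang's offline engine; the speculative-decoding argument names and values are assumptions based on SGLang's documented options, not part of this model card, so verify them against your installed SGLang version.

```python
# Hypothetical usage sketch: serving Llama-3.1-8B-Instruct with this EAGLE-3
# draft model via SGLang's offline engine. Argument names/values are
# assumptions based on SGLang's speculative-decoding options; verify against
# your SGLang version before use.
import sglang as sgl

llm = sgl.Engine(
    model_path="meta-llama/Llama-3.1-8B-Instruct",  # target model
    speculative_algorithm="EAGLE3",                 # EAGLE-3 drafting
    speculative_draft_model_path="jialefu/eagle3-llama-3.1-8b-instruct",
    speculative_num_steps=3,          # draft steps per verification round (illustrative)
    speculative_eagle_topk=4,         # candidates kept per draft step (illustrative)
    speculative_num_draft_tokens=16,  # total draft tokens per round (illustrative)
)

outputs = llm.generate(
    ["Explain speculative decoding in one sentence."],
    {"max_new_tokens": 64, "temperature": 0},
)
print(outputs[0]["text"])
```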
## Features
- Attention Mask Fix: Addresses an issue with the attention mask found in the original EAGLE repository. Further details are available in this pull request.
- Positional Embedding Alignment: `rope_theta` has been set to 500,000 to align with the Llama-3.1-8B-Instruct model, correcting a mismatch from the original training setting (10,000).
- Extended Context Length: The model was trained on data with a sequence length of 4096, an increase from the original 2048. Additionally, `max_position_embeddings` is set to 128,000 to facilitate further pretraining on long contexts (see the configuration sketch after this list).
- Training Framework: The model was trained using the SpecForge library.
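One quick sanity check on the values above is to read them back from the model's configuration on the Hub. This is a minimal sketch assuming the repository ships a standard Hugging Face `config.json` with Llama-style field names; adjust it if the draft model uses a custom config class.

```python
# Minimal sketch: read back the draft model's config to confirm the values
# described above. Assumes a standard Hugging Face config.json with
# Llama-style field names (rope_theta, max_position_embeddings).
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("jialefu/eagle3-llama-3.1-8b-instruct")
print(cfg.rope_theta)               # expected: 500000.0, matching Llama-3.1-8B-Instruct
print(cfg.max_position_embeddings)  # expected: 128000
```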
## Benchmarks
Performance was evaluated using the SpecForge benchmark suite.
| Checkpoint | MT-Bench | GSM8K | HumanEval |
|---|---|---|---|
| Original | 5.690 | 6.145 | 6.817 |
| This work | 5.999 | 6.221 | 6.804 |
## Training Details
- Epochs: 15
- RoPE Theta: 500,000 (see the sketch after this list)
- Batch Size: 1
- Learning Rate: 5e-5
- Max Sequence Length: 4096
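To make the `rope_theta` correction concrete, the sketch below computes the standard RoPE inverse-frequency schedule at the original base (10,000) and the corrected base (500,000). It is illustrative math only, not code taken from this repository or from SpecForge; the head dimension of 128 is assumed from the Llama-3.1-8B architecture.

```python
# Illustrative only (not from this repo or SpecForge): the standard RoPE
# inverse-frequency schedule at the old base (10,000) vs. the corrected base
# (500,000). If draft and target disagree on the base, their low-frequency
# rotary components diverge, so their position encodings no longer line up.
import torch

def rope_inv_freq(base: float, head_dim: int = 128) -> torch.Tensor:
    # inv_freq[i] = base ** (-2i / head_dim), the usual RoPE schedule
    return 1.0 / (base ** (torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim))

old = rope_inv_freq(10_000.0)    # original EAGLE training setting
new = rope_inv_freq(500_000.0)   # aligned with Llama-3.1-8B-Instruct
ratio = (old / new).max().item()
print(f"lowest-frequency component differs by ~{ratio:.0f}x between the two bases")
```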
## Model Tree
- Base model: meta-llama/Llama-3.1-8B
- Fine-tuned: meta-llama/Llama-3.1-8B-Instruct