
Model Description

This model is a refined EAGLE-3 draft model for Llama-3.1-8B-Instruct, retrained from the original release with several key improvements.
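Since this checkpoint is intended to serve as an EAGLE-3 draft model, a minimal serving sketch using SGLang's offline Engine API is shown below. The speculative-decoding argument names follow SGLang's EAGLE documentation at the time of writing, and the step/top-k/draft-token values are illustrative defaults rather than settings taken from this card; verify both against your installed SGLang version.

```python
import sglang as sgl

# Target model is Llama-3.1-8B-Instruct (per this card); this checkpoint is the draft.
llm = sgl.Engine(
    model_path="meta-llama/Llama-3.1-8B-Instruct",
    speculative_algorithm="EAGLE3",
    speculative_draft_model_path="jialefu/eagle3-llama-3.1-8b-instruct",
    speculative_num_steps=5,          # draft steps per verification round (illustrative)
    speculative_eagle_topk=8,         # candidates kept per draft step (illustrative)
    speculative_num_draft_tokens=64,  # total draft tokens per round (illustrative)
)

print(llm.generate("The capital of France is", {"temperature": 0, "max_new_tokens": 32}))
```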

Features

  • Attention Mask Fix: Fixes an attention-mask issue present in the original EAGLE repository. Further details are available in the accompanying pull request.
  • Positional Embedding Alignment: rope_theta is set to 500,000 to match Llama-3.1-8B-Instruct, correcting a mismatch against the original training setting (10,000).
  • Extended Context Length: The model was trained on sequences of length 4096, up from the original 2048. In addition, max_position_embeddings is set to 128,000 to facilitate further pretraining on long contexts. (Both configuration values can be read back with the sketch after this list.)
  • Training Framework: The model was trained using the SpecForge library.
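As a quick sanity check, the two configuration changes above can be read back with transformers. This is a minimal sketch assuming the checkpoint's config.json parses as a standard llama-style config, as the model tags suggest:

```python
from transformers import AutoConfig

# Repo id taken from this model page.
cfg = AutoConfig.from_pretrained("jialefu/eagle3-llama-3.1-8b-instruct")
print(cfg.rope_theta)               # 500000.0, matching Llama-3.1-8B-Instruct
print(cfg.max_position_embeddings)  # 128000, headroom for long-context pretraining
```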

Benchmarks

Performance was evaluated with the SpecForge benchmark suite. This checkpoint improves on MT-Bench and GSM8K and is roughly on par with the original on HumanEval.

Checkpoint   MT-Bench   GSM8K   HumanEval
Original     5.690      6.145   6.817
This work    5.999      6.221   6.804

Training Details

  • Epochs: 15
  • RoPE Theta: 500,000
  • Batch Size: 1
  • Learning Rate: 5e-5
  • Max Sequence Length: 4096
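To make the rope_theta change concrete, the sketch below computes the longest RoPE wavelength under the old (10,000) and new (500,000) base frequencies. The head dimension of 128 is Llama-3.1-8B's per-head value and is an assumption on my part, not a number stated on this card:

```python
import math

def rope_inv_freq(theta: float, head_dim: int = 128) -> list[float]:
    # Standard RoPE inverse frequencies: theta**(-2*i/d) for i in [0, d/2).
    return [theta ** (-2 * i / head_dim) for i in range(head_dim // 2)]

# With theta = 10,000 the slowest-rotating dimension wraps after ~54k positions;
# theta = 500,000 stretches that past 2.5M, matching Llama-3.1's base frequency.
for theta in (10_000.0, 500_000.0):
    longest = 2 * math.pi / rope_inv_freq(theta)[-1]
    print(f"theta={theta:>9,.0f}: longest wavelength ~ {longest:,.0f} positions")
```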