# mistral-nemo-gutenberg2-12B-test
mistralai/Mistral-Nemo-Instruct-2407 fine-tuned on nbeerbower/gutenberg2-dpo.
This model is a test run for benchmarking my gutenberg2 dataset.
## Method

Fine-tuned for 3 epochs on an RTX 3090.
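The card itself doesn't include usage code; below is a minimal inference sketch using the standard Transformers chat-template API, assuming the model is published under the repo id shown. It is an illustration, not part of the original card.

```python
# Minimal usage sketch, assuming the model is available under this repo id
# (standard Transformers loading; not part of the original card).
MODEL_ID = "nbeerbower/mistral-nemo-gutenberg2-12B-test"


def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Download the fine-tuned weights and generate a completion for one user turn."""
    # Imported lazily so the snippet itself is cheap to load.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
    # Mistral-Nemo-Instruct expects its chat template, so format the prompt through it.
    inputs = tokenizer.apply_chat_template(
        [{"role": "user", "content": prompt}], return_tensors="pt"
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)
```

Note that a 12B model in bf16 needs roughly 24 GB of VRAM, so `device_map="auto"` may spill layers to CPU on smaller GPUs.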
## Open LLM Leaderboard Evaluation Results
Detailed results can be found here
| Metric | Value |
|---|---|
| Avg. | 20.73 |
| IFEval (0-shot) | 33.85 |
| BBH (3-shot) | 32.04 |
| MATH Lvl 5 (4-shot) | 10.20 |
| GPQA (0-shot) | 8.95 |
| MuSR (0-shot) | 10.97 |
| MMLU-PRO (5-shot) | 28.39 |