This model is a fine-tune of macadeliccc/SOLAR-10.7b-Instruct-truthy-dpo.
Available here
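As a minimal usage sketch (not part of the original card), the model can be loaded with the Hugging Face transformers chat-template API. The repository id below is the base model named above, so substitute this fine-tune's own id; a recent transformers version with chat-template support is assumed.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id: this is the base model named above;
# replace with this fine-tune's own repository id.
model_id = "macadeliccc/SOLAR-10.7b-Instruct-truthy-dpo"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision for a 10.7B model
    device_map="auto",          # requires the accelerate package
)

messages = [{"role": "user", "content": "Explain DPO in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```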
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| SOLAR-10.7b-Instruct-truthy-dpo | 48.69 | 73.82 | 76.81 | 45.71 | 61.26 |
AGIEval
| Task | Version | Metric | Value | | Stderr |
|---|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 27.95 | ± | 2.82 |
| | | acc_norm | 27.95 | ± | 2.82 |
| agieval_logiqa_en | 0 | acc | 42.40 | ± | 1.94 |
| | | acc_norm | 42.24 | ± | 1.94 |
| agieval_lsat_ar | 0 | acc | 25.65 | ± | 2.89 |
| | | acc_norm | 23.91 | ± | 2.82 |
| agieval_lsat_lr | 0 | acc | 54.12 | ± | 2.21 |
| | | acc_norm | 54.51 | ± | 2.21 |
| agieval_lsat_rc | 0 | acc | 69.89 | ± | 2.80 |
| | | acc_norm | 69.89 | ± | 2.80 |
| agieval_sat_en | 0 | acc | 80.10 | ± | 2.79 |
| | | acc_norm | 80.10 | ± | 2.79 |
| agieval_sat_en_without_passage | 0 | acc | 50.00 | ± | 3.49 |
| | | acc_norm | 49.51 | ± | 3.49 |
| agieval_sat_math | 0 | acc | 42.27 | ± | 3.34 |
| | | acc_norm | 41.36 | ± | 3.33 |
Average: 48.69%
GPT4All
| Task | Version | Metric | Value | | Stderr |
|---|---|---|---|---|---|
| arc_challenge | 0 | acc | 59.90 | ± | 1.43 |
| | | acc_norm | 63.91 | ± | 1.40 |
| arc_easy | 0 | acc | 80.85 | ± | 0.81 |
| | | acc_norm | 78.16 | ± | 0.85 |
| boolq | 1 | acc | 88.20 | ± | 0.56 |
| hellaswag | 0 | acc | 68.34 | ± | 0.46 |
| | | acc_norm | 86.39 | ± | 0.34 |
| openbookqa | 0 | acc | 37.60 | ± | 2.17 |
| | | acc_norm | 46.80 | ± | 2.23 |
| piqa | 0 | acc | 78.84 | ± | 0.95 |
| | | acc_norm | 78.78 | ± | 0.95 |
| winogrande | 0 | acc | 74.51 | ± | 1.22 |
Average: 73.82%
TruthfulQA
| Task | Version | Metric | Value | | Stderr |
|---|---|---|---|---|---|
| truthfulqa_mc | 1 | mc1 | 61.81 | ± | 1.70 |
| | | mc2 | 76.81 | ± | 1.42 |
Average: 76.81%
Bigbench
| Task | Version | Metric | Value | | Stderr |
|---|---|---|---|---|---|
| bigbench_causal_judgement | 0 | multiple_choice_grade | 50.53 | ± | 3.64 |
| bigbench_date_understanding | 0 | multiple_choice_grade | 63.14 | ± | 2.51 |
| bigbench_disambiguation_qa | 0 | multiple_choice_grade | 47.67 | ± | 3.12 |
| bigbench_geometric_shapes | 0 | multiple_choice_grade | 26.18 | ± | 2.32 |
| | | exact_str_match | 0.00 | ± | 0.00 |
| bigbench_logical_deduction_five_objects | 0 | multiple_choice_grade | 28.60 | ± | 2.02 |
| bigbench_logical_deduction_seven_objects | 0 | multiple_choice_grade | 21.29 | ± | 1.55 |
| bigbench_logical_deduction_three_objects | 0 | multiple_choice_grade | 47.33 | ± | 2.89 |
| bigbench_movie_recommendation | 0 | multiple_choice_grade | 39.80 | ± | 2.19 |
| bigbench_navigate | 0 | multiple_choice_grade | 63.80 | ± | 1.52 |
| bigbench_reasoning_about_colored_objects | 0 | multiple_choice_grade | 59.05 | ± | 1.10 |
| bigbench_ruin_names | 0 | multiple_choice_grade | 40.18 | ± | 2.32 |
| bigbench_salient_translation_error_detection | 0 | multiple_choice_grade | 46.69 | ± | 1.58 |
| bigbench_snarks | 0 | multiple_choice_grade | 65.19 | ± | 3.55 |
| bigbench_sports_understanding | 0 | multiple_choice_grade | 72.41 | ± | 1.42 |
| bigbench_temporal_sequences | 0 | multiple_choice_grade | 60.30 | ± | 1.55 |
| bigbench_tracking_shuffled_objects_five_objects | 0 | multiple_choice_grade | 25.76 | ± | 1.24 |
| bigbench_tracking_shuffled_objects_seven_objects | 0 | multiple_choice_grade | 17.43 | ± | 0.91 |
| bigbench_tracking_shuffled_objects_three_objects | 0 | multiple_choice_grade | 47.33 | ± | 2.89 |
Average: 45.71%
Average score: 61.26%
Elapsed time: 02:16:03
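For reference, the overall score above is the unweighted mean of the four suite averages; a minimal check:

```python
# Unweighted mean of the four suite averages reported above.
suite_averages = {
    "AGIEval": 48.69,
    "GPT4All": 73.82,
    "TruthfulQA": 76.81,
    "Bigbench": 45.71,
}
overall = sum(suite_averages.values()) / len(suite_averages)
print(f"{overall:.2f}%")  # 61.26%
```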
Detailed results can be found here
| Metric | Value |
|---|---|
| Avg. | 74.11 |
| AI2 Reasoning Challenge (25-Shot) | 72.10 |
| HellaSwag (10-Shot) | 88.44 |
| MMLU (5-Shot) | 65.45 |
| TruthfulQA (0-shot) | 76.75 |
| Winogrande (5-shot) | 82.72 |
| GSM8k (5-shot) | 59.21 |
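As a quick arithmetic check, the leaderboard average is the unweighted mean of the six task scores: (72.10 + 88.44 + 65.45 + 76.75 + 82.72 + 59.21) / 6 ≈ 74.11.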