baileyk burtenshaw HF Staff committed on
Commit c571abe · verified · 1 Parent(s): 09dbaaa

Extract evaluation results from README (#6)


- Extract evaluation results from README (08f9b5b0b86eb542f4611c7bbdbee364ae87a188)
- Update README.md (43fd7ee03a438e39d1e304dfefe50e2a06545da4)


Co-authored-by: ben burtenshaw <[email protected]>

Files changed (1)
  1. README.md +135 -0
README.md CHANGED
@@ -5,6 +5,141 @@ language:
  library_name: transformers
  datasets:
  - allenai/dolma3_mix-5.5T-1125
+ model-index:
+ - name: Olmo-3-1125-32B
+   results:
+   - task:
+       type: text-generation
+     dataset:
+       name: Benchmarks
+       type: benchmark
+     metrics:
+     - name: Olmo 3-Eval Math
+       type: olmo_3_eval_math
+       value: 61.6
+     - name: BigCodeBench
+       type: bigcodebench
+       value: 43.9
+     - name: HumanEval
+       type: humaneval
+       value: 66.5
+     - name: DeepSeek LeetCode
+       type: deepseek_leetcode
+       value: 1.9
+     - name: DS 1000
+       type: ds_1000
+       value: 29.7
+     - name: MBPP
+       type: mbpp
+       value: 60.2
+     - name: MultiPL HumanEval
+       type: multipl_humaneval
+       value: 35.9
+     - name: MultiPL MBPPP
+       type: multipl_mbppp
+       value: 41.8
+     - name: Olmo 3-Eval Code
+       type: olmo_3_eval_code
+       value: 40.0
+     - name: ARC MC
+       type: arc_mc
+       value: 94.7
+     - name: MMLU STEM
+       type: mmlu_stem
+       value: 70.8
+     - name: MedMCQA MC
+       type: medmcqa_mc
+       value: 57.6
+     - name: MedQA MC
+       type: medqa_mc
+       value: 53.8
+     - name: SciQ MC
+       type: sciq_mc
+       value: 95.5
+     - name: Olmo 3-Eval MC_STEM
+       type: olmo_3_eval_mc_stem
+       value: 74.5
+     - name: MMLU Humanities
+       type: mmlu_humanities
+       value: 78.3
+     - name: MMLU Social Sci.
+       type: mmlu_social_sci.
+       value: 83.9
+     - name: MMLU Other
+       type: mmlu_other
+       value: 75.1
+     - name: CSQA MC
+       type: csqa_mc
+       value: 82.3
+     - name: PIQA MC
+       type: piqa_mc
+       value: 85.6
+     - name: SocialIQA MC
+       type: socialiqa_mc
+       value: 83.9
+     - name: CoQA Gen2MC MC
+       type: coqa_gen2mc_mc
+       value: 96.4
+     - name: DROP Gen2MC MC
+       type: drop_gen2mc_mc
+       value: 87.2
+     - name: Jeopardy Gen2MC MC
+       type: jeopardy_gen2mc_mc
+       value: 92.3
+     - name: NaturalQs Gen2MC MC
+       type: naturalqs_gen2mc_mc
+       value: 78.0
+     - name: SQuAD Gen2MC MC
+       type: squad_gen2mc_mc
+       value: 98.2
+     - name: Olmo 3-Eval MC_Non-STEM
+       type: olmo_3_eval_mc_non_stem
+       value: 85.6
+     - name: HellaSwag RC
+       type: hellaswag_rc
+       value: 84.8
+     - name: Winogrande RC
+       type: winogrande_rc
+       value: 90.3
+     - name: Lambada
+       type: lambada
+       value: 75.7
+     - name: Basic Skills
+       type: basic_skills
+       value: 93.5
+     - name: DROP
+       type: drop
+       value: 81.0
+     - name: Jeopardy
+       type: jeopardy
+       value: 75.3
+     - name: NaturalQs
+       type: naturalqs
+       value: 48.7
+     - name: SQuAD
+       type: squad
+       value: 94.5
+     - name: CoQA
+       type: coqa
+       value: 74.1
+     - name: Olmo 3-Eval GenQA
+       type: olmo_3_eval_genqa
+       value: 79.8
+     - name: BBH
+       type: bbh
+       value: 77.6
+     - name: MMLU Pro MC
+       type: mmlu_pro_mc
+       value: 49.6
+     - name: Deepmind Math
+       type: deepmind_math
+       value: 30.1
+     - name: LBPP
+       type: lbpp
+       value: 21.7
+     source:
+       name: Model README
+       url: https://huggingface.co/allenai/Olmo-3-1125-32B
  ---
  
  ## Model Details
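
Note on the `type` fields: every `type` value in the added `model-index` block appears to be the metric's display `name` lowercased, with spaces and hyphens replaced by underscores (other punctuation, such as the trailing period in `mmlu_social_sci.`, is left in place). The Python sketch below reproduces that mapping. It is an assumption inferred from the values in this diff, not the tool that generated the commit, and `metric_type` and `metric_entries` are hypothetical helper names.

```python
def metric_type(name: str) -> str:
    """Derive a type slug from a metric display name.

    Assumed rule (inferred from the diff, not confirmed by the commit):
    lowercase the name, then replace spaces and hyphens with underscores.
    """
    return name.lower().replace(" ", "_").replace("-", "_")


def metric_entries(scores: dict[str, float]) -> list[dict]:
    """Build name/type/value entries shaped like the model-index metrics list."""
    return [
        {"name": name, "type": metric_type(name), "value": value}
        for name, value in scores.items()
    ]


if __name__ == "__main__":
    # A few name/value pairs copied from the diff above.
    sample = {
        "Olmo 3-Eval Math": 61.6,
        "DS 1000": 29.7,
        "MMLU Social Sci.": 83.9,
    }
    for entry in metric_entries(sample):
        print(entry)
```

Because the period is not stripped, a slug such as `mmlu_social_sci.` may be worth normalizing before it is used as a key elsewhere.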