Trained for one epoch on ultrafeedback_binarized using cDPO. Evaluation pending.
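For readers unfamiliar with cDPO (conservative DPO), the sketch below shows the per-example loss as it is usually described: the standard DPO loss with label smoothing, which treats each preference label as possibly flipped with some small probability. The function names and the `beta` and `label_smoothing` values are illustrative assumptions; the hyperparameters actually used to train this model are not stated here.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def cdpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,             # assumed value, not from the model card
    label_smoothing: float = 0.1,  # assumed value, not from the model card
) -> float:
    """Per-example conservative DPO loss (DPO with label smoothing).

    Inputs are summed log-probabilities of the chosen and rejected
    responses under the policy and the frozen reference model.
    """
    # Difference of policy and reference log-ratios (the DPO "logits").
    logits = (policy_chosen_logp - policy_rejected_logp) - (
        ref_chosen_logp - ref_rejected_logp
    )
    # With label_smoothing = 0 this reduces to the standard DPO loss.
    return (
        -(1.0 - label_smoothing) * log_sigmoid(beta * logits)
        - label_smoothing * log_sigmoid(-beta * logits)
    )
```

Setting `label_smoothing=0.0` recovers plain DPO, so the smoothing term is the only difference between the two objectives in this sketch.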
Some initial benchmark results:
| Task | Version | Metric | Value | | Stderr |
|---|---|---|---|---|---|
| hellaswag | 0 | acc | 0.6621 | ± | 0.0047 |
| | | acc_norm | 0.8525 | ± | 0.0035 |
| arc_challenge | 0 | acc | 0.6348 | ± | 0.0141 |
| | | acc_norm | 0.6698 | ± | 0.0137 |
| winogrande | 0 | acc | 0.7861 | ± | 0.0115 |
| gsm8k | 0 | acc | 0.5694 | ± | 0.0136 |