Main revision #5, opened by aldakata
Hi!
I was playing around with the revisions and got different results from the main revision and the stage2-ingredient3-step23852-tokens51B revision.
Shouldn't these be the exact same model? According to https://github.com/allenai/OLMo:
"For the 1B model, we have trained three times with different data order on 50B high quality tokens, used last checkpoint of seed 42 as final checkpoint."
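One way to check whether two revisions hold byte-identical weight files, without downloading them, is to compare the sha256 hashes the Hub records for each LFS-tracked file. This is a minimal sketch, assuming the huggingface_hub package is installed; the helper names lfs_hashes and diff_hashes are hypothetical, and the repo id in the usage lines is a placeholder, since the exact repository is not named here.

```python
from typing import Dict, Mapping, Optional, Tuple


def lfs_hashes(repo_id: str, revision: str) -> Dict[str, str]:
    """Map each LFS-tracked file (e.g. *.safetensors) in a revision to its sha256."""
    # Imported lazily so diff_hashes below can be used without the package.
    from huggingface_hub import HfApi

    info = HfApi().model_info(repo_id, revision=revision, files_metadata=True)
    return {
        s.rfilename: s.lfs.sha256
        for s in (info.siblings or [])
        if s.lfs is not None  # skip small non-LFS files such as config.json
    }


def diff_hashes(
    a: Mapping[str, str], b: Mapping[str, str]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {filename: (hash_a, hash_b)} for files that differ or exist in only one revision."""
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


# Hypothetical usage; replace the placeholder with the actual model repo id.
# repo = "allenai/<model-repo>"
# diff = diff_hashes(lfs_hashes(repo, "main"),
#                    lfs_hashes(repo, "stage2-ingredient3-step23852-tokens51B"))
# print("identical weight files" if not diff else diff)
```

An empty result means the weight files are byte-identical. A non-empty result could still come from a different serialization or sharding of the same weights, so loading both revisions and comparing tensors is the stricter check.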