Sweaterdog/Smol-reason2.1
3B
My first use of GRPO fine-tuning techniques. What I learned from this model will be applied to future Andy models.
Note: Datasets for the Smol-reason family