Model idea
It'd be cool if you fine-tuned Qwen/Qwen3-4B-Thinking-2507 on all these datasets.
Hello @Enderchef ,
I started the fine-tuning process; it shouldn't take long. I will reply here once it's published.
Hey @Enderchef ,
The model is ready; please let me know if it matches your expectations.
Download it here.
Would you share the source code for how you fine-tuned it?
Eh, so this isn't RLHF tuning, just SFT?
Or did you go with plain SFT because you're using a reasoning model as the base?
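For context on what the SFT side of this usually involves, here is a minimal data-formatting sketch. It assumes Qwen3's ChatML-style layout with the chain of thought wrapped in `<think>` tags, and the dataset field names (`prompt`, `reasoning`, `answer`) are illustrative assumptions, not the actual schema used for this model:

```python
def to_sft_text(example: dict) -> str:
    """Render one training example in a Qwen3-style chat layout, with the
    chain of thought wrapped in <think> tags inside the assistant turn."""
    return (
        f"<|im_start|>user\n{example['prompt']}<|im_end|>\n"
        f"<|im_start|>assistant\n"
        f"<think>\n{example['reasoning']}\n</think>\n\n"
        f"{example['answer']}<|im_end|>"
    )

sample = {
    "prompt": "What is 2 + 2?",
    "reasoning": "Adding 2 and 2 gives 4.",
    "answer": "4",
}
text = to_sft_text(sample)
print(text)
```

Strings like these would then be tokenized and fed to a standard SFT trainer; the point is that the reasoning trace has to be rendered in the base model's own thinking format before training.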
Here is something you must keep in mind: if the behavior captured in the dataset differs from the base model's, you can't just apply SFT as-is.

I've tested all of your models trained on the 2.5 Pro datasets. Do they think? Yes, but they only think 'like' Gemini thinks; they don't actually follow how Gemini thinks. If the reasoning format is different, the training has to account for that, and the RLHF stage has to differ as well: you need to set up specific objectives, such as how you design the reward, which reasoning path is chosen, and which is rejected.
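A minimal sketch of the chosen/rejected setup described above, in a DPO-style layout. The field names, the `<think>` tag convention, and the toy reward are all illustrative assumptions, not the actual training objective:

```python
def make_preference_pair(prompt: str, good: str, bad: str, answer: str) -> dict:
    """Pair a completion that follows the target reasoning format (chosen)
    with one that does not (rejected)."""
    return {
        "prompt": prompt,
        "chosen": f"<think>\n{good}\n</think>\n\n{answer}",
        "rejected": f"{bad}\n\n{answer}",  # skips the expected think block
    }

def format_reward(completion: str) -> float:
    """Toy reward: 1.0 only when the completion opens with a closed
    <think> block, i.e. it follows the target reasoning format."""
    has_format = completion.startswith("<think>") and "</think>" in completion
    return 1.0 if has_format else 0.0

pair = make_preference_pair(
    "Why is the sky blue?",
    "Shorter wavelengths scatter more strongly in the atmosphere.",
    "The sky reflects the ocean.",
    "Because of Rayleigh scattering of blue light.",
)
print(format_reward(pair["chosen"]), format_reward(pair["rejected"]))  # 1.0 0.0
```

In a real preference-tuning run the reward or the chosen/rejected split would encode much more than format (correctness, style of the reasoning trace, etc.), but this is the basic shape of "which think path is chosen and which is rejected."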