
Model idea

#1
by Enderchef - opened

It'd be cool if you fine-tuned all these datasets on Qwen/Qwen3-4B-Thinking-2507

Owner

Hello @Enderchef ,

I started the fine-tuning process; it shouldn't take that long. I'll reply here once it's published.

Owner

Hey @Enderchef ,

The model is ready, please let me know if this matches your expectations.
Download it here.

Would you share the source code showing how to fine-tune it?

Owner

Hello @Hamora,

I uploaded the safetensors files of this model here; the model card includes a link to an Unsloth Jupyter notebook you can use to fine-tune it.
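For reference, SFT data for a reasoning model is typically formatted as chat turns where the assistant response embeds the reasoning trace before the final answer. A minimal sketch of that data preparation (the field names `prompt`/`reasoning`/`answer` are hypothetical; the `<think>...</think>` delimiters follow Qwen3's reasoning convention):

```python
# Minimal sketch of preparing one SFT training example for a
# reasoning model. Field names ("prompt", "reasoning", "answer")
# are hypothetical; the <think>...</think> delimiters follow
# Qwen3's reasoning-trace convention.

def to_sft_example(prompt: str, reasoning: str, answer: str) -> dict:
    """Build a chat-style example whose assistant turn embeds the
    reasoning trace ahead of the final answer."""
    assistant = f"<think>\n{reasoning}\n</think>\n{answer}"
    return {
        "messages": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": assistant},
        ]
    }

example = to_sft_example(
    prompt="What is 2 + 2?",
    reasoning="2 + 2 equals 4.",
    answer="4",
)
```

A list of such records can then be passed to a chat template and an SFT trainer; the exact trainer API depends on the notebook being used.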

Eh, so this isn't RLHF tuning, just SFT?
Or is it that, because you use the reasoning model as the base, you direct the fine-tuning with the SFT method?

@Liontix How do you think a 30B-A3B MoE fine-tune would work?

Owner

Hello @PSM24

It's technically possible, but it doesn't work on my current setup because it lacks the required VRAM. I am experimenting with other, smaller MoE models, though, as they may perform significantly better than dense models of comparable size on consumer hardware.

Here is the thing you must remember:
If the behavior of the dataset is different, you cannot just apply SFT training as-is.
I've been testing all of your models that use the 2.5 Pro datasets. Do they think? Yes, but they think 'like' Gemini thinks; they don't really follow how Gemini thinks. If the format of the reasoning is different, the training has to account for that, and the RLHF tuning must differ too. You have to set up certain specific components, such as how you design the reward for the model: which reasoning path is chosen and which is rejected.
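The chosen/rejected distinction described above is exactly what preference-tuning methods like DPO consume. A minimal sketch of building one preference record (the `prompt`/`chosen`/`rejected` field layout follows the common convention used by preference-tuning libraries such as TRL; the reward rule here is a hypothetical stand-in for a real reward model):

```python
# Minimal sketch of building a DPO-style preference pair: score
# candidate responses with a reward function, keep the best as
# "chosen" and the worst as "rejected". The reward rule below is
# a hypothetical placeholder, not a real reward model.

def build_preference_pair(prompt: str, responses: list, reward_fn) -> dict:
    """Rank candidate responses by reward and return a
    prompt/chosen/rejected record."""
    ranked = sorted(responses, key=reward_fn, reverse=True)
    return {
        "prompt": prompt,
        "chosen": ranked[0],
        "rejected": ranked[-1],
    }

# Hypothetical reward: prefer responses that show a reasoning trace.
def has_reasoning(response: str) -> int:
    return 1 if "<think>" in response else 0

pair = build_preference_pair(
    "Solve 3 * 7.",
    ["21", "<think>3 * 7 = 21</think>\n21"],
    has_reasoning,
)
```

In practice the reward function would encode exactly what the comment above asks for: whether the reasoning format matches the target teacher's style, not just whether a trace is present.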
