Personal report
Thanks for this model!
In my personal tests using thinking mode (e.g., riddles and complex reasoning problems requiring precise numerical answers), the fine-tuned version performs better than the original Qwen3-30B v2507. This is the first fine-tune I've tested that reaches this level (in my bench it simply thinks longer).
I have a question: What is the original model?
It does not work correctly in native tool-call mode in Open Web UI (the model mixes tool calls, thinking, and responses).
The default mode resolves this issue, but I prefer the native workflow: think → tool_call → think → response. The default mode consolidates all tool calls into a single 'think' tag (faster but less accurate for complex requests).
I have completely rewritten the Jinja template several times to get native tool mode working properly. The best I've achieved is either no_think with native tool calls, or think mode with native tool calls where the response stays entirely inside the think tag (nothing is emitted outside it).
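For illustration, here is a minimal sketch of the assistant-turn portion of such a template, assuming Qwen3-style ChatML markers, a `reasoning_content` field for the thinking, and OpenAI-style `tool_calls` entries. It's not the exact template I ended up with, just the structure I was aiming for:

```jinja
{#- Minimal sketch: assistant turns only. Assumes Qwen3-style markers
    (<|im_start|>, <think>, <tool_call>) and that tc.function.arguments
    is a dict. A real template must also handle system/user/tool turns. -#}
{%- for message in messages if message.role == 'assistant' %}
<|im_start|>assistant
{%- if message.reasoning_content %}
<think>
{{ message.reasoning_content }}
</think>
{%- endif %}
{%- for tc in message.tool_calls or [] %}
<tool_call>
{"name": "{{ tc.function.name }}", "arguments": {{ tc.function.arguments | tojson }}}
</tool_call>
{%- endfor %}
{%- if message.content %}
{{ message.content }}
{%- endif %}
<|im_end|>
{%- endfor %}
```

The key point is the ordering: the think block, then any tool calls, then the final content, each with its own delimiters, so the client can separate them instead of getting everything merged into one tag.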
Has anyone else encountered this issue? I'm using Open Web UI alongside LM Studio.
The original model is Qwen3-30B-A3B-Instruct-2507, and we've noticed this issue with tool calling in thinking mode as well.
Thanks for the feedback on the model! I'm glad you liked it, and I'm sorry about this issue.
@Elsephire It is a derivative of my merged model, Qwen3-30B-A3B-YOYO-V3. I'm delighted to see it become even better, but it's really disappointing that my contributions weren't mentioned at all.
I used the same technique as DavidAU to create a model with 67 layers (I like the number) and 42B params (the answer to life, the universe and everything), but I usually overlook his models because they are creative-writing focused. Instead of making an RP model, I made a coding/STEM model.
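For anyone curious, the rough shape of that kind of layer-extension merge, e.g. with mergekit's passthrough method, looks like this. The layer ranges below are illustrative (chosen only so that 38 + 29 = 67 on the 48-layer Qwen3-30B-A3B architecture), not my exact recipe:

```yaml
# Illustrative passthrough merge that deepens a 48-layer model to 67 layers
# by re-using a span of its own layers. The ranges are hypothetical, and
# "Qwen3-30B-A3B-YOYO-V3" stands in for the local path to the base merge.
merge_method: passthrough
dtype: bfloat16
slices:
  - sources:
      - model: Qwen3-30B-A3B-YOYO-V3
        layer_range: [0, 38]      # first 38 layers
  - sources:
      - model: Qwen3-30B-A3B-YOYO-V3
        layer_range: [19, 48]     # layers 19-47 repeated: +29, 67 total
```

Duplicating roughly 19 of 48 layers is also what gets you from ~30B to ~42B params, since parameter count scales close to linearly with depth here.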
I'll make sure to credit David and YOYO, though. I used Claude to write the README and forgot to give it this extra context. I'm sorry to both of them for the missing credit, and I'm thankful for the feedback and support coming from everyone.
pls kill me /j