Personal report
Thanks for this model!
In my personal tests using thinking mode (e.g., riddles and complex reasoning problems requiring precise numerical answers), the fine-tuned version performs better than the original Qwen3-30B v2507. This is the first fine-tune I've tested that reaches this level (in my bench it simply thinks longer).
I have a question: What is the original model?
It does not work correctly in native tool-call mode in Open Web UI (the model mixes tool calls, thinking, and responses).
The default mode resolves this issue, but I prefer the native workflow: think → tool_call → think → response. The default mode consolidates all tool calls into a single 'think' tag (faster but less accurate for complex requests).
I have completely rewritten the Jinja template several times to get native tool mode working properly. The best I've achieved is either no_think with native tool calls, or think mode with native tool calls where the response stays entirely inside the think tag (nothing is emitted outside it).
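For illustration, here is a minimal sketch of the assistant-turn portion of such a template, assuming Qwen3-style ChatML markers, a `reasoning_content` field for the thinking, and OpenAI-style `tool_calls` entries. It's not the exact template I ended up with, just the structure I was aiming for:

```jinja
{#- Minimal sketch: assistant turns only. Assumes Qwen3-style markers
    (<|im_start|>, <think>, <tool_call>) and that tc.function.arguments
    is a dict. A real template must also handle system/user/tool turns. -#}
{%- for message in messages if message.role == 'assistant' %}
<|im_start|>assistant
{%- if message.reasoning_content %}
<think>
{{ message.reasoning_content }}
</think>
{%- endif %}
{%- for tc in message.tool_calls or [] %}
<tool_call>
{"name": "{{ tc.function.name }}", "arguments": {{ tc.function.arguments | tojson }}}
</tool_call>
{%- endfor %}
{%- if message.content %}
{{ message.content }}
{%- endif %}
<|im_end|>
{%- endfor %}
```

The key point is the ordering: the think block, then any tool calls, then the final content, each with its own delimiters, so the client can separate them instead of getting everything merged into one tag.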
Has anyone else encountered this issue? I'm using Open Web UI alongside LM Studio.
The original model is Qwen3-30B-A3B-Instruct-2507, and we've noticed this issue with tool calling in thinking mode as well.
Thanks for the feedback on the model! I'm glad you liked it, and I'm sorry about this issue.
@Elsephire It is a derivative of my merged model, Qwen3-30B-A3B-YOYO-V3. I'm delighted to see it become even better, but it's really disappointing that my contributions weren't mentioned at all.
I used the same technique as DavidAU to create a model with 67 layers (I like the number) and 42B params (the answer to life, the universe and everything), but I usually overlook his models because they are creative-writing focused. Instead of making an RP model, I made a coding/STEM model.
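For anyone curious, the rough shape of that kind of layer-extension merge, e.g. with mergekit's passthrough method, looks like this. The layer ranges below are illustrative (chosen only so that 38 + 29 = 67 on the 48-layer Qwen3-30B-A3B architecture), not my exact recipe:

```yaml
# Illustrative passthrough merge that deepens a 48-layer model to 67 layers
# by re-using a span of its own layers. The ranges are hypothetical, and
# "Qwen3-30B-A3B-YOYO-V3" stands in for the local path to the base merge.
merge_method: passthrough
dtype: bfloat16
slices:
  - sources:
      - model: Qwen3-30B-A3B-YOYO-V3
        layer_range: [0, 38]      # first 38 layers
  - sources:
      - model: Qwen3-30B-A3B-YOYO-V3
        layer_range: [19, 48]     # layers 19-47 repeated: +29, 67 total
```

Duplicating roughly 19 of 48 layers is also what gets you from ~30B to ~42B params, since parameter count scales close to linearly with depth here.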
I'll make sure to credit David and YOYO, though. I used Claude to write the README and forgot to give it this extra context. I'm sorry to both of them for the missing credit, and I'm thankful for the feedback and support coming from everyone.
pls kill me /j