This model switches its thinking depth automatically and is built with model merging and expert substitution: it answers simple questions directly, thinks briefly about moderately difficult ones, and reasons in depth on difficult ones.
merge method: arcee_fusion
Highest precision: dtype: float32 + out_dtype: bfloat16
Context length: 262,144 & 1,010,000
Temperature=0.6, TopP=0.95, TopK=20, MinP=0
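To make the sampling settings above concrete, here is a minimal pure-Python sketch of how temperature, top-k, top-p and min-p narrow a token distribution. The function name `filter_probs` and the toy logits are hypothetical, and the order of the filters is an assumption; real inference engines may order and renormalize these steps differently.

```python
import math

def filter_probs(logits, temperature=0.6, top_k=20, top_p=0.95, min_p=0.0):
    """Return renormalized token probabilities after the four filters."""
    # Temperature-scaled softmax (subtract the max for numerical stability).
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [(i, e / total) for i, e in enumerate(exps)]

    # Top-k: keep the k most likely tokens, then renormalize.
    ranked = sorted(probs, key=lambda kv: kv[1], reverse=True)[:top_k]
    mass = sum(p for _, p in ranked)
    ranked = [(tok, p / mass) for tok, p in ranked]

    # Top-p: keep the smallest prefix whose cumulative mass reaches top_p.
    kept, cumulative = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # Min-p: drop tokens below min_p times the top probability (no-op at 0).
    floor = min_p * kept[0][1]
    kept = [(tok, p) for tok, p in kept if p >= floor]

    norm = sum(p for _, p in kept)
    return {tok: p / norm for tok, p in kept}

# Toy example: four candidate tokens with made-up logits.
example = filter_probs([2.0, 1.0, 0.0, -5.0])
```

With MinP=0 the last filter removes nothing, so the candidate set is determined by TopK and TopP alone.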
First, perform an initial merge of the instruction model and the reasoning model.
models:
  - model: Qwen/Qwen3-30B-A3B-Thinking-2507
merge_method: arcee_fusion
base_model: Qwen/Qwen3-30B-A3B-Instruct-2507
dtype: float32
out_dtype: bfloat16
tokenizer_source: base
name: Qwen3-30B-A3B-YOYO-AutoThink-preview
Inspired by this paper, we use the following regular expression for expert replacement: ^model\.layers\.\d+\.mlp\.experts\.\d+\.(down_proj|gate_proj|up_proj)\.weight$. All expert weights in Qwen3-30B-A3B-YOYO-AutoThink-preview whose names match the regex are replaced with those from Qwen3-30B-A3B-Thinking-2507.
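The replacement step can be sketched in a few lines of Python. This is an illustration under assumptions, not the script actually used: the helper `replace_experts` is hypothetical, the state dicts are toy dictionaries with strings standing in for tensors, and the non-expert tensor names are only examples of names the regex does not match.

```python
import re

# The regex quoted above: per-expert MLP projection weights only.
EXPERT_WEIGHT = re.compile(
    r"^model\.layers\.\d+\.mlp\.experts\.\d+\.(down_proj|gate_proj|up_proj)\.weight$"
)

def replace_experts(merged, donor):
    """Return a copy of `merged` with matching expert weights taken from `donor`."""
    out = dict(merged)
    for name in merged:
        if EXPERT_WEIGHT.match(name):
            out[name] = donor[name]
    return out

# Toy state dicts (strings stand in for tensors).
names = [
    "model.layers.0.mlp.experts.3.up_proj.weight",    # expert weight: replaced
    "model.layers.0.mlp.experts.3.down_proj.weight",  # expert weight: replaced
    "model.layers.0.mlp.gate.weight",                 # not an expert weight: kept
    "model.layers.0.self_attn.q_proj.weight",         # attention weight: kept
]
merged = {n: "autothink-preview" for n in names}
donor = {n: "thinking-2507" for n in names}

result = replace_experts(merged, donor)
```

Because the pattern is anchored with `^` and `$` and requires the `experts.<index>` segment, only the per-expert projection weights are swapped; every other tensor keeps the value from the merged model.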