
first <think> token not getting outputted

#1
by mtcl - opened

Can you help me understand why my first <think> token is not getting outputted?

Below is my run command:

CUDA_VISIBLE_DEVICES="2" ./build/bin/llama-server \
    --model /media/mukul/t7/models/unsloth/MiniMax-M2-GGUF/UD-Q4_K_XL/MiniMax-M2-UD-Q4_K_XL-00001-of-00003.gguf \
    --alias unsloth/MiniMax-M2 \
    --ctx-size 98304 \
    -fa on \
    -b 4096 -ub 4096 \
    -ot ".ffn_.*_exps.=CPU" \
    --n-gpu-layers 99 \
    --jinja \
    --parallel 1 \
    --threads 56 \
    --host 0.0.0.0 \
    --port 10002
Unsloth AI org

Add --special and it should be outputted!

That did not work for me. I added --special; am I doing it wrong?

CUDA_VISIBLE_DEVICES="2" ./build/bin/llama-server \
    --model /media/mukul/t7/models/unsloth/MiniMax-M2-GGUF/UD-Q4_K_XL/MiniMax-M2-UD-Q4_K_XL-00001-of-00003.gguf \
    --alias unsloth/MiniMax-M2 \
    --ctx-size 98304 \
    -fa on \
    -b 4096 -ub 4096 \
    -ot ".ffn_.*_exps.=CPU" \
    --n-gpu-layers 99 \
    --jinja \
    --special \
    --parallel 1 \
    --threads 56 \
    --host 0.0.0.0 \
    --port 10002

There was something wrong with the default chat template; you should edit it
from

[Screenshot 2025-11-03 140131]

to

[Screenshot 2025-11-03 140215]

{%- endif -%}
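Since the screenshots above are not reproduced in text, here is a hedged sketch of what is going on (the prompt format below is illustrative only, not the actual MiniMax-M2 template): if the chat template itself appends <think> after the assistant header, the model continues from inside the thinking block, so the generated tokens the server streams back never include the opening tag.

```python
# Illustrative toy stand-in for the real MiniMax-M2 chat template.
def build_prompt(user_msg: str, template_appends_think: bool) -> str:
    prompt = f"[user]{user_msg}[/user][assistant]"
    if template_appends_think:
        # The template opens the thinking block itself, as the default
        # MiniMax-M2 template reportedly does.
        prompt += "<think>"
    return prompt

# The model only continues from the prompt, so the *generated* text
# (which is all the server returns to the client) starts mid-block:
generated = "reasoning steps...</think>final answer"
print(generated.startswith("<think>"))  # False: the tag lives in the prompt
```

That is why the fix is on the template side rather than in llama-server flags: once the template no longer emits <think> itself, the model produces the token as part of its own output.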

I do not know how to do that. Where is the file I need to edit? Is it in the checked-out git clone of the repo?

1. Download the default chat template: https://huggingface.co/MiniMaxAI/MiniMax-M2/blob/main/chat_template.jinja
2. Fix it.
3. Run the model with this command:
./llama-server --model ../MiniMax-M2-UD-TQ1_0.gguf --alias "minimax" --threads -1 --n-gpu-layers 999 --prio 3 --temp 1.0 --top-p 0.95 --top-k 40 --ctx-size 60000 --port 8001 --host 0.0.0.0 --flash-attn on --cache-type-k q4_0 --cache-type-v q4_0 -b 4000 -ub 1024 --chat-template-file ./chat_template.jinja --jinja
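For the "fix it" step, the exact edit is shown only in the screenshots above. Assuming the change is to drop the trailing <think> that the default template appends (a guess based on this discussion, not the actual diff), a small helper like this could patch the downloaded file before passing it to --chat-template-file:

```python
from pathlib import Path

def patch_template(text: str) -> str:
    """Remove a trailing <think> token from a chat template, if present."""
    stripped = text.rstrip()
    if stripped.endswith("<think>"):
        stripped = stripped[: -len("<think>")].rstrip()
    return stripped + "\n"

# Usage (hypothetical path):
# path = Path("chat_template.jinja")
# path.write_text(patch_template(path.read_text()))
print(patch_template("{%- endif -%}<think>"))  # -> "{%- endif -%}\n"
```

Editing the file by hand in any text editor works just as well; the point is only that the token is removed from the template, not from the model.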

Thank you! That indeed fixed it!

I appreciate all the help!

mtcl changed discussion status to closed

There was something wrong with the default chat template; you should edit it
from

[Screenshot 2025-11-03 140131]

to

[Screenshot 2025-11-03 140215]

{%- endif -%}

I don't think that's "wrong". Qwen thinking models also have <think> in their chat templates.

Unsloth AI org

Oh yes, after doing some investigation it seems the MiniMax chat template includes the think token by default, so you will not see it in the output.

@bighhh @mtcl @CHNtentes

It works with the fix above.

IMO, the <think> in the template is meant to ensure the model will output thinking content. Without it, the model probably still generates <think> at the beginning, but that's not guaranteed.

But if you keep the <think> in the template, it confuses most tools that expect <think> to appear at the start of the thinking output, and the thinking content ends up being treated as the actual response.
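For example, a typical client-side splitter (illustrative, not any specific library) only recovers the reasoning when the response itself starts with the tag:

```python
def split_reasoning(response: str) -> tuple[str, str]:
    """Split a response into (reasoning, answer) on a <think>...</think> pair."""
    if response.startswith("<think>") and "</think>" in response:
        body = response[len("<think>"):]
        reasoning, answer = body.split("</think>", 1)
        return reasoning.strip(), answer.strip()
    return "", response.strip()  # no opening tag: treat everything as the answer

# Template keeps <think> on the prompt side, so the server's output lacks it
# and the raw reasoning leaks into the answer field:
print(split_reasoning("step 1...</think>42"))         # ('', 'step 1...</think>42')
# With the template fix, the model emits the tag itself and parsing works:
print(split_reasoning("<think>step 1...</think>42"))  # ('step 1...', '42')
```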
