First <think> token not getting output
Can you help me understand why the first <think> token is not being output?
Below is my run command:
CUDA_VISIBLE_DEVICES="2" ./build/bin/llama-server \
--model /media/mukul/t7/models/unsloth/MiniMax-M2-GGUF/UD-Q4_K_XL/MiniMax-M2-UD-Q4_K_XL-00001-of-00003.gguf \
--alias unsloth/MiniMax-M2 \
--ctx-size 98304 \
-fa on \
-b 4096 -ub 4096 \
-ot ".ffn_.*_exps.=CPU" \
--n-gpu-layers 99 \
--jinja \
--parallel 1 \
--threads 56 \
--host 0.0.0.0 \
--port 10002
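For reference, here is roughly how I am checking the output (just a sketch, assuming the server above is up on port 10002); the reply never starts with the <think> token:
# sketch: hit the OpenAI-compatible endpoint of the running llama-server
curl -s http://localhost:10002/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "unsloth/MiniMax-M2", "messages": [{"role": "user", "content": "Hello"}]}'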
Add --special and it should be outputted!
That did not work for me. I added --special; am I doing it wrong?
CUDA_VISIBLE_DEVICES="2" ./build/bin/llama-server \
--model /media/mukul/t7/models/unsloth/MiniMax-M2-GGUF/UD-Q4_K_XL/MiniMax-M2-UD-Q4_K_XL-00001-of-00003.gguf \
--alias unsloth/MiniMax-M2 \
--ctx-size 98304 \
-fa on \
-b 4096 -ub 4096 \
-ot ".ffn_.*_exps.=CPU" \
--n-gpu-layers 99 \
--jinja \
--special \
--parallel 1 \
--threads 56 \
--host 0.0.0.0 \
--port 10002
I do not know how to do that. Where is the file that I need to edit? Is it in the checked-out git clone of the repo?
1. Download the default chat template from https://huggingface.co/MiniMaxAI/MiniMax-M2/blob/main/chat_template.jinja (see the sketch below this list)
2. Fix it (remove the <think> that the template appends by default, as discussed further down)
3. Run the model with this command:
./llama-server --model ../MiniMax-M2-UD-TQ1_0.gguf --alias "minimax" --threads -1 --n-gpu-layers 999 --prio 3 --temp 1.0 --top-p 0.95 --top-k 40 --ctx-size 60000 --port 8001 --host 0.0.0.0 --flash-attn on --cache-type-k q4_0 --cache-type-v q4_0 -b 4000 -ub 1024 --chat-template-file ./chat_template.jinja --jinja
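In case anyone else is unsure what steps 1 and 2 look like in practice, here is a rough sketch (assumptions: the standard Hugging Face /resolve/main/ raw-file URL works for this repo, and the trailing <think> appended after the assistant header is what "Fix it" refers to, so double-check the file yourself):
# sketch: download the raw template file
curl -L -o chat_template.jinja \
  https://huggingface.co/MiniMaxAI/MiniMax-M2/resolve/main/chat_template.jinja
# then open chat_template.jinja and remove the <think> that the template appends
# after the assistant turn, before passing the file via --chat-template-file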
Thank you! That indeed fixed it!
I appreciate all the help!
Oh yes, after doing some investigation it seems the MiniMax chat template includes the think token by default, so you will not be seeing it in the output.
It works with the fix above.
IMO, the <think> in the template is meant to ensure the model will output thinking content. Without it, the model probably still generates <think> at the beginning, but it's not guaranteed.
But if you keep the <think> in the template, it breaks most clients that expect <think> to be output at the start of thinking, and the thinking content ends up being returned as the actual response.
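One way to check which of these behaviours you are getting is to look at the template the server actually loaded, e.g. something like this (a sketch; it assumes your llama.cpp build exposes a chat_template field on the /props endpoint and that python3 is available, and you should adjust the port to your server):
# sketch: print the chat template the running llama-server loaded
curl -s http://localhost:10002/props | \
  python3 -c "import sys, json; print(json.load(sys.stdin).get('chat_template', ''))"
If the printed template still appends <think> after the assistant turn, you are in the "keep it in the template" case described above.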

