
Output truncated without reason

#12 by shaosh - opened

system info

key                   value
transformers version  4.51.3
PyTorch version       2.6.0
vllm version          0.8.0
GPU                   RTX 4090 × 1

Reproduction

I tried running this model, but the output is always cut off partway through and the stop reason is reported as length, even though I can't find any problem with my configuration.
[screenshot: image.png]

config

python3 -m vllm.entrypoints.openai.api_server --max_model_len 4096 --served-model-name seed-x --model /data/models/Seed-X-PPO-7B

I noticed the API response returned a finish_reason of length. Could you please share the code snippet for the request? I'd like to check the configuration, especially parameters like max_tokens.
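
For context: if a request does not set max_tokens, the /v1/completions endpoint falls back to a small default (16 generated tokens, matching the OpenAI API, if I remember correctly), which produces exactly this kind of truncation. A minimal sketch to check the stop reason, assuming the server from your command above on vLLM's default port 8000:

import requests

# Hypothetical request; host, port, and prompt are placeholders for your setup.
resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={"model": "seed-x", "prompt": "Translate into English: Bonjour le monde."},
)
choice = resp.json()["choices"][0]
print(choice["finish_reason"])  # "length" means generation hit the token limit
print(choice["text"])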

Thank you!

You can include max_tokens in your request (--max_model_len only limits the context window on the server side; the output length of each request is controlled by max_tokens), such as:
{
  "model": "xx",
  "prompt": "",
  "max_tokens": 512
}
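
With the openai Python client, the same request looks like this (a sketch; base_url assumes vLLM's default local port 8000 and the served model name from the launch command above):

from openai import OpenAI

# The local vLLM server does not validate the key, so any placeholder works.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.completions.create(
    model="seed-x",
    prompt="Translate into English: Bonjour le monde.",
    max_tokens=512,  # raise this so long outputs are not cut off
)
print(completion.choices[0].finish_reason)  # should now be "stop" rather than "length"
print(completion.choices[0].text)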
