Output truncated without reason
system info
transformers version: 4.51.3
PyTorch version: 2.6.0
vllm version: 0.8.0
GPU: 4090*1

Reproduction
I tried running this model, but the output is always cut off partway through and the finish_reason is reported as length, even though there seems to be no problem with the configuration.
config
python3 -m vllm.entrypoints.openai.api_server --max_model_len 4096 --served-model-name seed-x --model /data/models/Seed-X-PPO-7B
I noticed the API response returned a finish_reason of length. Could you please share the code snippet for the request? I'd like to check the configuration, especially parameters like max_tokens.
Thanks.
You can include max_tokens in your request, such as:
{
    "model": "xx",
    "prompt": "",
    "max_tokens": 512
}
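
For reference, a minimal sketch of a complete request against the server launched above. It assumes vLLM's default host and port (localhost:8000) and the served model name seed-x from the launch command; the prompt text is just a placeholder.

import requests

# Sketch of a completion request to the vLLM OpenAI-compatible server.
# Assumes the default endpoint http://localhost:8000/v1/completions;
# "seed-x" matches --served-model-name in the launch command above.
response = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "model": "seed-x",
        "prompt": "Hello",  # placeholder prompt
        "max_tokens": 512,  # raise this if output is still truncated
    },
)
result = response.json()
print(result["choices"][0]["text"])
# finish_reason should be "stop" once max_tokens is large enough;
# "length" means the max_tokens limit was hit before generation finished
print(result["choices"][0]["finish_reason"])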
