Error loading the model :(EngineCore_DP0 pid=64203) ERROR 09-20 17:34:23 [core.py:718] KeyError: 'layers.30.mlp.shared_expert.down_proj.weight'

#4
by chgrdj - opened

Hey guys, I tried to spin up the model and I'm getting this error:
vllm serve cpatonn/Qwen3-Next-80B-A3B-Instruct-AWQ-4bit --tensor-parallel-size 1 --max-model-len 2048 --gpu-memory-utilization 0.95

(EngineCore_DP0 pid=64203) ERROR 09-20 17:34:23 [core.py:718] Traceback (most recent call last):
(EngineCore_DP0 pid=64203) ERROR 09-20 17:34:23 [core.py:718] File "/home/paperspace/anaconda3/envs/vllm_env/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 709, in run_engine_core
(EngineCore_DP0 pid=64203) ERROR 09-20 17:34:23 [core.py:718] engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=64203) ERROR 09-20 17:34:23 [core.py:718] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=64203) ERROR 09-20 17:34:23 [core.py:718] File "/home/paperspace/anaconda3/envs/vllm_env/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 505, in __init__
(EngineCore_DP0 pid=64203) ERROR 09-20 17:34:23 [core.py:718] super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=64203) ERROR 09-20 17:34:23 [core.py:718] File "/home/paperspace/anaconda3/envs/vllm_env/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 82, in __init__
(EngineCore_DP0 pid=64203) ERROR 09-20 17:34:23 [core.py:718] self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=64203) ERROR 09-20 17:34:23 [core.py:718] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=64203) ERROR 09-20 17:34:23 [core.py:718] File "/home/paperspace/anaconda3/envs/vllm_env/lib/python3.12/site-packages/vllm/executor/executor_base.py", line 54, in __init__
(EngineCore_DP0 pid=64203) ERROR 09-20 17:34:23 [core.py:718] self._init_executor()
(EngineCore_DP0 pid=64203) ERROR 09-20 17:34:23 [core.py:718] File "/home/paperspace/anaconda3/envs/vllm_env/lib/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 49, in _init_executor
(EngineCore_DP0 pid=64203) ERROR 09-20 17:34:23 [core.py:718] self.collective_rpc("load_model")
(EngineCore_DP0 pid=64203) ERROR 09-20 17:34:23 [core.py:718] File "/home/paperspace/anaconda3/envs/vllm_env/lib/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 58, in collective_rpc
(EngineCore_DP0 pid=64203) ERROR 09-20 17:34:23 [core.py:718] answer = run_method(self.driver_worker, method, args, kwargs)
(EngineCore_DP0 pid=64203) ERROR 09-20 17:34:23 [core.py:718] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=64203) ERROR 09-20 17:34:23 [core.py:718] File "/home/paperspace/anaconda3/envs/vllm_env/lib/python3.12/site-packages/vllm/utils/__init__.py", line 3060, in run_method
(EngineCore_DP0 pid=64203) ERROR 09-20 17:34:23 [core.py:718] return func(*args, **kwargs)
(EngineCore_DP0 pid=64203) ERROR 09-20 17:34:23 [core.py:718] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=64203) ERROR 09-20 17:34:23 [core.py:718] File "/home/paperspace/anaconda3/envs/vllm_env/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 213, in load_model
(EngineCore_DP0 pid=64203) ERROR 09-20 17:34:23 [core.py:718] self.model_runner.load_model(eep_scale_up=eep_scale_up)
(EngineCore_DP0 pid=64203) ERROR 09-20 17:34:23 [core.py:718] File "/home/paperspace/anaconda3/envs/vllm_env/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 2371, in load_model
(EngineCore_DP0 pid=64203) ERROR 09-20 17:34:23 [core.py:718] self.model = model_loader.load_model(
(EngineCore_DP0 pid=64203) ERROR 09-20 17:34:23 [core.py:718] ^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=64203) ERROR 09-20 17:34:23 [core.py:718] File "/home/paperspace/anaconda3/envs/vllm_env/lib/python3.12/site-packages/vllm/model_executor/model_loader/base_loader.py", line 50, in load_model
(EngineCore_DP0 pid=64203) ERROR 09-20 17:34:23 [core.py:718] self.load_weights(model, model_config)
(EngineCore_DP0 pid=64203) ERROR 09-20 17:34:23 [core.py:718] File "/home/paperspace/anaconda3/envs/vllm_env/lib/python3.12/site-packages/vllm/model_executor/model_loader/default_loader.py", line 265, in load_weights
(EngineCore_DP0 pid=64203) ERROR 09-20 17:34:23 [core.py:718] loaded_weights = model.load_weights(
(EngineCore_DP0 pid=64203) ERROR 09-20 17:34:23 [core.py:718] ^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=64203) ERROR 09-20 17:34:23 [core.py:718] File "/home/paperspace/anaconda3/envs/vllm_env/lib/python3.12/site-packages/vllm/model_executor/models/qwen3_next.py", line 1216, in load_weights
(EngineCore_DP0 pid=64203) ERROR 09-20 17:34:23 [core.py:718] return loader.load_weights(weights)
(EngineCore_DP0 pid=64203) ERROR 09-20 17:34:23 [core.py:718] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=64203) ERROR 09-20 17:34:23 [core.py:718] File "/home/paperspace/anaconda3/envs/vllm_env/lib/python3.12/site-packages/vllm/model_executor/models/utils.py", line 291, in load_weights
(EngineCore_DP0 pid=64203) ERROR 09-20 17:34:23 [core.py:718] autoloaded_weights = set(self._load_module("", self.module, weights))
(EngineCore_DP0 pid=64203) ERROR 09-20 17:34:23 [core.py:718] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=64203) ERROR 09-20 17:34:23 [core.py:718] File "/home/paperspace/anaconda3/envs/vllm_env/lib/python3.12/site-packages/vllm/model_executor/models/utils.py", line 249, in _load_module
(EngineCore_DP0 pid=64203) ERROR 09-20 17:34:23 [core.py:718] yield from self._load_module(prefix,
(EngineCore_DP0 pid=64203) ERROR 09-20 17:34:23 [core.py:718] File "/home/paperspace/anaconda3/envs/vllm_env/lib/python3.12/site-packages/vllm/model_executor/models/utils.py", line 222, in _load_module
(EngineCore_DP0 pid=64203) ERROR 09-20 17:34:23 [core.py:718] loaded_params = module_load_weights(weights)
(EngineCore_DP0 pid=64203) ERROR 09-20 17:34:23 [core.py:718] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=64203) ERROR 09-20 17:34:23 [core.py:718] File "/home/paperspace/anaconda3/envs/vllm_env/lib/python3.12/site-packages/vllm/model_executor/models/qwen3_next.py", line 1044, in load_weights
(EngineCore_DP0 pid=64203) ERROR 09-20 17:34:23 [core.py:718] param = params_dict[name]
(EngineCore_DP0 pid=64203) ERROR 09-20 17:34:23 [core.py:718] ~~~~~~~~~~~^^^^^^
(EngineCore_DP0 pid=64203) ERROR 09-20 17:34:23 [core.py:718] KeyError: 'layers.30.mlp.shared_expert.down_proj.weight'
(EngineCore_DP0 pid=64203) Process EngineCore_DP0:
(EngineCore_DP0 pid=64203) Traceback (most recent call last):
(EngineCore_DP0 pid=64203) File "/home/paperspace/anaconda3/envs/vllm_env/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore_DP0 pid=64203) self.run()
(EngineCore_DP0 pid=64203) File "/home/paperspace/anaconda3/envs/vllm_env/lib/python3.12/multiprocessing/process.py", line 108, in run
(EngineCore_DP0 pid=64203) self._target(*self._args, **self._kwargs)
(EngineCore_DP0 pid=64203) File "/home/paperspace/anaconda3/envs/vllm_env/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 722, in run_engine_core
(EngineCore_DP0 pid=64203) raise e
(EngineCore_DP0 pid=64203) File "/home/paperspace/anaconda3/envs/vllm_env/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 709, in run_engine_core
(EngineCore_DP0 pid=64203) engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=64203) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=64203) File "/home/paperspace/anaconda3/envs/vllm_env/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 505, in __init__
(EngineCore_DP0 pid=64203) super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=64203) File "/home/paperspace/anaconda3/envs/vllm_env/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 82, in __init__
(EngineCore_DP0 pid=64203) self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=64203) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=64203) File "/home/paperspace/anaconda3/envs/vllm_env/lib/python3.12/site-packages/vllm/executor/executor_base.py", line 54, in __init__
(EngineCore_DP0 pid=64203) self._init_executor()
(EngineCore_DP0 pid=64203) File "/home/paperspace/anaconda3/envs/vllm_env/lib/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 49, in _init_executor
(EngineCore_DP0 pid=64203) self.collective_rpc("load_model")
(EngineCore_DP0 pid=64203) File "/home/paperspace/anaconda3/envs/vllm_env/lib/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 58, in collective_rpc
(EngineCore_DP0 pid=64203) answer = run_method(self.driver_worker, method, args, kwargs)
(EngineCore_DP0 pid=64203) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=64203) File "/home/paperspace/anaconda3/envs/vllm_env/lib/python3.12/site-packages/vllm/utils/__init__.py", line 3060, in run_method
(EngineCore_DP0 pid=64203) return func(*args, **kwargs)
(EngineCore_DP0 pid=64203) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=64203) File "/home/paperspace/anaconda3/envs/vllm_env/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 213, in load_model
(EngineCore_DP0 pid=64203) self.model_runner.load_model(eep_scale_up=eep_scale_up)
(EngineCore_DP0 pid=64203) File "/home/paperspace/anaconda3/envs/vllm_env/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 2371, in load_model
(EngineCore_DP0 pid=64203) self.model = model_loader.load_model(
(EngineCore_DP0 pid=64203) ^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=64203) File "/home/paperspace/anaconda3/envs/vllm_env/lib/python3.12/site-packages/vllm/model_executor/model_loader/base_loader.py", line 50, in load_model
(EngineCore_DP0 pid=64203) self.load_weights(model, model_config)
(EngineCore_DP0 pid=64203) File "/home/paperspace/anaconda3/envs/vllm_env/lib/python3.12/site-packages/vllm/model_executor/model_loader/default_loader.py", line 265, in load_weights
(EngineCore_DP0 pid=64203) loaded_weights = model.load_weights(
(EngineCore_DP0 pid=64203) ^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=64203) File "/home/paperspace/anaconda3/envs/vllm_env/lib/python3.12/site-packages/vllm/model_executor/models/qwen3_next.py", line 1216, in load_weights
(EngineCore_DP0 pid=64203) return loader.load_weights(weights)
(EngineCore_DP0 pid=64203) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=64203) File "/home/paperspace/anaconda3/envs/vllm_env/lib/python3.12/site-packages/vllm/model_executor/models/utils.py", line 291, in load_weights
(EngineCore_DP0 pid=64203) autoloaded_weights = set(self._load_module("", self.module, weights))
(EngineCore_DP0 pid=64203) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=64203) File "/home/paperspace/anaconda3/envs/vllm_env/lib/python3.12/site-packages/vllm/model_executor/models/utils.py", line 249, in _load_module
(EngineCore_DP0 pid=64203) yield from self._load_module(prefix,
(EngineCore_DP0 pid=64203) File "/home/paperspace/anaconda3/envs/vllm_env/lib/python3.12/site-packages/vllm/model_executor/models/utils.py", line 222, in _load_module
(EngineCore_DP0 pid=64203) loaded_params = module_load_weights(weights)
(EngineCore_DP0 pid=64203) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=64203) File "/home/paperspace/anaconda3/envs/vllm_env/lib/python3.12/site-packages/vllm/model_executor/models/qwen3_next.py", line 1044, in load_weights
(EngineCore_DP0 pid=64203) param = params_dict[name]
(EngineCore_DP0 pid=64203) ~~~~~~~~~~~^^^^^^
(EngineCore_DP0 pid=64203) KeyError: 'layers.30.mlp.shared_expert.down_proj.weight'
Loading safetensors checkpoint shards: 0% Completed | 0/10 [00:01<?, ?it/s]
(EngineCore_DP0 pid=64203)
[rank0]:[W920 17:34:24.487338229 ProcessGroupNCCL.cpp:1538] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
(APIServer pid=63902) Traceback (most recent call last):
(APIServer pid=63902) File "/home/paperspace/anaconda3/envs/vllm_env/bin/vllm", line 7, in <module>
(APIServer pid=63902) sys.exit(main())
(APIServer pid=63902) ^^^^^^
(APIServer pid=63902) File "/home/paperspace/anaconda3/envs/vllm_env/lib/python3.12/site-packages/vllm/entrypoints/cli/main.py", line 54, in main
(APIServer pid=63902) args.dispatch_function(args)
(APIServer pid=63902) File "/home/paperspace/anaconda3/envs/vllm_env/lib/python3.12/site-packages/vllm/entrypoints/cli/serve.py", line 50, in cmd
(APIServer pid=63902) uvloop.run(run_server(args))
(APIServer pid=63902) File "/home/paperspace/anaconda3/envs/vllm_env/lib/python3.12/site-packages/uvloop/__init__.py", line 109, in run
(APIServer pid=63902) return __asyncio.run(
(APIServer pid=63902) ^^^^^^^^^^^^^^
(APIServer pid=63902) File "/home/paperspace/anaconda3/envs/vllm_env/lib/python3.12/asyncio/runners.py", line 195, in run
(APIServer pid=63902) return runner.run(main)
(APIServer pid=63902) ^^^^^^^^^^^^^^^^
(APIServer pid=63902) File "/home/paperspace/anaconda3/envs/vllm_env/lib/python3.12/asyncio/runners.py", line 118, in run
(APIServer pid=63902) return self._loop.run_until_complete(task)
(APIServer pid=63902) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=63902) File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=63902) File "/home/paperspace/anaconda3/envs/vllm_env/lib/python3.12/site-packages/uvloop/__init__.py", line 61, in wrapper
(APIServer pid=63902) return await main
(APIServer pid=63902) ^^^^^^^^^^
(APIServer pid=63902) File "/home/paperspace/anaconda3/envs/vllm_env/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 1941, in run_server
(APIServer pid=63902) await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=63902) File "/home/paperspace/anaconda3/envs/vllm_env/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 1961, in run_server_worker
(APIServer pid=63902) async with build_async_engine_client(
(APIServer pid=63902) ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=63902) File "/home/paperspace/anaconda3/envs/vllm_env/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=63902) return await anext(self.gen)
(APIServer pid=63902) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=63902) File "/home/paperspace/anaconda3/envs/vllm_env/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 179, in build_async_engine_client
(APIServer pid=63902) async with build_async_engine_client_from_engine_args(
(APIServer pid=63902) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=63902) File "/home/paperspace/anaconda3/envs/vllm_env/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=63902) return await anext(self.gen)
(APIServer pid=63902) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=63902) File "/home/paperspace/anaconda3/envs/vllm_env/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 221, in build_async_engine_client_from_engine_args
(APIServer pid=63902) async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=63902) ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=63902) File "/home/paperspace/anaconda3/envs/vllm_env/lib/python3.12/site-packages/vllm/utils/__init__.py", line 1589, in inner
(APIServer pid=63902) return fn(*args, **kwargs)
(APIServer pid=63902) ^^^^^^^^^^^^^^^^^^^
(APIServer pid=63902) File "/home/paperspace/anaconda3/envs/vllm_env/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 212, in from_vllm_config
(APIServer pid=63902) return cls(
(APIServer pid=63902) ^^^^
(APIServer pid=63902) File "/home/paperspace/anaconda3/envs/vllm_env/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 136, in __init__
(APIServer pid=63902) self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=63902) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=63902) File "/home/paperspace/anaconda3/envs/vllm_env/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 102, in make_async_mp_client
(APIServer pid=63902) return AsyncMPClient(*client_args)
(APIServer pid=63902) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=63902) File "/home/paperspace/anaconda3/envs/vllm_env/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 769, in __init__
(APIServer pid=63902) super().__init__(
(APIServer pid=63902) File "/home/paperspace/anaconda3/envs/vllm_env/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 448, in __init__
(APIServer pid=63902) with launch_core_engines(vllm_config, executor_class,
(APIServer pid=63902) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=63902) File "/home/paperspace/anaconda3/envs/vllm_env/lib/python3.12/contextlib.py", line 144, in __exit__
(APIServer pid=63902) next(self.gen)
(APIServer pid=63902) File "/home/paperspace/anaconda3/envs/vllm_env/lib/python3.12/site-packages/vllm/v1/engine/utils.py", line 729, in launch_core_engines
(APIServer pid=63902) wait_for_engine_startup(
(APIServer pid=63902) File "/home/paperspace/anaconda3/envs/vllm_env/lib/python3.12/site-packages/vllm/v1/engine/utils.py", line 782, in wait_for_engine_startup
(APIServer pid=63902) raise RuntimeError("Engine core initialization failed. "
(APIServer pid=63902) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}

It seems like I'm on the wrong version of vllm or transformers, but I reinstalled them to be sure and I got:
vllm 0.10.2
transformers 4.57.0.dev0
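
With several conda environments around, it can help to confirm that the versions above are the ones the active interpreter actually sees. A minimal sketch using only the standard library; `installed_version` is a helper name made up for this example, not part of vllm or transformers:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report the two packages mentioned in this thread for the current environment.
for name in ("vllm", "transformers"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

Run it with the same interpreter that launches `vllm serve` (here, the one inside `vllm_env`), otherwise it reports a different environment.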

I don't know where the error could originate from.
Thanks

The error originates from KeyError: 'layers.30.mlp.shared_expert.down_proj.weight': vllm does not recognize that shared_expert is not quantized. Please install the latest vllm, as the fix was merged only a few days ago, using

pip install -U vllm \
    --pre \
    --extra-index-url https://wheels.vllm.ai/nightly

Edit: My apologies, the merged fix is not in the vllm nightly build. Please build and install vllm from source.
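
For anyone who wants to check which shared_expert tensors the checkpoint stores, here is a minimal sketch that reads the `weight_map` of a sharded safetensors index file. It assumes the downloaded checkpoint contains the usual `model.safetensors.index.json`; the function names are made up for this example, and reading a plain `.weight` suffix as "stored unquantized" is only an assumption based on the key in the traceback:

```python
import json
from pathlib import Path


def load_weight_map(index_path: str) -> dict[str, str]:
    """Read the tensor-name -> shard-file mapping from a safetensors index file."""
    with Path(index_path).open() as f:
        return json.load(f)["weight_map"]


def shared_expert_tensors(weight_map: dict[str, str]) -> list[str]:
    """Return the sorted tensor names that belong to a shared_expert module."""
    return sorted(name for name in weight_map if ".shared_expert." in name)


def plain_weight_names(names: list[str]) -> list[str]:
    """Keep only the names that end in a plain `.weight` suffix."""
    return [name for name in names if name.endswith(".weight")]


# Hypothetical usage (the path is a placeholder for your local snapshot):
# names = shared_expert_tensors(load_weight_map("/path/to/model.safetensors.index.json"))
# print("\n".join(plain_weight_names(names)))
```

If the shared_expert projections show up with a plain `.weight` suffix, that matches the key named in the KeyError above.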

Thanks for the fast answer!

Is it still not in the nightly build?
