[Hardware][Ascend] Add Ascend NPU backend #8054
base: main
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, please make sure to run full CI as it is required to merge (or just use auto-merge). To run full CI, you can do one of these: add a ready label to the PR, or enable auto-merge. 🚀 |
Is there any document on how to use it? |
This work is not ready yet. If you want to develop it together, follow this: |
Thank you very much, I'll try it. |
@wyzanski There is a fatal error about git; I think you may need to recheck your git config. |
Looking forward to support for Chinese domestic hardware! |
Force-pushed from 6f89d38 to 6ae737e.
Thanks for supporting Chinese domestic hardware! |
TODO:
|
Thanks for supporting Chinese domestic hardware! Looking forward to its performance on the Ascend series; an efficient inference engine is sorely needed. |
Is online inference supported? |
Do you mean starting an OpenAI-compatible API server? The latest code already supports this, like so:

```
# start server
vllm serve facebook/opt-125m

# request
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "facebook/opt-125m",
    "prompt": "San Francisco is a",
    "max_tokens": 20,
    "temperature": 0
  }'

# output
{"id":"cmpl-862bb9206aa84004a55c625b75e6dfea","object":"text_completion","created":1726649591,"model":"facebook/opt-125m","choices":[{"index":0,"text":" great place to live. I've lived in San Francisco for a few years now and I've","logprobs":null,"finish_reason":"length","stop_reason":null,"prompt_logprobs":null}],"usage":{"prompt_tokens":5,"total_tokens":25,"completion_tokens":20}}
```
|
What Ascend NPU devices are currently supported? |
Are Qwen-series LLMs supported? |
Hi @XYZliang, 910A is not supported now; we will work on supporting more types of devices. |
@WangxuP We do not check model correctness now; here is a simple offline result: |
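The kind of offline check referred to looks roughly like this; a minimal sketch, with a placeholder model (examples/offline_inference_npu.py in this PR is the authoritative script):

```python
from vllm import LLM, SamplingParams

prompts = ["Hello, my name is", "The capital of France is"]
# Greedy decoding so repeated runs are comparable.
sampling_params = SamplingParams(temperature=0.0, max_tokens=32)

llm = LLM(model="facebook/opt-125m")  # placeholder model
outputs = llm.generate(prompts, sampling_params)
for out in outputs:
    print(out.prompt, "->", out.outputs[0].text)
```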
Should we install MindIE first? |
Is there a Dockerfile for NPU to build an image? |
Thanks for your help, I will test further today. Could you tell me which type of NPU chip or device you are using? |
Ascend SingleOps and Ascend MindIE are independent backends; you can use Ascend SingleOps now without installing MindIE. BTW, MindIE is not ready yet. |
The Dockerfile is ready now at https://github.com/vllm-project/vllm/pull/8054/files#diff-67922969885e8d987974f014c4c6e25fc2ae46b75760bcc6c93b9cc541268781. Use Dockerfile.npu. |
We are using the Atlas 300T A2 training card. |
According to the official documentation, this operator has more restrictions on the 310P. The current PR is developed on the Atlas 300T A2 training card. If you are interested in supporting the 310P, you are welcome to join the development of this PR. |
OK, thanks. |
Is multi-card inference supported? |
It seems something is wrong with memory allocation. Can you share your env info, including NPU type and the versions of CANN and the driver? It would also be best if you could provide your reproduction method. |
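For reference, the requested environment info can be gathered like this; a sketch, assuming torch_npu is installed and npu-smi is on PATH (the CANN version comes from your toolkit install):

```python
import subprocess

import torch
import torch_npu  # noqa: F401  (registers the NPU backend)

print("torch:", torch.__version__)
print("torch_npu:", torch_npu.__version__)
# Driver version and per-device state:
print(subprocess.run(["npu-smi", "info"], capture_output=True, text=True).stdout)
```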
After downgrading torch to 2.4.0, the problem above was solved. But now there is a new error: RuntimeError: aclnnPromptFlashAttentionV3 or aclnnPromptFlashAttentionV3GetWorkspaceSize not in libopapi.so, or libopapi.so not found. How can this be solved? |
The aclnn operator does not exist; your CANN version is probably too old. |
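One quick way to check whether the missing aclnn symbols are present in your CANN install; a sketch, assuming the default toolkit path (adjust it to your environment):

```python
import ctypes

# Default CANN layout; your libopapi.so may live elsewhere.
lib = ctypes.CDLL("/usr/local/Ascend/ascend-toolkit/latest/lib64/libopapi.so")

for sym in ("aclnnPromptFlashAttentionV3",
            "aclnnPromptFlashAttentionV3GetWorkspaceSize"):
    try:
        getattr(lib, sym)
        print(sym, "found")
    except AttributeError:
        print(sym, "missing; the CANN version is likely too old")
```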
Force-pushed from 620514f to 198b85b.
With qwen2.5 72b, enabling --enable-prefix-caching causes an error. With --enable-prefix-caching disabled, using greedy sampling (temperature=0, top=1), repeated inference on the same prompt gives inconsistent results, and sometimes even the first token differs. Every worker process (pids 2782672-2782677) logs the same exception while processing method start_worker_execution_loop; deinterleaved, the traceback is:

```
Traceback (most recent call last):
  File "/root/workspace/vllm/vllm/worker/model_runner_base.py", line 116, in _wrapper
    return func(*args, **kwargs)
  File "/root/workspace/vllm/vllm/worker/model_runner.py", line 1654, in execute_model
    hidden_or_intermediate_states = model_executable(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/workspace/vllm/vllm/model_executor/models/qwen2.py", line 456, in forward
    hidden_states = self.model(input_ids, positions, kv_caches,
  File "/root/workspace/vllm/vllm/compilation/decorators.py", line 143, in __call__
    return self.forward(*args, **kwargs)
  File "/root/workspace/vllm/vllm/model_executor/models/qwen2.py", line 306, in forward
    hidden_states, residual = layer(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/workspace/vllm/vllm/model_executor/models/qwen2.py", line 226, in forward
    hidden_states = self.self_attn(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/workspace/vllm/vllm/model_executor/models/qwen2.py", line 169, in forward
    attn_output = self.attn(q, k, v, kv_cache, attn_metadata)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/workspace/vllm/vllm/attention/layer.py", line 99, in forward
    return self.impl.forward(query,
  File "/root/workspace/vllm/vllm/attention/backends/ascend.py", line 467, in forward
    query = query.view(-1, attn_metadata.max_prefill_seq_len,
RuntimeError: shape '[-1, 49, 8, 128]' is invalid for input of size 504832

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/root/workspace/vllm/vllm/executor/multiproc_worker_utils.py", line 223, in _run_worker_process
    output = executor(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/root/workspace/vllm/vllm/worker/worker_base.py", line 85, in start_worker_execution_loop
    output = self.execute_model(execute_model_req=None)
  File "/root/workspace/vllm/vllm/worker/worker_base.py", line 343, in execute_model
    output = self.model_runner.execute_model(
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/root/workspace/vllm/vllm/worker/model_runner_base.py", line 152, in _wrapper
    raise type(err)(
RuntimeError: Error in model execution (input dumped to /tmp/err_execute_model_input_20241129-181351.pkl): shape '[-1, 49, 8, 128]' is invalid for input of size 504832
```

The engine (engine.py:135) reports the same root error; its outer traceback runs through engine/multiprocessing/engine.py start -> run_engine_loop -> engine_step -> llm_engine.py step -> distributed_gpu_executor.py execute_model -> multiproc_gpu_executor.py _driver_execute_model -> worker_base.py execute_model, ending in the same RuntimeError. |
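The failing view requires the flattened token count to be a multiple of max_prefill_seq_len: 504832 elements / (8 heads * 128 head_size) = 493 tokens, which is not divisible by 49, so the prefill batch presumably contained sequences of unequal length. A minimal sketch reproducing the arithmetic; the sequence lengths and the padding workaround are illustrative assumptions, not the PR's actual code:

```python
import torch

num_heads, head_size, max_prefill_seq_len = 8, 128, 49
seq_lens = [49] * 10 + [3]          # 493 tokens total, unequal lengths
query = torch.randn(sum(seq_lens) * num_heads * head_size)
assert query.numel() == 504832      # matches the size in the error message

try:
    # The reshape from ascend.py line 467: fails because 493 % 49 != 0.
    query.view(-1, max_prefill_seq_len, num_heads, head_size)
except RuntimeError as e:
    print(e)  # shape '[-1, 49, 8, 128]' is invalid for input of size 504832

# A padded layout sidesteps the constraint (hypothetical illustration):
q = query.view(-1, num_heads, head_size)
padded = q.new_zeros(len(seq_lens), max_prefill_seq_len, num_heads, head_size)
offset = 0
for i, n in enumerate(seq_lens):
    padded[i, :n] = q[offset:offset + n]
    offset += n
```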
This pull request has merge conflicts that must be resolved before it can be merged. |
Force-pushed from 3eda947 to 06f1b1d.
The v1/chat/completions interface always returns a link to an image in the header. Have others encountered the same issue? |
Force-pushed from 06f1b1d to 2b92b5c.
Using an Atlas 300T Pro training card (model: 9000), I get the following error:

INFO 12-23 06:53:00 model_runner.py:1035] Loading model weights took 0.9277 GB
INFO 12-23 06:53:00 model_runner_base.py:120] Writing input of failed execution to /tmp/err_execute_model_input_20241223-065300.pkl...
('Warning: torch.save with "_use_new_zipfile_serialization = False" is not recommended for npu tensor, which may bring unexpected errors and hopefully set "_use_new_zipfile_serialization = True"', 'if it is necessary to use this, please convert the npu tensor to cpu tensor for saving')
INFO 12-23 06:53:06 model_runner_base.py:149] Completed writing input of failed execution to /tmp/err_execute_model_input_20241223-065300.pkl.
[rank0]: Traceback (most recent call last):
[rank0]: File "/usr/local/python3.9/lib/python3.9/site-packages/vllm/worker/model_runner_base.py", line 116, in _wrapper
[rank0]: return func(*args, **kwargs)
[rank0]: File "/usr/local/python3.9/lib/python3.9/site-packages/vllm/worker/model_runner.py", line 1608, in execute_model
[rank0]: hidden_or_intermediate_states = model_executable(
[rank0]: File "/usr/local/python3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/usr/local/python3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/usr/local/python3.9/lib/python3.9/site-packages/vllm/model_executor/models/qwen2.py", line 369, in forward
[rank0]: hidden_states = self.model(input_ids, positions, kv_caches,
[rank0]: File "/usr/local/python3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/usr/local/python3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/usr/local/python3.9/lib/python3.9/site-packages/vllm/model_executor/models/qwen2.py", line 285, in forward
[rank0]: hidden_states, residual = layer(
[rank0]: File "/usr/local/python3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/usr/local/python3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/usr/local/python3.9/lib/python3.9/site-packages/vllm/model_executor/models/qwen2.py", line 210, in forward
[rank0]: hidden_states = self.self_attn(
[rank0]: File "/usr/local/python3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/usr/local/python3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/usr/local/python3.9/lib/python3.9/site-packages/vllm/model_executor/models/qwen2.py", line 157, in forward
[rank0]: attn_output = self.attn(q, k, v, kv_cache, attn_metadata)
[rank0]: File "/usr/local/python3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/usr/local/python3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/usr/local/python3.9/lib/python3.9/site-packages/vllm/attention/layer.py", line 98, in forward
[rank0]: return self.impl.forward(query,
[rank0]: File "/usr/local/python3.9/lib/python3.9/site-packages/vllm/attention/backends/ascend.py", line 473, in forward
[rank0]: output = torch_npu.npu_prompt_flash_attention(
[rank0]: File "/usr/local/python3.9/lib/python3.9/site-packages/torch/_ops.py", line 1061, in __call__
[rank0]: return self_._op(*args, **(kwargs or {}))
[rank0]: RuntimeError: call aclnnPromptFlashAttentionV3 failed, detail:EZ1001: [PID: 10600] 2024-12-23-06:53:00.641.293 PromptFlashAttention LaunchAicore failed.
[rank0]: TraceBack (most recent call last):
[rank0]: Parse dynamic kernel config fail.
[rank0]: AclOpKernelInit failed opType
[rank0]: PromptFlashAttention LaunchAicore failed.
[rank0]: [ERROR] 2024-12-23-06:53:00 (PID:10600, Device:0, RankID:-1) ERR01100 OPS call acl api failed
[rank0]: The above exception was the direct cause of the following exception:
[rank0]: Traceback (most recent call last):
[rank0]: File "/workspace/vllm/examples/offline_inference.py", line 14, in <module>
[rank0]: llm = LLM(model="/mnt/models/Qwen2.5-0.5B-Instruct")
[rank0]: File "/usr/local/python3.9/lib/python3.9/site-packages/vllm/entrypoints/llm.py", line 214, in __init__
[rank0]: self.llm_engine = LLMEngine.from_engine_args(
[rank0]: File "/usr/local/python3.9/lib/python3.9/site-packages/vllm/engine/llm_engine.py", line 585, in from_engine_args
[rank0]: engine = cls(
[rank0]: File "/usr/local/python3.9/lib/python3.9/site-packages/vllm/engine/llm_engine.py", line 349, in __init__
[rank0]: self._initialize_kv_caches()
[rank0]: File "/usr/local/python3.9/lib/python3.9/site-packages/vllm/engine/llm_engine.py", line 484, in _initialize_kv_caches
[rank0]: self.model_executor.determine_num_available_blocks())
[rank0]: File "/usr/local/python3.9/lib/python3.9/site-packages/vllm/executor/gpu_executor.py", line 114, in determine_num_available_blocks
[rank0]: return self.driver_worker.determine_num_available_blocks()
[rank0]: File "/usr/local/python3.9/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: File "/usr/local/python3.9/lib/python3.9/site-packages/vllm/worker/npu_worker.py", line 148, in determine_num_available_blocks
[rank0]: self.model_runner.profile_run()
[rank0]: File "/usr/local/python3.9/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: File "/usr/local/python3.9/lib/python3.9/site-packages/vllm/worker/npu_model_runner.py", line 271, in profile_run
[rank0]: self.execute_model(model_input, kv_caches, intermediate_tensors)
[rank0]: File "/usr/local/python3.9/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: File "/usr/local/python3.9/lib/python3.9/site-packages/vllm/worker/model_runner_base.py", line 152, in _wrapper
[rank0]: raise type(err)(
[rank0]: RuntimeError: Error in model execution (input dumped to /tmp/err_execute_model_input_20241223-065300.pkl): call aclnnPromptFlashAttentionV3 failed, detail:EZ1001: [PID: 10600] 2024-12-23-06:53:00.641.293 PromptFlashAttention LaunchAicore failed.
[rank0]: TraceBack (most recent call last):
[rank0]: Parse dynamic kernel config fail.
[rank0]: AclOpKernelInit failed opType
[rank0]: PromptFlashAttention LaunchAicore failed.
[rank0]: [ERROR] 2024-12-23-06:53:00 (PID:10600, Device:0, RankID:-1) ERR01100 OPS call acl api failed

My environment info is as follows: OS: Ubuntu 22.04.5 LTS
Driver version: 24.1.rc2
CANN version: 8.0.RC3
# npu-smi info
+------------------------------------------------------------------------------------------------+
| npu-smi 24.1.rc2 Version: 24.1.rc2 |
+---------------------------+---------------+----------------------------------------------------+
| NPU Name | Health | Power(W) Temp(C) Hugepages-Usage(page)|
| Chip | Bus-Id | AICore(%) Memory-Usage(MB) HBM-Usage(MB) |
+===========================+===============+====================================================+
| 1 910B | OK | 63.1 34 0 / 0 |
| 0 | 0000:81:00.0 | 0 2384 / 15038 1 / 32768 |
+===========================+===============+====================================================+
+---------------------------+---------------+----------------------------------------------------+
| NPU Chip | Process id | Process name | Process memory(MB) |
+===========================+===============+====================================================+
| No running processes found in NPU 1 |
+===========================+===============+====================================================+ |
@tghfly Your device doesn't support the op. |
This pull request has merge conflicts that must be resolved before it can be merged. |
Can the 310P run inference normally now? I'm still getting garbled output. |
@gournd 310P is not supported now |
Is loading quantized models supported? |
Co-authored-by: wangshuai09 <[email protected]> Signed-off-by: MengqingCao <[email protected]>
Force-pushed from 27244b2 to e25c764.
Not supported now :-( |
Signed-off-by: MengqingCao <[email protected]>
As mentioned in #7692, this PR makes the Ascend NPU backend available in vLLM.
RoadMap:
Supported Devices
Install
Run VLLM_TARGET_DEVICE=npu pip install -e . to install vLLM, then verify the installation with python examples/offline_inference_npu.py (see the sketch below).
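After installation, a quick way to confirm the NPU is visible to PyTorch; a minimal sketch, assuming torch_npu is installed and at least one Ascend device is present:

```python
import torch
import torch_npu  # noqa: F401  (registers the "npu" device with PyTorch)

assert torch.npu.is_available(), "no Ascend NPU visible"
x = torch.ones(2, 2, device="npu")
print((x + x).cpu())  # tensor([[2., 2.], [2., 2.]])
```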
Docker: build with Dockerfile.npu, and modify --device /dev/davinci0 according to your device.
Collaborators
@MengqingCao @dgy516 @hi-liuyifeng @Lin-Qingyang-Alec @liujie92 @JiasenTian @weiwei567 @JuntongMa @xiangjie
@zhangxy1234 @ldh2020 @Eviannn @agoodnoob @rumoralot
This work is still in the WIP stage.