vLLM does not work when you just pass image URLs
See https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/remote/inference/vllm/vllm.py#L166: if you change download=False there, so the image URL is passed through to vLLM as-is instead of being downloaded first, the request fails.
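For context, a minimal sketch of what that code path amounts to is sending the image URL straight to vLLM's OpenAI-compatible endpoint. The endpoint and model below match the docker command in the next section; the image URL is just a placeholder.

# Hedged sketch: send an image_url content part directly to the vLLM server,
# which is roughly what the adapter does with download=False.
# Assumptions: vLLM is running on localhost:6001 (as in the docker command
# below); the image URL is a placeholder, not from the original report.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:6001/v1", api_key="none")

response = client.chat.completions.create(
    model="meta-llama/Llama-3.2-11B-Vision-Instruct",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/some-image.jpg"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)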
How to test?
# run vLLM first
docker run --rm -it -e HUGGING_FACE_HUB_TOKEN=... \
  -v /home/ashwin/.cache/huggingface:/root/.cache/huggingface \
  vllm/vllm-openai:latest \
  --trust-remote-code \
  --gpu-memory-utilization 0.75 \
  --model meta-llama/Llama-3.2-11B-Vision-Instruct --enforce-eager \
  --max-model-len 4096 --max-num-seqs 16 --port 6001

pytest -v -s -k vllm tests/inference/test_vision_inference.py \
  --env VLLM_URL=http://localhost:6001/v1
In the vLLM logs, you see:
INFO: 127.0.0.1:59652 - "POST /v1/chat/completions HTTP/1.1" 200 OK
ERROR 12-16 23:56:17 serving_chat.py:162] Error in loading multi-modal data
ERROR 12-16 23:56:17 serving_chat.py:162] Traceback (most recent call last):
ERROR 12-16 23:56:17 serving_chat.py:162]   File "/usr/local/lib/python3.12/dist-packages/aiohttp/client.py", line 663, in _request
ERROR 12-16 23:56:17 serving_chat.py:162]     conn = await self._connector.connect(
ERROR 12-16 23:56:17 serving_chat.py:162]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 12-16 23:56:17 serving_chat.py:162]   File "/usr/local/lib/python3.12/dist-packages/aiohttp/connector.py", line 563, in connect
ERROR 12-16 23:56:17 serving_chat.py:162]     proto = await self._create_connection(req, traces, timeout)
ERROR 12-16 23:56:17 serving_chat.py:162]             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 12-16 23:56:17 serving_chat.py:162]   File "/usr/local/lib/python3.12/dist-packages/aiohttp/connector.py", line 1032, in _create_connection
ERROR 12-16 23:56:17 serving_chat.py:162]     _, proto = await self._create_direct_connection(req, traces, timeout)
ERROR 12-16 23:56:17 serving_chat.py:162]                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 12-16 23:56:17 serving_chat.py:162]   File "/usr/local/lib/python3.12/dist-packages/aiohttp/connector.py", line 1335, in _create_direct_connection
ERROR 12-16 23:56:17 serving_chat.py:162]     transp, proto = await self._wrap_create_connection(
ERROR 12-16 23:56:17 serving_chat.py:162]                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 12-16 23:56:17 serving_chat.py:162]   File "/usr/local/lib/python3.12/dist-packages/aiohttp/connector.py", line 1091, in _wrap_create_connection
ERROR 12-16 23:56:17 serving_chat.py:162]     sock = await aiohappyeyeballs.start_connection(
ERROR 12-16 23:56:17 serving_chat.py:162]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 12-16 23:56:17 serving_chat.py:162]   File "/usr/local/lib/python3.12/dist-packages/aiohappyeyeballs/impl.py", line 89, in start_connection
ERROR 12-16 23:56:17 serving_chat.py:162]     sock, _, _ = await _staggered.staggered_race(
ERROR 12-16 23:56:17 serving_chat.py:162]                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 12-16 23:56:17 serving_chat.py:162]   File "/usr/local/lib/python3.12/dist-packages/aiohappyeyeballs/_staggered.py", line 160, in staggered_race
ERROR 12-16 23:56:17 serving_chat.py:162]     done = await _wait_one(
ERROR 12-16 23:56:17 serving_chat.py:162]            ^^^^^^^^^^^^^^^^
ERROR 12-16 23:56:17 serving_chat.py:162]   File "/usr/local/lib/python3.12/dist-packages/aiohappyeyeballs/_staggered.py", line 41, in _wait_one
ERROR 12-16 23:56:17 serving_chat.py:162]     return await wait_next
ERROR 12-16 23:56:17 serving_chat.py:162]            ^^^^^^^^^^^^^^^
ERROR 12-16 23:56:17 serving_chat.py:162] asyncio.exceptions.CancelledError
ERROR 12-16 23:56:17 serving_chat.py:162]
ERROR 12-16 23:56:17 serving_chat.py:162] The above exception was the direct cause of the following exception:
ERROR 12-16 23:56:17 serving_chat.py:162]
ERROR 12-16 23:56:17 serving_chat.py:162] Traceback (most recent call last):
ERROR 12-16 23:56:17 serving_chat.py:162]   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/serving_chat.py", line 160, in create_chat_completion
ERROR 12-16 23:56:17 serving_chat.py:162]     mm_data = await mm_data_future
ERROR 12-16 23:56:17 serving_chat.py:162]               ^^^^^^^^^^^^^^^^^^^^
ERROR 12-16 23:56:17 serving_chat.py:162]   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/chat_utils.py", line 235, in all_mm_data
ERROR 12-16 23:56:17 serving_chat.py:162]     items = await asyncio.gather(*self._items)
ERROR 12-16 23:56:17 serving_chat.py:162]             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 12-16 23:56:17 serving_chat.py:162]   File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/utils.py", line 140, in async_get_and_parse_image
ERROR 12-16 23:56:17 serving_chat.py:162]     image = await async_fetch_image(image_url)
ERROR 12-16 23:56:17 serving_chat.py:162]             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 12-16 23:56:17 serving_chat.py:162]   File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/utils.py", line 62, in async_fetch_image
ERROR 12-16 23:56:17 serving_chat.py:162]     image_raw = await global_http_connection.async_get_bytes(
ERROR 12-16 23:56:17 serving_chat.py:162]                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 12-16 23:56:17 serving_chat.py:162]   File "/usr/local/lib/python3.12/dist-packages/vllm/connections.py", line 92, in async_get_bytes
ERROR 12-16 23:56:17 serving_chat.py:162]     async with await self.get_async_response(url, timeout=timeout) as r:
ERROR 12-16 23:56:17 serving_chat.py:162]                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 12-16 23:56:17 serving_chat.py:162]   File "/usr/local/lib/python3.12/dist-packages/aiohttp/client.py", line 1359, in __aenter__
ERROR 12-16 23:56:17 serving_chat.py:162]     self._resp: _RetType = await self._coro
ERROR 12-16 23:56:17 serving_chat.py:162]                            ^^^^^^^^^^^^^^^^
ERROR 12-16 23:56:17 serving_chat.py:162]   File "/usr/local/lib/python3.12/dist-packages/aiohttp/client.py", line 579, in _request
ERROR 12-16 23:56:17 serving_chat.py:162]     with timer:
ERROR 12-16 23:56:17 serving_chat.py:162]     ^^^^^
ERROR 12-16 23:56:17 serving_chat.py:162]   File "/usr/local/lib/python3.12/dist-packages/aiohttp/helpers.py", line 749, in __exit__
ERROR 12-16 23:56:17 serving_chat.py:162]     raise asyncio.TimeoutError from exc_val
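The traceback bottoms out in vLLM's async_fetch_image: the aiohttp GET for the image URL ends in asyncio.TimeoutError, so the server apparently cannot download the image within its fetch timeout. As a quick sanity check, something like the following sketch, run from the same environment the vLLM server runs in, mimics that fetch (the URL is a placeholder; the 20-second timeout matches the value I tried below):

# Hedged sketch: reproduce vLLM's image fetch (an aiohttp GET with a bounded
# timeout) to check whether the image host is reachable from the server's
# network. The URL is a placeholder, not from the original report.
import asyncio

import aiohttp

IMAGE_URL = "https://example.com/some-image.jpg"  # placeholder


async def main() -> None:
    timeout = aiohttp.ClientTimeout(total=20)  # matches VLLM_IMAGE_FETCH_TIMEOUT=20
    async with aiohttp.ClientSession(timeout=timeout) as session:
        async with session.get(IMAGE_URL) as resp:
            data = await resp.read()
            print(resp.status, len(data), "bytes fetched")


asyncio.run(main())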
This should have worked according to the vLLM documentation, shouldn't it? I also tried setting VLLM_IMAGE_FETCH_TIMEOUT=20 when starting the vLLM server.
cc @wukaixingxp