[Bug]: example/openai_chat_completion_client_with_tools.py not working #11903

Open
1 task done
Hurricane31337 opened this issue Jan 9, 2025 · 5 comments

Labels
bug Something isn't working

Comments

@Hurricane31337

Your current environment

The output of `python collect_env.py`
Not relevant; it's a Docker container (running on Ubuntu 24.04.1 LTS)

Model Input Dumps

vLLM:
$ docker container stop vllm; docker container rm vllm; docker run --name vllm --runtime nvidia -e "VLLM_LOGGING_LEVEL=DEBUG" -e "NVIDIA_VISIBLE_DEVICES=GPU-8cba8394-b5d6-1e92-6658-bb6efc08abff,GPU-c05c3905-fdd9-34a3-f6c0-1437beb91c7d" -v ~/.cache/huggingface:/root/.cache/huggingface --env "HUGGING_FACE_HUB_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxxx" --ipc=host -p 8000:8000 vllm/vllm-openai --gpu-memory-utilization 0.95 --model cstr/llama3.1-8b-spaetzle-v90 --served-model-name llama3.1-8b-spaetzle-v90 --tensor-parallel-size 2 --enable-auto-tool-choice --tool-call-parser hermes
vllm
vllm
INFO 01-09 07:32:00 api_server.py:712] vLLM API server version 0.6.6.post1
INFO 01-09 07:32:00 api_server.py:713] args: Namespace(host=None, port=8000, uvicorn_log_level='info', allow_credentials=False, allowed_origins=[''], allowed_methods=[''], allowed_headers=[''], api_key=None, lora_modules=None, prompt_adapters=None, chat_template=None, chat_template_content_format='auto', response_role='assistant', ssl_keyfile=None, ssl_certfile=None, ssl_ca_certs=None, ssl_cert_reqs=0, root_path=None, middleware=[], return_tokens_as_token_ids=False, disable_frontend_multiprocessing=False, enable_request_id_headers=False, enable_auto_tool_choice=True, tool_call_parser='hermes', tool_parser_plugin='', model='cstr/llama3.1-8b-spaetzle-v90', task='auto', tokenizer=None, skip_tokenizer_init=False, revision=None, code_revision=None, tokenizer_revision=None, tokenizer_mode='auto', trust_remote_code=False, allowed_local_media_path=None, download_dir=None, load_format='auto', config_format=<ConfigFormat.AUTO: 'auto'>, dtype='auto', kv_cache_dtype='auto', quantization_param_path=None, max_model_len=None, guided_decoding_backend='xgrammar', logits_processor_pattern=None, distributed_executor_backend=None, worker_use_ray=False, pipeline_parallel_size=1, tensor_parallel_size=2, max_parallel_loading_workers=None, ray_workers_use_nsight=False, block_size=None, enable_prefix_caching=None, disable_sliding_window=False, use_v2_block_manager=True, num_lookahead_slots=0, seed=0, swap_space=4, cpu_offload_gb=0, gpu_memory_utilization=0.95, num_gpu_blocks_override=None, max_num_batched_tokens=None, max_num_seqs=None, max_logprobs=20, disable_log_stats=False, quantization=None, rope_scaling=None, rope_theta=None, hf_overrides=None, enforce_eager=False, max_seq_len_to_capture=8192, disable_custom_all_reduce=False, tokenizer_pool_size=0, tokenizer_pool_type='ray', tokenizer_pool_extra_config=None, limit_mm_per_prompt=None, mm_processor_kwargs=None, disable_mm_preprocessor_cache=False, enable_lora=False, enable_lora_bias=False, max_loras=1, max_lora_rank=16, lora_extra_vocab_size=256, lora_dtype='auto', long_lora_scaling_factors=None, max_cpu_loras=None, fully_sharded_loras=False, enable_prompt_adapter=False, max_prompt_adapters=1, max_prompt_adapter_token=0, device='auto', num_scheduler_steps=1, multi_step_stream_outputs=True, scheduler_delay_factor=0.0, enable_chunked_prefill=None, speculative_model=None, speculative_model_quantization=None, num_speculative_tokens=None, speculative_disable_mqa_scorer=False, speculative_draft_tensor_parallel_size=None, speculative_max_model_len=None, speculative_disable_by_batch_size=None, ngram_prompt_lookup_max=None, ngram_prompt_lookup_min=None, spec_decoding_acceptance_method='rejection_sampler', typical_acceptance_sampler_posterior_threshold=None, typical_acceptance_sampler_posterior_alpha=None, disable_logprobs_during_spec_decoding=None, model_loader_extra_config=None, ignore_patterns=[], preemption_mode=None, served_model_name=['llama3.1-8b-spaetzle-v90'], qlora_adapter_name_or_path=None, otlp_traces_endpoint=None, collect_detailed_traces=None, disable_async_output_proc=False, scheduling_policy='fcfs', override_neuron_config=None, override_pooler_config=None, compilation_config=None, kv_transfer_config=None, worker_cls='auto', generation_config=None, disable_log_requests=False, max_log_len=None, disable_fastapi_docs=False, enable_prompt_tokens_details=False)
DEBUG 01-09 07:32:00 __init__.py:60] No plugins found.
DEBUG 01-09 07:32:00 api_server.py:180] Multiprocessing frontend to use ipc:///tmp/29b13bff-e5f0-4028-8a35-f5fce0df81c1 for IPC Path.
INFO 01-09 07:32:00 api_server.py:199] Started engine process with PID 76
DEBUG 01-09 07:32:08 __init__.py:60] No plugins found.
INFO 01-09 07:32:12 config.py:510] This model supports multiple tasks: {'embed', 'reward', 'generate', 'score', 'classify'}. Defaulting to 'generate'.
INFO 01-09 07:32:12 config.py:1310] Defaulting to use mp for distributed inference
WARNING 01-09 07:32:12 arg_utils.py:1103] Chunked prefill is enabled by default for models with max_model_len > 32K. Currently, chunked prefill might not work with some features or models. If you encounter any issues, please disable chunked prefill by setting --enable-chunked-prefill=False.
INFO 01-09 07:32:12 config.py:1458] Chunked prefill is enabled with max_num_batched_tokens=2048.
INFO 01-09 07:32:18 config.py:510] This model supports multiple tasks: {'generate', 'reward', 'embed', 'classify', 'score'}. Defaulting to 'generate'.
INFO 01-09 07:32:18 config.py:1310] Defaulting to use mp for distributed inference
WARNING 01-09 07:32:18 arg_utils.py:1103] Chunked prefill is enabled by default for models with max_model_len > 32K. Currently, chunked prefill might not work with some features or models. If you encounter any issues, please disable chunked prefill by setting --enable-chunked-prefill=False.
INFO 01-09 07:32:18 config.py:1458] Chunked prefill is enabled with max_num_batched_tokens=2048.
INFO 01-09 07:32:18 llm_engine.py:234] Initializing an LLM engine (v0.6.6.post1) with config: model='cstr/llama3.1-8b-spaetzle-v90', speculative_config=None, tokenizer='cstr/llama3.1-8b-spaetzle-v90', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=131072, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=2, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='xgrammar'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=llama3.1-8b-spaetzle-v90, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=False, chunked_prefill_enabled=True, use_async_output_proc=True, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output"],"candidate_compile_sizes":[],"compile_sizes":[],"capture_sizes":[256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"max_capture_size":256}, use_cached_outputs=True,
WARNING 01-09 07:32:19 multiproc_worker_utils.py:312] Reducing Torch parallelism from 64 threads to 1 to avoid unnecessary CPU contention. Set OMP_NUM_THREADS in the external environment to tune this value as needed.
INFO 01-09 07:32:19 custom_cache_manager.py:17] Setting Triton cache manager to: vllm.triton_utils.custom_cache_manager:CustomCacheManager
(VllmWorkerProcess pid=348) INFO 01-09 07:32:21 selector.py:120] Using Flash Attention backend.
(VllmWorkerProcess pid=348) INFO 01-09 07:32:21 multiproc_worker_utils.py:222] Worker ready; awaiting tasks
INFO 01-09 07:32:21 selector.py:120] Using Flash Attention backend.
DEBUG 01-09 07:32:22 parallel_state.py:959] world_size=2 rank=0 local_rank=0 distributed_init_method=tcp://127.0.0.1:38131 backend=nccl
(VllmWorkerProcess pid=348) DEBUG 01-09 07:32:22 parallel_state.py:959] world_size=2 rank=1 local_rank=1 distributed_init_method=tcp://127.0.0.1:38131 backend=nccl
INFO 01-09 07:32:22 utils.py:918] Found nccl from library libnccl.so.2
(VllmWorkerProcess pid=348) INFO 01-09 07:32:22 utils.py:918] Found nccl from library libnccl.so.2
INFO 01-09 07:32:22 pynccl.py:69] vLLM is using nccl==2.21.5
(VllmWorkerProcess pid=348) INFO 01-09 07:32:22 pynccl.py:69] vLLM is using nccl==2.21.5
DEBUG 01-09 07:32:23 client.py:186] Waiting for output from MQLLMEngine.
INFO 01-09 07:32:23 custom_all_reduce_utils.py:204] generating GPU P2P access cache in /root/.cache/vllm/gpu_p2p_access_cache_for_0,1.json
DEBUG 01-09 07:32:33 client.py:186] Waiting for output from MQLLMEngine.
DEBUG 01-09 07:32:33 client.py:186] Waiting for output from MQLLMEngine.
INFO 01-09 07:32:39 custom_all_reduce_utils.py:242] reading GPU P2P access cache from /root/.cache/vllm/gpu_p2p_access_cache_for_0,1.json
(VllmWorkerProcess pid=348) INFO 01-09 07:32:39 custom_all_reduce_utils.py:242] reading GPU P2P access cache from /root/.cache/vllm/gpu_p2p_access_cache_for_0,1.json
DEBUG 01-09 07:32:40 shm_broadcast.py:215] Binding to tcp://127.0.0.1:49403
INFO 01-09 07:32:40 shm_broadcast.py:255] vLLM message queue communication handle: Handle(connect_ip='127.0.0.1', local_reader_ranks=[1], buffer_handle=(1, 4194304, 6, 'psm_bd2e25f8'), local_subscribe_port=49403, remote_subscribe_port=None)
(VllmWorkerProcess pid=348) DEBUG 01-09 07:32:40 shm_broadcast.py:279] Connecting to tcp://127.0.0.1:49403
INFO 01-09 07:32:40 model_runner.py:1094] Starting to load model cstr/llama3.1-8b-spaetzle-v90...
(VllmWorkerProcess pid=348) INFO 01-09 07:32:40 model_runner.py:1094] Starting to load model cstr/llama3.1-8b-spaetzle-v90...
DEBUG 01-09 07:32:40 decorators.py:105] Inferred dynamic dimensions for forward method of <class 'vllm.model_executor.models.llama.LlamaModel'>: ['input_ids', 'positions', 'intermediate_tensors', 'inputs_embeds']
(VllmWorkerProcess pid=348) DEBUG 01-09 07:32:40 decorators.py:105] Inferred dynamic dimensions for forward method of <class 'vllm.model_executor.models.llama.LlamaModel'>: ['input_ids', 'positions', 'intermediate_tensors', 'inputs_embeds']
DEBUG 01-09 07:32:40 config.py:3285] enabled custom ops: Counter({'rms_norm': 65, 'silu_and_mul': 32, 'rotary_embedding': 1})
DEBUG 01-09 07:32:40 config.py:3287] disabled custom ops: Counter()
(VllmWorkerProcess pid=348) DEBUG 01-09 07:32:40 config.py:3285] enabled custom ops: Counter({'rms_norm': 65, 'silu_and_mul': 32, 'rotary_embedding': 1})
(VllmWorkerProcess pid=348) DEBUG 01-09 07:32:40 config.py:3287] disabled custom ops: Counter()
INFO 01-09 07:32:40 weight_utils.py:251] Using model weights format ['*.safetensors']
Loading safetensors checkpoint shards: 0% Completed | 0/17 [00:00<?, ?it/s]
(VllmWorkerProcess pid=348) INFO 01-09 07:32:40 weight_utils.py:251] Using model weights format ['*.safetensors']
Loading safetensors checkpoint shards: 6% Completed | 1/17 [00:00<00:05, 2.77it/s]
Loading safetensors checkpoint shards: 12% Completed | 2/17 [00:00<00:05, 2.64it/s]
Loading safetensors checkpoint shards: 18% Completed | 3/17 [00:01<00:05, 2.41it/s]
Loading safetensors checkpoint shards: 24% Completed | 4/17 [00:01<00:05, 2.34it/s]
Loading safetensors checkpoint shards: 29% Completed | 5/17 [00:02<00:05, 2.33it/s]
Loading safetensors checkpoint shards: 35% Completed | 6/17 [00:02<00:04, 2.44it/s]
DEBUG 01-09 07:32:43 client.py:186] Waiting for output from MQLLMEngine.
DEBUG 01-09 07:32:43 client.py:186] Waiting for output from MQLLMEngine.
DEBUG 01-09 07:32:43 client.py:186] Waiting for output from MQLLMEngine.
Loading safetensors checkpoint shards: 41% Completed | 7/17 [00:02<00:04, 2.45it/s]
Loading safetensors checkpoint shards: 47% Completed | 8/17 [00:03<00:03, 2.81it/s]
DEBUG 01-09 07:32:44 utils.py:156] Loaded weight lm_head.weight with shape torch.Size([64128, 4096])
Loading safetensors checkpoint shards: 53% Completed | 9/17 [00:03<00:02, 3.18it/s]
(VllmWorkerProcess pid=348) DEBUG 01-09 07:32:44 utils.py:156] Loaded weight lm_head.weight with shape torch.Size([64128, 4096])
Loading safetensors checkpoint shards: 59% Completed | 10/17 [00:03<00:02, 3.04it/s]
Loading safetensors checkpoint shards: 65% Completed | 11/17 [00:04<00:02, 2.88it/s]
Loading safetensors checkpoint shards: 71% Completed | 12/17 [00:04<00:01, 2.81it/s]
Loading safetensors checkpoint shards: 76% Completed | 13/17 [00:04<00:01, 2.59it/s]
Loading safetensors checkpoint shards: 88% Completed | 15/17 [00:05<00:00, 3.22it/s]
Loading safetensors checkpoint shards: 94% Completed | 16/17 [00:05<00:00, 2.94it/s]
Loading safetensors checkpoint shards: 100% Completed | 17/17 [00:06<00:00, 2.79it/s]
Loading safetensors checkpoint shards: 100% Completed | 17/17 [00:06<00:00, 2.74it/s]

INFO 01-09 07:32:47 model_runner.py:1099] Loading model weights took 7.5122 GB
(VllmWorkerProcess pid=348) INFO 01-09 07:32:48 model_runner.py:1099] Loading model weights took 7.5122 GB
DEBUG 01-09 07:32:53 client.py:186] Waiting for output from MQLLMEngine.
DEBUG 01-09 07:32:53 client.py:186] Waiting for output from MQLLMEngine.
DEBUG 01-09 07:32:53 client.py:186] Waiting for output from MQLLMEngine.
DEBUG 01-09 07:32:53 client.py:186] Waiting for output from MQLLMEngine.
INFO 01-09 07:32:55 worker.py:241] Memory profiling takes 6.84 seconds
INFO 01-09 07:32:55 worker.py:241] the current vLLM instance can use total_gpu_memory (47.54GiB) x gpu_memory_utilization (0.95) = 45.16GiB
INFO 01-09 07:32:55 worker.py:241] model weights take 7.51GiB; non_torch_memory takes 0.68GiB; PyTorch activation peak memory takes 1.19GiB; the rest of the memory reserved for KV Cache is 35.78GiB.
(VllmWorkerProcess pid=348) INFO 01-09 07:32:55 worker.py:241] Memory profiling takes 6.95 seconds
(VllmWorkerProcess pid=348) INFO 01-09 07:32:55 worker.py:241] the current vLLM instance can use total_gpu_memory (47.54GiB) x gpu_memory_utilization (0.95) = 45.16GiB
(VllmWorkerProcess pid=348) INFO 01-09 07:32:55 worker.py:241] model weights take 7.51GiB; non_torch_memory takes 0.64GiB; PyTorch activation peak memory takes 0.24GiB; the rest of the memory reserved for KV Cache is 36.76GiB.
INFO 01-09 07:32:55 distributed_gpu_executor.py:57] # GPU blocks: 36643, # CPU blocks: 4096
INFO 01-09 07:32:55 distributed_gpu_executor.py:61] Maximum concurrency for 131072 tokens per request: 4.47x
INFO 01-09 07:32:59 model_runner.py:1415] Capturing cudagraphs for decoding. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI. If out-of-memory error occurs during cudagraph capture, consider decreasing gpu_memory_utilization or switching to eager mode. You can also reduce the max_num_seqs as needed to decrease memory usage.
Capturing CUDA graph shapes: 3%|▎ | 1/35 [00:00<00:26, 1.30it/s](VllmWorkerProcess pid=348) INFO 01-09 07:33:00 model_runner.py:1415] Capturing cudagraphs for decoding. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI. If out-of-memory error occurs during cudagraph capture, consider decreasing gpu_memory_utilization or switching to eager mode. You can also reduce the max_num_seqs as needed to decrease memory usage.
Capturing CUDA graph shapes: 14%|█▍ | 5/35 [00:04<00:25, 1.18it/s]DEBUG 01-09 07:33:03 client.py:186] Waiting for output from MQLLMEngine.
DEBUG 01-09 07:33:03 client.py:186] Waiting for output from MQLLMEngine.
DEBUG 01-09 07:33:03 client.py:186] Waiting for output from MQLLMEngine.
DEBUG 01-09 07:33:03 client.py:186] Waiting for output from MQLLMEngine.
DEBUG 01-09 07:33:03 client.py:186] Waiting for output from MQLLMEngine.
Capturing CUDA graph shapes: 49%|████▊ | 17/35 [00:14<00:15, 1.19it/s]DEBUG 01-09 07:33:13 client.py:186] Waiting for output from MQLLMEngine.
DEBUG 01-09 07:33:13 client.py:186] Waiting for output from MQLLMEngine.
DEBUG 01-09 07:33:13 client.py:186] Waiting for output from MQLLMEngine.
DEBUG 01-09 07:33:13 client.py:186] Waiting for output from MQLLMEngine.
DEBUG 01-09 07:33:13 client.py:186] Waiting for output from MQLLMEngine.
DEBUG 01-09 07:33:13 client.py:186] Waiting for output from MQLLMEngine.
Capturing CUDA graph shapes: 80%|████████ | 28/35 [00:23<00:05, 1.19it/s]DEBUG 01-09 07:33:23 client.py:186] Waiting for output from MQLLMEngine.
DEBUG 01-09 07:33:23 client.py:186] Waiting for output from MQLLMEngine.
DEBUG 01-09 07:33:23 client.py:186] Waiting for output from MQLLMEngine.
DEBUG 01-09 07:33:23 client.py:186] Waiting for output from MQLLMEngine.
DEBUG 01-09 07:33:23 client.py:186] Waiting for output from MQLLMEngine.
DEBUG 01-09 07:33:23 client.py:186] Waiting for output from MQLLMEngine.
DEBUG 01-09 07:33:23 client.py:186] Waiting for output from MQLLMEngine.
Capturing CUDA graph shapes: 100%|██████████| 35/35 [00:30<00:00, 1.15it/s]
INFO 01-09 07:33:29 custom_all_reduce.py:224] Registering 2275 cuda graph addresses
DEBUG 01-09 07:33:33 client.py:186] Waiting for output from MQLLMEngine.
DEBUG 01-09 07:33:33 client.py:186] Waiting for output from MQLLMEngine.
DEBUG 01-09 07:33:33 client.py:186] Waiting for output from MQLLMEngine.
DEBUG 01-09 07:33:33 client.py:186] Waiting for output from MQLLMEngine.
DEBUG 01-09 07:33:33 client.py:186] Waiting for output from MQLLMEngine.
DEBUG 01-09 07:33:33 client.py:186] Waiting for output from MQLLMEngine.
DEBUG 01-09 07:33:33 client.py:186] Waiting for output from MQLLMEngine.
DEBUG 01-09 07:33:33 client.py:186] Waiting for output from MQLLMEngine.
DEBUG 01-09 07:33:43 client.py:186] Waiting for output from MQLLMEngine.
DEBUG 01-09 07:33:43 client.py:186] Waiting for output from MQLLMEngine.
DEBUG 01-09 07:33:43 client.py:186] Waiting for output from MQLLMEngine.
DEBUG 01-09 07:33:43 client.py:186] Waiting for output from MQLLMEngine.
DEBUG 01-09 07:33:43 client.py:186] Waiting for output from MQLLMEngine.
DEBUG 01-09 07:33:43 client.py:186] Waiting for output from MQLLMEngine.
DEBUG 01-09 07:33:43 client.py:186] Waiting for output from MQLLMEngine.
DEBUG 01-09 07:33:43 client.py:186] Waiting for output from MQLLMEngine.
DEBUG 01-09 07:33:43 client.py:186] Waiting for output from MQLLMEngine.
(VllmWorkerProcess pid=348) INFO 01-09 07:33:45 custom_all_reduce.py:224] Registering 2275 cuda graph addresses
(VllmWorkerProcess pid=348) INFO 01-09 07:33:45 model_runner.py:1535] Graph capturing finished in 45 secs, took 0.98 GiB
INFO 01-09 07:33:45 model_runner.py:1535] Graph capturing finished in 46 secs, took 0.98 GiB
INFO 01-09 07:33:45 llm_engine.py:431] init engine (profile, create kv cache, warmup model) took 57.23 seconds
DEBUG 01-09 07:33:45 engine.py:130] Starting Startup Loop.
DEBUG 01-09 07:33:45 engine.py:132] Starting Engine Loop.
DEBUG 01-09 07:33:46 api_server.py:262] vLLM to use /tmp/tmpc23ajfua as PROMETHEUS_MULTIPROC_DIR
INFO 01-09 07:33:46 api_server.py:640] Using supplied chat template:
INFO 01-09 07:33:46 api_server.py:640] None
INFO 01-09 07:33:46 serving_chat.py:73] "auto" tool choice has been enabled please note that while the parallel_tool_calls client option is preset for compatibility reasons, it will be ignored.
INFO 01-09 07:33:46 launcher.py:19] Available routes are:
INFO 01-09 07:33:46 launcher.py:27] Route: /openapi.json, Methods: GET, HEAD
INFO 01-09 07:33:46 launcher.py:27] Route: /docs, Methods: GET, HEAD
INFO 01-09 07:33:46 launcher.py:27] Route: /docs/oauth2-redirect, Methods: GET, HEAD
INFO 01-09 07:33:46 launcher.py:27] Route: /redoc, Methods: GET, HEAD
INFO 01-09 07:33:46 launcher.py:27] Route: /health, Methods: GET
INFO 01-09 07:33:46 launcher.py:27] Route: /tokenize, Methods: POST
INFO 01-09 07:33:46 launcher.py:27] Route: /detokenize, Methods: POST
INFO 01-09 07:33:46 launcher.py:27] Route: /v1/models, Methods: GET
INFO 01-09 07:33:46 launcher.py:27] Route: /version, Methods: GET
INFO 01-09 07:33:46 launcher.py:27] Route: /v1/chat/completions, Methods: POST
INFO 01-09 07:33:46 launcher.py:27] Route: /v1/completions, Methods: POST
INFO 01-09 07:33:46 launcher.py:27] Route: /v1/embeddings, Methods: POST
INFO 01-09 07:33:46 launcher.py:27] Route: /pooling, Methods: POST
INFO 01-09 07:33:46 launcher.py:27] Route: /score, Methods: POST
INFO 01-09 07:33:46 launcher.py:27] Route: /v1/score, Methods: POST
INFO: Started server process [1]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
DEBUG 01-09 07:33:53 client.py:186] Waiting for output from MQLLMEngine.
DEBUG 01-09 07:33:53 client.py:186] Waiting for output from MQLLMEngine.
DEBUG 01-09 07:33:53 client.py:186] Waiting for output from MQLLMEngine.
DEBUG 01-09 07:33:53 client.py:186] Waiting for output from MQLLMEngine.
DEBUG 01-09 07:33:53 client.py:186] Waiting for output from MQLLMEngine.
DEBUG 01-09 07:33:53 client.py:186] Waiting for output from MQLLMEngine.
DEBUG 01-09 07:33:53 client.py:186] Waiting for output from MQLLMEngine.
DEBUG 01-09 07:33:53 client.py:186] Waiting for output from MQLLMEngine.
DEBUG 01-09 07:33:53 client.py:186] Waiting for output from MQLLMEngine.
DEBUG 01-09 07:33:53 client.py:186] Waiting for output from MQLLMEngine.
DEBUG 01-09 07:33:55 client.py:165] Heartbeat successful.
DEBUG 01-09 07:33:55 metrics.py:467] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%.
DEBUG 01-09 07:33:56 engine.py:190] Waiting for new requests in engine loop.
INFO: 192.168.20.118:57735 - "GET /v1/models HTTP/1.1" 200 OK
INFO 01-09 07:35:30 chat_utils.py:333] Detected the chat template content format to be 'string'. You can set --chat-template-content-format to override this.
INFO: 192.168.20.118:57735 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
INFO: 192.168.20.118:57735 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
INFO: 192.168.20.118:57735 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
DEBUG 01-09 07:35:33 client.py:186] Waiting for output from MQLLMEngine.
DEBUG 01-09 07:35:33 client.py:186] Waiting for output from MQLLMEngine.
DEBUG 01-09 07:35:33 client.py:186] Waiting for output from MQLLMEngine.
DEBUG 01-09 07:35:33 client.py:186] Waiting for output from MQLLMEngine.
DEBUG 01-09 07:35:33 client.py:186] Waiting for output from MQLLMEngine.
DEBUG 01-09 07:35:33 client.py:186] Waiting for output from MQLLMEngine.
DEBUG 01-09 07:35:33 client.py:186] Waiting for output from MQLLMEngine.
DEBUG 01-09 07:35:33 client.py:186] Waiting for output from MQLLMEngine.
DEBUG 01-09 07:35:33 client.py:186] Waiting for output from MQLLMEngine.
DEBUG 01-09 07:35:33 client.py:186] Waiting for output from MQLLMEngine.
DEBUG 01-09 07:35:36 client.py:165] Heartbeat successful.

Python script:
$ python vLLM_OpenAI_Compatible_Tool.py
Traceback (most recent call last):
  File "C:\Users\Username\Documents\Coding\AI\vLLM_OpenAI_Compatible_Tool.py", line 63, in <module>
    chat_completion = client.chat.completions.create(messages=messages,
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Username\AppData\Local\Programs\Python\Python312\Lib\site-packages\openai\_utils\_utils.py", line 274, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Username\AppData\Local\Programs\Python\Python312\Lib\site-packages\openai\resources\chat\completions.py", line 742, in create
    return self._post(
           ^^^^^^^^^^^
  File "C:\Users\Username\AppData\Local\Programs\Python\Python312\Lib\site-packages\openai\_base_client.py", line 1277, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Username\AppData\Local\Programs\Python\Python312\Lib\site-packages\openai\_base_client.py", line 954, in request
    return self._request(
           ^^^^^^^^^^^^^^
  File "C:\Users\Username\AppData\Local\Programs\Python\Python312\Lib\site-packages\openai\_base_client.py", line 1043, in _request
    return self._retry_request(
           ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Username\AppData\Local\Programs\Python\Python312\Lib\site-packages\openai\_base_client.py", line 1092, in _retry_request
    return self._request(
           ^^^^^^^^^^^^^^
  File "C:\Users\Username\AppData\Local\Programs\Python\Python312\Lib\site-packages\openai\_base_client.py", line 1043, in _request
    return self._retry_request(
           ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Username\AppData\Local\Programs\Python\Python312\Lib\site-packages\openai\_base_client.py", line 1092, in _retry_request
    return self._request(
           ^^^^^^^^^^^^^^
  File "C:\Users\Username\AppData\Local\Programs\Python\Python312\Lib\site-packages\openai\_base_client.py", line 1058, in _request
    raise self._make_status_error_from_response(err.response) from None
openai.InternalServerError: Error code: 500

🐛 Describe the bug

When I run the vLLM Docker container, I can't get tool calling to work. Not even the official example from this repo (example/openai_chat_completion_client_with_tools.py) works, so I'm sure there must be an issue. The normal chat completions endpoint, with streaming and without tools, works fine.

Reproduce with the following command:
docker container stop vllm; docker container rm vllm; docker run --name vllm --runtime nvidia -e "NVIDIA_VISIBLE_DEVICES=GPU-XXXXXXXXXXXXXXXXXXXXXXXXXX,GPU-XXXXXXXXXXXXXXXXXXXXXXXXXX" -v ~/.cache/huggingface:/root/.cache/huggingface --env "HUGGING_FACE_HUB_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxxx" --ipc=host -p 8000:8000 vllm/vllm-openai --gpu-memory-utilization 0.95 --model cstr/llama3.1-8b-spaetzle-v90 --tensor-parallel-size 2 --served-model-name llama3.1-8b-spaetzle-v90 --enable-auto-tool-choice --tool-call-parser hermes
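
For reference, the failing request boils down to roughly the following (a minimal sketch along the lines of the example script, not a verbatim copy; the base URL, prompt, and weather tool schema are placeholders, and the model name matches --served-model-name above):

# Minimal tool-calling request against the server started above (sketch).
# Assumes the openai Python package; the tool schema and prompt are illustrative.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")

tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "Name of the city"},
            },
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What is the weather like in Dallas?"}]

# This call is what comes back from the server as HTTP 500.
chat_completion = client.chat.completions.create(
    model="llama3.1-8b-spaetzle-v90",
    messages=messages,
    tools=tools,
    tool_choice="auto",
)
print(chat_completion.choices[0].message.tool_calls)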

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
Hurricane31337 added the bug label on Jan 9, 2025
@DarkLight1337
Member

You should use the tool calling chat template in examples/tool_chat_template_llama3.1_json.jinja
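
As a quick way to sanity-check that template locally before pointing --chat-template at it, something like the following can render it with a tools list (a rough sketch, assuming transformers is installed; the template path and tool schema are placeholders):

# Rough sketch: render the tool-calling template locally to see what the model
# would actually be prompted with. Path and tool schema are placeholders.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("cstr/llama3.1-8b-spaetzle-v90")
with open("tool_chat_template_llama3.1_json.jinja") as f:
    chat_template = f.read()

tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "What is the weather like in Dallas?"}],
    tools=tools,
    chat_template=chat_template,
    tokenize=False,
    add_generation_prompt=True,
)
print(prompt)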

@Hurricane31337
Author

You're right, thanks! Still the same Internal Server Error 500, though.

INFO 01-09 08:49:54 api_server.py:640] Using supplied chat template:
INFO 01-09 08:49:54 api_server.py:640] {{- bos_token }}
INFO 01-09 08:49:54 api_server.py:640] {%- if custom_tools is defined %}
INFO 01-09 08:49:54 api_server.py:640] {%- set tools = custom_tools %}
INFO 01-09 08:49:54 api_server.py:640] {%- endif %}
INFO 01-09 08:49:54 api_server.py:640] {%- if not tools_in_user_message is defined %}
INFO 01-09 08:49:54 api_server.py:640] {%- set tools_in_user_message = false %}
INFO 01-09 08:49:54 api_server.py:640] {%- endif %}
INFO 01-09 08:49:54 api_server.py:640] {%- if not date_string is defined %}
INFO 01-09 08:49:54 api_server.py:640] {%- if strftime_now is defined %}
INFO 01-09 08:49:54 api_server.py:640] {%- set date_string = strftime_now("%d %b %Y") %}
INFO 01-09 08:49:54 api_server.py:640] {%- else %}
INFO 01-09 08:49:54 api_server.py:640] {%- set date_string = "26 Jul 2024" %}
INFO 01-09 08:49:54 api_server.py:640] {%- endif %}
INFO 01-09 08:49:54 api_server.py:640] {%- endif %}
INFO 01-09 08:49:54 api_server.py:640] {%- if not tools is defined %}
INFO 01-09 08:49:54 api_server.py:640] {%- set tools = none %}
INFO 01-09 08:49:54 api_server.py:640] {%- endif %}
INFO 01-09 08:49:54 api_server.py:640]
INFO 01-09 08:49:54 api_server.py:640] {#- Find out if there are any images #}
INFO 01-09 08:49:54 api_server.py:640] {% set image_ns = namespace(has_images=false) %}
INFO 01-09 08:49:54 api_server.py:640] {%- for message in messages %}
INFO 01-09 08:49:54 api_server.py:640] {%- for content in message['content'] %}
INFO 01-09 08:49:54 api_server.py:640] {%- if content['type'] == 'image' %}
INFO 01-09 08:49:54 api_server.py:640] {%- set image_ns.has_images = true %}
INFO 01-09 08:49:54 api_server.py:640] {%- endif %}
INFO 01-09 08:49:54 api_server.py:640] {%- endfor %}
INFO 01-09 08:49:54 api_server.py:640] {%- endfor %}
INFO 01-09 08:49:54 api_server.py:640]
INFO 01-09 08:49:54 api_server.py:640] {#- This block extracts the system message, so we can slot it into the right place. #}
INFO 01-09 08:49:54 api_server.py:640] {%- if messages[0]['role'] == 'system' %}
INFO 01-09 08:49:54 api_server.py:640] {%- if messages[0]['content'] is string %}
INFO 01-09 08:49:54 api_server.py:640] {%- set system_message = messages[0]['content']|trim %}
INFO 01-09 08:49:54 api_server.py:640] {%- else %}
INFO 01-09 08:49:54 api_server.py:640] {%- set system_message = messages[0]['content'][0]['text']|trim %}
INFO 01-09 08:49:54 api_server.py:640] {%- endif %}
INFO 01-09 08:49:54 api_server.py:640] {%- set messages = messages[1:] %}
INFO 01-09 08:49:54 api_server.py:640] {%- else %}
INFO 01-09 08:49:54 api_server.py:640] {%- if tools is not none %}
INFO 01-09 08:49:54 api_server.py:640] {%- set system_message = "You are a helpful assistant with tool calling capabilities. Only reply with a tool call if the function exists in the library provided by the user. If it doesn't exist, just reply directly in natural language. When you receive a tool call response, use the output to format an answer to the original user question." %}
INFO 01-09 08:49:54 api_server.py:640] {%- else %}
INFO 01-09 08:49:54 api_server.py:640] {%- set system_message = "" %}
INFO 01-09 08:49:54 api_server.py:640] {%- endif %}
INFO 01-09 08:49:54 api_server.py:640] {%- endif %}
INFO 01-09 08:49:54 api_server.py:640]
INFO 01-09 08:49:54 api_server.py:640] {#- System message if there are no images, if the user supplied one, or if tools are used (default tool system message) #}
INFO 01-09 08:49:54 api_server.py:640] {%- if system_message or not image_ns.has_images %}
INFO 01-09 08:49:54 api_server.py:640] {{- "<|start_header_id|>system<|end_header_id|>\n\n" }}
INFO 01-09 08:49:54 api_server.py:640] {%- if tools is not none %}
INFO 01-09 08:49:54 api_server.py:640] {{- "Environment: ipython\n" }}
INFO 01-09 08:49:54 api_server.py:640] {%- endif %}
INFO 01-09 08:49:54 api_server.py:640] {{- "Cutting Knowledge Date: December 2023\n" }}
INFO 01-09 08:49:54 api_server.py:640] {{- "Today Date: " + date_string + "\n\n" }}
INFO 01-09 08:49:54 api_server.py:640] {%- if tools is not none and not tools_in_user_message %}
INFO 01-09 08:49:54 api_server.py:640] {{- "You have access to the following functions. To call a function, please respond with JSON for a function call. " }}
INFO 01-09 08:49:54 api_server.py:640] {{- 'Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}. ' }}
INFO 01-09 08:49:54 api_server.py:640] {{- "Do not use variables.\n\n" }}
INFO 01-09 08:49:54 api_server.py:640] {%- for t in tools %}
INFO 01-09 08:49:54 api_server.py:640] {{- t | tojson(indent=4) }}
INFO 01-09 08:49:54 api_server.py:640] {{- "\n\n" }}
INFO 01-09 08:49:54 api_server.py:640] {%- endfor %}
INFO 01-09 08:49:54 api_server.py:640] {%- endif %}
INFO 01-09 08:49:54 api_server.py:640] {{- system_message }}
INFO 01-09 08:49:54 api_server.py:640] {{- "<|eot_id|>" }}
INFO 01-09 08:49:54 api_server.py:640] {%- endif %}
INFO 01-09 08:49:54 api_server.py:640]
INFO 01-09 08:49:54 api_server.py:640] {#- Custom tools are passed in a user message with some extra guidance #}
INFO 01-09 08:49:54 api_server.py:640] {%- if tools_in_user_message and not tools is none %}
INFO 01-09 08:49:54 api_server.py:640] {#- Extract the first user message so we can plug it in here #}
INFO 01-09 08:49:54 api_server.py:640] {%- if messages | length != 0 %}
INFO 01-09 08:49:54 api_server.py:640] {%- if messages[0]['content'] is string %}
INFO 01-09 08:49:54 api_server.py:640] {%- set first_user_message = messages[0]['content']|trim %}
INFO 01-09 08:49:54 api_server.py:640] {%- else %}
INFO 01-09 08:49:54 api_server.py:640] {%- set first_user_message = messages[0]['content'] | selectattr('type', 'equalto', 'text') | map(attribute='text') | map('trim') | join('\n') %}
INFO 01-09 08:49:54 api_server.py:640] {%- endif %}
INFO 01-09 08:49:54 api_server.py:640] {%- set messages = messages[1:] %}
INFO 01-09 08:49:54 api_server.py:640] {%- else %}
INFO 01-09 08:49:54 api_server.py:640] {{- raise_exception("Cannot put tools in the first user message when there's no first user message!") }}
INFO 01-09 08:49:54 api_server.py:640] {%- endif %}
INFO 01-09 08:49:54 api_server.py:640] {{- '<|start_header_id|>user<|end_header_id|>\n\n' -}}
INFO 01-09 08:49:54 api_server.py:640] {{- "Given the following functions, please respond with a JSON for a function call " }}
INFO 01-09 08:49:54 api_server.py:640] {{- "with its proper arguments that best answers the given prompt.\n\n" }}
INFO 01-09 08:49:54 api_server.py:640] {{- 'Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}. ' }}
INFO 01-09 08:49:54 api_server.py:640] {{- "Do not use variables.\n\n" }}
INFO 01-09 08:49:54 api_server.py:640] {%- for t in tools %}
INFO 01-09 08:49:54 api_server.py:640] {{- t | tojson(indent=4) }}
INFO 01-09 08:49:54 api_server.py:640] {{- "\n\n" }}
INFO 01-09 08:49:54 api_server.py:640] {%- endfor %}
INFO 01-09 08:49:54 api_server.py:640] {{- first_user_message + "<|eot_id|>"}}
INFO 01-09 08:49:54 api_server.py:640] {%- endif %}
INFO 01-09 08:49:54 api_server.py:640]
INFO 01-09 08:49:54 api_server.py:640] {%- for message in messages %}
INFO 01-09 08:49:54 api_server.py:640] {%- if not (message.role == 'ipython' or message.role == 'tool' or 'tool_calls' in message) %}
INFO 01-09 08:49:54 api_server.py:640] {{- '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n' }}
INFO 01-09 08:49:54 api_server.py:640] {%- if message['content'] is string %}
INFO 01-09 08:49:54 api_server.py:640] {{- message['content'] | trim}}
INFO 01-09 08:49:54 api_server.py:640] {%- else %}
INFO 01-09 08:49:54 api_server.py:640] {%- for content in message['content'] %}
INFO 01-09 08:49:54 api_server.py:640] {%- if content['type'] == 'image' %}
INFO 01-09 08:49:54 api_server.py:640] {{- '<|image|>' }}
INFO 01-09 08:49:54 api_server.py:640] {%- elif content['type'] == 'text' %}
INFO 01-09 08:49:54 api_server.py:640] {{- content['text'] | trim }}
INFO 01-09 08:49:54 api_server.py:640] {%- endif %}
INFO 01-09 08:49:54 api_server.py:640] {%- endfor %}
INFO 01-09 08:49:54 api_server.py:640] {%- endif %}
INFO 01-09 08:49:54 api_server.py:640] {{- '<|eot_id|>' }}
INFO 01-09 08:49:54 api_server.py:640] {%- elif 'tool_calls' in message %}
INFO 01-09 08:49:54 api_server.py:640] {%- if not message.tool_calls|length == 1 %}
INFO 01-09 08:49:54 api_server.py:640] {{- raise_exception("This model only supports single tool-calls at once!") }}
INFO 01-09 08:49:54 api_server.py:640] {%- endif %}
INFO 01-09 08:49:54 api_server.py:640] {%- set tool_call = message.tool_calls[0].function %}
INFO 01-09 08:49:54 api_server.py:640] {{- '<|start_header_id|>assistant<|end_header_id|>\n\n' -}}
INFO 01-09 08:49:54 api_server.py:640] {{- '{"name": "' + tool_call.name + '", ' }}
INFO 01-09 08:49:54 api_server.py:640] {{- '"parameters": ' }}
INFO 01-09 08:49:54 api_server.py:640] {{- tool_call.arguments | tojson }}
INFO 01-09 08:49:54 api_server.py:640] {{- "}" }}
INFO 01-09 08:49:54 api_server.py:640] {{- "<|eot_id|>" }}
INFO 01-09 08:49:54 api_server.py:640] {%- elif message.role == "tool" or message.role == "ipython" %}
INFO 01-09 08:49:54 api_server.py:640] {{- "<|start_header_id|>ipython<|end_header_id|>\n\n" }}
INFO 01-09 08:49:54 api_server.py:640] {%- if message.content is string %}
INFO 01-09 08:49:54 api_server.py:640] {{- { "output": message.content } | tojson }}
INFO 01-09 08:49:54 api_server.py:640] {%- else %}
INFO 01-09 08:49:54 api_server.py:640] {%- for content in message['content'] %}
INFO 01-09 08:49:54 api_server.py:640] {%- if content['type'] == 'text' %}
INFO 01-09 08:49:54 api_server.py:640] {{- { "output": content['text'] } | tojson }}
INFO 01-09 08:49:54 api_server.py:640] {%- endif %}
INFO 01-09 08:49:54 api_server.py:640] {%- endfor %}
INFO 01-09 08:49:54 api_server.py:640] {%- endif %}
INFO 01-09 08:49:54 api_server.py:640] {{- "<|eot_id|>" }}
INFO 01-09 08:49:54 api_server.py:640] {%- endif %}
INFO 01-09 08:49:54 api_server.py:640] {%- endfor %}
INFO 01-09 08:49:54 api_server.py:640] {%- if add_generation_prompt %}
INFO 01-09 08:49:54 api_server.py:640] {{- '<|start_header_id|>assistant<|end_header_id|>\n\n' }}
INFO 01-09 08:49:54 api_server.py:640] {%- endif %}
INFO 01-09 08:49:54 api_server.py:640]

...

DEBUG 01-09 08:50:54 client.py:165] Heartbeat successful.
DEBUG 01-09 08:50:54 metrics.py:467] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%.
DEBUG 01-09 08:50:54 client.py:165] Heartbeat successful.
DEBUG 01-09 08:50:54 engine.py:190] Waiting for new requests in engine loop.
INFO: 192.168.20.118:61598 - "GET /v1/models HTTP/1.1" 200 OK
INFO 01-09 08:51:01 chat_utils.py:333] Detected the chat template content format to be 'openai'. You can set --chat-template-content-format to override this.
INFO: 192.168.20.118:61598 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
INFO: 192.168.20.118:61598 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
INFO: 192.168.20.118:61598 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
DEBUG 01-09 08:51:03 client.py:186] Waiting for output from MQLLMEngine.
DEBUG 01-09 08:51:03 client.py:186] Waiting for output from MQLLMEngine.
DEBUG 01-09 08:51:03 client.py:186] Waiting for output from MQLLMEngine.
DEBUG 01-09 08:51:03 client.py:186] Waiting for output from MQLLMEngine.
DEBUG 01-09 08:51:03 client.py:186] Waiting for output from MQLLMEngine.
DEBUG 01-09 08:51:03 client.py:186] Waiting for output from MQLLMEngine.


@DarkLight1337
Member

Do you get any error logs when the internal server error occurs? If not, try passing --disable-frontend-multiprocessing to get more detailed logs.
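
Another way to see what the server is actually complaining about is to hit the route directly and print the raw response body, which the OpenAI client collapses into "Error code: 500" (a rough sketch; the URL, model name, and tool schema are placeholders matching the setup in this issue):

# Rough sketch: POST to /v1/chat/completions directly to inspect the raw 500 body.
# Assumes the requests package and a server reachable at localhost:8000.
import requests

payload = {
    "model": "llama3.1-8b-spaetzle-v90",
    "messages": [{"role": "user", "content": "What is the weather like in Dallas?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
    "tool_choice": "auto",
}

resp = requests.post("http://localhost:8000/v1/chat/completions", json=payload, timeout=60)
print(resp.status_code)
print(resp.text)  # whatever error detail the server puts in the response body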

@Hurricane31337
Author

Sadly not:

$ docker container stop vllm; docker container rm vllm; docker run --name vllm --runtime nvidia -e "VLLM_LOGGING_LEVEL=DEBUG" -e "NVIDIA_VISIBLE_DEVICES=GPU-XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX,GPU-XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX" -v ~/.cache/huggingface:/root/.cache/huggingface -v ~/AI/vLLM:/root/vLLM --env "HUGGING_FACE_HUB_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxx" --ipc=host -p 8000:8000 vllm/vllm-openai --gpu-memory-utilization 0.95 --model cstr/llama3.1-8b-spaetzle-v90 --served-model-name llama3.1-8b-spaetzle-v90 --tensor-parallel-size 2 --enable-auto-tool-choice --tool-call-parser hermes --chat-template /root/vLLM/tool_chat_template_llama3.2_json.jinja --disable-frontend-multiprocessing --uvicorn-log-level debug

DEBUG 01-10 00:24:24 metrics.py:467] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%.
INFO: 10.242.3.24:50684 - "GET /v1/models HTTP/1.1" 200 OK
INFO 01-10 00:24:29 chat_utils.py:333] Detected the chat template content format to be 'openai'. You can set --chat-template-content-format to override this.
INFO: 10.242.3.24:50684 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
INFO: 10.242.3.24:50684 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
INFO: 10.242.3.24:50684 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
DEBUG 01-10 00:24:34 metrics.py:467] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%.

I don't know whether the uvicorn logging would reveal anything useful; I also don't know how to get at that log output.

Do you have a server with a GPU running Docker to reproduce this? If you have a HuggingFace account (and token), you should be able to reproduce it with the docker command mentioned above and the official example script.

@DarkLight1337
Member

cc @K-Mistele @heheda12345
