```
2024-11-06T04:38:58.950145Z INFO text_generation_launcher: Runtime environment:
Target: x86_64-unknown-linux-gnu
Cargo version: 1.80.1
Commit sha: b1f9044d6cf082423a517cf9a6aa6e5ebd34e1c2
Docker label: sha-b1f9044
nvidia-smi:
Wed Nov  6 04:38:58 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03              Driver Version: 561.09         CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090        On  |   00000000:01:00.0  On |                  Off |
|  0%   36C    P5             33W /  450W |     675MiB /  24564MiB |     32%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
xpu-smi: N/A
2024-11-06T04:38:58.950634Z INFO text_generation_launcher: Args {
    model_id: "/data/Qwen/Qwen2-VL-7B-Instruct",
    revision: None,
    validation_workers: 2,
    sharded: Some(false),
    num_shard: None,
    quantize: Some(Eetq),
    speculate: None,
    dtype: None,
    kv_cache_dtype: None,
    trust_remote_code: false,
    max_concurrent_requests: 5,
    max_best_of: 2,
    max_stop_sequences: 4,
    max_top_n_tokens: 5,
    max_input_tokens: Some(4999),
    max_input_length: None,
    max_total_tokens: Some(5000),
    waiting_served_ratio: 0.3,
    max_batch_prefill_tokens: Some(5050),
    max_batch_total_tokens: None,
    max_waiting_tokens: 20,
    max_batch_size: Some(32),
    cuda_graphs: None,
    hostname: "36c9ccfbcab9",
    port: 5025,
    shard_uds_path: "/tmp/text-generation-server",
    master_addr: "localhost",
    master_port: 29500,
    huggingface_hub_cache: None,
    weights_cache_override: None,
    disable_custom_kernels: false,
    cuda_memory_fraction: 1.0,
    rope_scaling: None,
    rope_factor: None,
    json_output: false,
    otlp_endpoint: None,
    otlp_service_name: "text-generation-inference.router",
    cors_allow_origin: [],
    api_key: None,
    watermark_gamma: None,
    watermark_delta: None,
    ngrok: false,
    ngrok_authtoken: None,
    ngrok_edge: None,
    tokenizer_config_path: None,
    disable_grammar_support: false,
    env: true,
    max_client_batch_size: 4,
    lora_adapters: None,
    usage_stats: On,
}
2024-11-06T04:39:00.844343Z INFO text_generation_launcher: Disabling prefix caching because of VLM model
2024-11-06T04:39:00.844371Z INFO text_generation_launcher: Using attention flashinfer - Prefix caching 0
2024-11-06T04:39:00.844383Z INFO text_generation_launcher: Using default cuda graphs [1, 2, 4, 8, 16, 32]
2024-11-06T04:39:00.844506Z INFO download: text_generation_launcher: Starting check and download process for /data/Qwen/Qwen2-VL-7B-Instruct
2024-11-06T04:39:03.993179Z INFO text_generation_launcher: Files are already present on the host. Skipping download.
2024-11-06T04:39:04.653005Z INFO download: text_generation_launcher: Successfully downloaded weights for /data/Qwen/Qwen2-VL-7B-Instruct
2024-11-06T04:39:04.653165Z INFO shard-manager: text_generation_launcher: Starting shard rank=0
2024-11-06T04:39:06.991476Z INFO text_generation_launcher: Using prefix caching = False
2024-11-06T04:39:06.991506Z INFO text_generation_launcher: Using Attention = flashinfer
WARNING 11-06 04:39:07 ray_utils.py:46] Failed to import Ray with ModuleNotFoundError("No module named 'ray'"). For distributed inference, please install Ray with `pip install ray`.
2024-11-06T04:39:14.664235Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
[... the same "Waiting for shard to be ready... rank=0" message repeats every ~10s from 04:39:14 to 04:50:15 ...]
2024-11-06T04:50:15.277644Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:50:17.876053Z INFO text_generation_launcher: Using experimental prefill chunking = False
2024-11-06T04:50:18.714419Z INFO text_generation_launcher: Server started at unix:///tmp/text-generation-server-0
2024-11-06T04:50:18.780985Z INFO shard-manager: text_generation_launcher: Shard ready in 674.155920438s rank=0
2024-11-06T04:50:18.817087Z INFO text_generation_launcher: Starting Webserver
[... h2/tower DEBUG frames for the service_discovery, clear_cache, and info requests: client connections bound, Settings/WindowUpdate/Ping frames exchanged, then one connection closed with send frame=GoAway { error_code: NO_ERROR, last_stream_id: StreamId(0) } and "Connection::poll; connection error error=GoAway(b\"\", NO_ERROR, Library)" ...]
2024-11-06T04:50:18.880921Z INFO text_generation_router_v3: backends/v3/src/lib.rs:125: Warming up model
[... h2 DEBUG frames for the warmup{max_input_length=Some(4999) max_prefill_tokens=5050 max_total_tokens=Some(5000) max_batch_size=Some(32)} request ...]
2024-11-06T04:50:18.914255Z INFO text_generation_launcher: Using optimized Triton indexing kernels.
2024-11-06T04:50:20.793022Z DEBUG hyper::client::service: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/hyper-0.14.31/src/client/service.rs:79: connection error: hyper::Error(Io, Custom { kind: BrokenPipe, error: "connection closed because of a broken pipe" })
2024-11-06T04:50:20.793047Z DEBUG hyper::proto::h2::client: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/hyper-0.14.31/src/proto/h2/client.rs:326: client response error: stream closed because of a broken pipe
2024-11-06T04:50:20.793097Z ERROR warmup{max_input_length=Some(4999) max_prefill_tokens=5050 max_total_tokens=Some(5000) max_batch_size=Some(32)}:warmup: text_generation_router_v3::client: backends/v3/src/client/mod.rs:45: Server error: transport error
Error: Backend(Warmup(Generation("transport error")))
2024-11-06T04:50:20.843273Z ERROR text_generation_launcher: Webserver Crashed
2024-11-06T04:50:20.843305Z INFO text_generation_launcher: Shutting down shards
2024-11-06T04:50:20.883122Z INFO shard-manager: text_generation_launcher: Terminating shard rank=0
2024-11-06T04:50:20.883163Z INFO shard-manager: text_generation_launcher: Waiting for shard to gracefully shutdown rank=0
2024-11-06T04:50:20.983296Z INFO shard-manager: text_generation_launcher: shard terminated rank=0
```
I am using Docker Compose for my setup; the main args are:
```yaml
image: ghcr.io/huggingface/text-generation-inference
container_name: llm-server
command:
  - --model-id /data/Qwen/Qwen2-VL-7B-Instruct
  - --max-batch-prefill-tokens=5050
  - --max-total-tokens=5000
  - --max-input-tokens=4999
  - --validation-workers=2
  - --max-concurrent-requests=5
  - --max-batch-size=32
  - --port=5025
  - --env
  - --sharded=false
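For what it's worth, I don't think the token limits themselves are the problem. A small sanity check (hypothetical helper, not TGI code — it just encodes my understanding that the launcher wants `max_input_tokens` strictly below `max_total_tokens` and `max_batch_prefill_tokens` at least as large as `max_input_tokens`) passes for the values above:

```python
# Hypothetical sanity check of the flag relationships the TGI launcher expects.
# These constraints are my reading of the launcher's validation, not its actual code.
limits = {
    "max_input_tokens": 4999,        # --max-input-tokens
    "max_total_tokens": 5000,        # --max-total-tokens
    "max_batch_prefill_tokens": 5050,  # --max-batch-prefill-tokens
}

def check_limits(l: dict) -> bool:
    """True when the input budget fits inside the total budget
    and the prefill budget covers a full-length input."""
    return (
        l["max_input_tokens"] < l["max_total_tokens"]
        and l["max_batch_prefill_tokens"] >= l["max_input_tokens"]
    )

print(check_limits(limits))  # prints True
```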
As for the text-generation-inference Docker image, I am using the `latest` tag, pulled yesterday (11/5/2024).
The model should load fine.
@drbh How are you able to load it? I believe you added support for the Qwen2-VL model. Did you face a similar issue?