
TGI crashes while loading Qwen2-VL-7B-Instruct #2728

Open
2 of 4 tasks
ktobah opened this issue Nov 6, 2024 · 1 comment
Comments


ktobah commented Nov 6, 2024

System Info

2024-11-06T04:38:58.950145Z  INFO text_generation_launcher: Runtime environment:
Target: x86_64-unknown-linux-gnu
Cargo version: 1.80.1
Commit sha: b1f9044d6cf082423a517cf9a6aa6e5ebd34e1c2
Docker label: sha-b1f9044
nvidia-smi:
Wed Nov  6 04:38:58 2024       
   +-----------------------------------------------------------------------------------------+
   | NVIDIA-SMI 560.35.03              Driver Version: 561.09         CUDA Version: 12.6     |
   |-----------------------------------------+------------------------+----------------------+
   | GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
   | Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
   |                                         |                        |               MIG M. |
   |=========================================+========================+======================|
   |   0  NVIDIA GeForce RTX 4090        On  |   00000000:01:00.0  On |                  Off |
   |  0%   36C    P5             33W /  450W |     675MiB /  24564MiB |     32%      Default |
   |                                         |                        |                  N/A |
   +-----------------------------------------+------------------------+----------------------+
                                                                                            
   +-----------------------------------------------------------------------------------------+
   | Processes:                                                                              |
   |  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
   |        ID   ID                                                               Usage      |
   |=========================================================================================|
   |  No running processes found                                                             |
   +-----------------------------------------------------------------------------------------+
xpu-smi:
N/A
2024-11-06T04:38:58.950634Z  INFO text_generation_launcher: Args {
    model_id: "/data/Qwen/Qwen2-VL-7B-Instruct",
    revision: None,
    validation_workers: 2,
    sharded: Some(
        false,
    ),
    num_shard: None,
    quantize: Some(
        Eetq,
    ),
    speculate: None,
    dtype: None,
    kv_cache_dtype: None,
    trust_remote_code: false,
    max_concurrent_requests: 5,
    max_best_of: 2,
    max_stop_sequences: 4,
    max_top_n_tokens: 5,
    max_input_tokens: Some(
        4999,
    ),
    max_input_length: None,
    max_total_tokens: Some(
        5000,
    ),
    waiting_served_ratio: 0.3,
    max_batch_prefill_tokens: Some(
        5050,
    ),
    max_batch_total_tokens: None,
    max_waiting_tokens: 20,
    max_batch_size: Some(
        32,
    ),
    cuda_graphs: None,
    hostname: "36c9ccfbcab9",
    port: 5025,
    shard_uds_path: "/tmp/text-generation-server",
    master_addr: "localhost",
    master_port: 29500,
    huggingface_hub_cache: None,
    weights_cache_override: None,
    disable_custom_kernels: false,
    cuda_memory_fraction: 1.0,
    rope_scaling: None,
    rope_factor: None,
    json_output: false,
    otlp_endpoint: None,
    otlp_service_name: "text-generation-inference.router",
    cors_allow_origin: [],
    api_key: None,
    watermark_gamma: None,
    watermark_delta: None,
    ngrok: false,
    ngrok_authtoken: None,
    ngrok_edge: None,
    tokenizer_config_path: None,
    disable_grammar_support: false,
    env: true,
    max_client_batch_size: 4,
    lora_adapters: None,
    usage_stats: On,
}
2024-11-06T04:39:00.844343Z  INFO text_generation_launcher: Disabling prefix caching because of VLM model
2024-11-06T04:39:00.844371Z  INFO text_generation_launcher: Using attention flashinfer - Prefix caching 0
2024-11-06T04:39:00.844383Z  INFO text_generation_launcher: Using default cuda graphs [1, 2, 4, 8, 16, 32]
2024-11-06T04:39:00.844506Z  INFO download: text_generation_launcher: Starting check and download process for /data/Qwen/Qwen2-VL-7B-Instruct
2024-11-06T04:39:03.993179Z  INFO text_generation_launcher: Files are already present on the host. Skipping download.
2024-11-06T04:39:04.653005Z  INFO download: text_generation_launcher: Successfully downloaded weights for /data/Qwen/Qwen2-VL-7B-Instruct
2024-11-06T04:39:04.653165Z  INFO shard-manager: text_generation_launcher: Starting shard rank=0
2024-11-06T04:39:06.991476Z  INFO text_generation_launcher: Using prefix caching = False
2024-11-06T04:39:06.991506Z  INFO text_generation_launcher: Using Attention = flashinfer
WARNING 11-06 04:39:07 ray_utils.py:46] Failed to import Ray with ModuleNotFoundError("No module named 'ray'"). For distributed inference, please install Ray with `pip install ray`.
2024-11-06T04:39:14.664235Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:39:24.673931Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:39:34.683198Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:39:44.690843Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:39:54.700389Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:40:04.710560Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:40:14.719337Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:40:24.729205Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:40:34.739375Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:40:44.747625Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:40:54.757904Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:41:04.768215Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:41:14.776315Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:41:24.786533Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:41:34.796818Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:41:44.805857Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:41:54.815583Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:42:04.825250Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:42:14.833324Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:42:24.843081Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:42:34.853696Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:42:44.861886Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:42:54.871177Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:43:04.880903Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:43:14.889243Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:43:24.899113Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:43:34.908406Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:43:44.916331Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:43:54.926150Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:44:04.936289Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:44:14.945281Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:44:24.956096Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:44:34.965929Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:44:44.974155Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:44:54.984230Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:45:04.994178Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:45:15.002185Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:45:25.011549Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:45:35.021098Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:45:45.029212Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:45:55.039089Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:46:05.048297Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:46:15.055991Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:46:25.065346Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:46:35.075354Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:46:45.084651Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:46:55.094354Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:47:05.104200Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:47:15.111799Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:47:25.121002Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:47:35.130693Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:47:45.139452Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:47:55.148870Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:48:05.158577Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:48:15.166294Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:48:25.175607Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:48:35.184943Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:48:45.193299Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:48:55.203235Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:49:05.213497Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:49:15.222554Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:49:25.231924Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:49:35.241455Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:49:45.248904Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:49:55.258542Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:50:05.269004Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:50:15.277644Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:50:17.876053Z  INFO text_generation_launcher: Using experimental prefill chunking = False
2024-11-06T04:50:18.714419Z  INFO text_generation_launcher: Server started at unix:///tmp/text-generation-server-0
2024-11-06T04:50:18.780985Z  INFO shard-manager: text_generation_launcher: Shard ready in 674.155920438s rank=0
2024-11-06T04:50:18.817087Z  INFO text_generation_launcher: Starting Webserver
2024-11-06T04:50:18.868288Z DEBUG h2::client: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/client.rs:1274: binding client connection
2024-11-06T04:50:18.868373Z DEBUG h2::client: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/client.rs:1279: client connection bound
2024-11-06T04:50:18.868775Z DEBUG h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=Settings { flags: (0x0), enable_push: 0, initial_window_size: 2097152, max_frame_size: 16384 }
2024-11-06T04:50:18.871893Z DEBUG Connection{peer=Client}: h2::codec::framed_read: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_read.rs:405: received frame=Settings { flags: (0x0), initial_window_size: 4194304, max_frame_size: 4194304, max_header_list_size: 16384 }
2024-11-06T04:50:18.872201Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=Settings { flags: (0x1: ACK) }
2024-11-06T04:50:18.872216Z DEBUG Connection{peer=Client}: h2::codec::framed_read: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_read.rs:405: received frame=WindowUpdate { stream_id: StreamId(0), size_increment: 4128769 }
2024-11-06T04:50:18.872248Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=WindowUpdate { stream_id: StreamId(0), size_increment: 5177345 }
2024-11-06T04:50:18.872437Z DEBUG service_discovery: tower::buffer::worker: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tower-0.4.13/src/buffer/worker.rs:197: service.ready=true processing request
2024-11-06T04:50:18.872682Z DEBUG Connection{peer=Client}: h2::codec::framed_read: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_read.rs:405: received frame=Settings { flags: (0x1: ACK) }
2024-11-06T04:50:18.872699Z DEBUG Connection{peer=Client}: h2::proto::settings: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/proto/settings.rs:52: received settings ACK; applying Settings { flags: (0x0), enable_push: 0, initial_window_size: 2097152, max_frame_size: 16384 }
2024-11-06T04:50:18.873337Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=Headers { stream_id: StreamId(1), flags: (0x4: END_HEADERS) }
2024-11-06T04:50:18.874009Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=Data { stream_id: StreamId(1) }
2024-11-06T04:50:18.874043Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=Data { stream_id: StreamId(1), flags: (0x1: END_STREAM) }
2024-11-06T04:50:18.874603Z DEBUG Connection{peer=Client}: h2::codec::framed_read: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_read.rs:405: received frame=Ping { ack: false, payload: [70, 216, 154, 41, 248, 232, 176, 242] }
2024-11-06T04:50:18.874981Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=Ping { ack: true, payload: [70, 216, 154, 41, 248, 232, 176, 242] }
2024-11-06T04:50:18.875679Z DEBUG Connection{peer=Client}: h2::codec::framed_read: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_read.rs:405: received frame=Headers { stream_id: StreamId(1), flags: (0x4: END_HEADERS) }
2024-11-06T04:50:18.875986Z DEBUG Connection{peer=Client}: h2::codec::framed_read: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_read.rs:405: received frame=Data { stream_id: StreamId(1) }
2024-11-06T04:50:18.876014Z DEBUG Connection{peer=Client}: h2::codec::framed_read: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_read.rs:405: received frame=Headers { stream_id: StreamId(1), flags: (0x5: END_HEADERS | END_STREAM) }
2024-11-06T04:50:18.876020Z DEBUG Connection{peer=Client}: h2::codec::framed_read: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_read.rs:405: received frame=WindowUpdate { stream_id: StreamId(0), size_increment: 5 }
2024-11-06T04:50:18.876925Z DEBUG h2::client: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/client.rs:1274: binding client connection
2024-11-06T04:50:18.876959Z DEBUG h2::client: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/client.rs:1279: client connection bound
2024-11-06T04:50:18.876974Z DEBUG h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=Settings { flags: (0x0), enable_push: 0, initial_window_size: 2097152, max_frame_size: 16384 }
2024-11-06T04:50:18.877061Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=WindowUpdate { stream_id: StreamId(0), size_increment: 5177345 }
2024-11-06T04:50:18.877174Z DEBUG Connection{peer=Client}: h2::codec::framed_read: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_read.rs:405: received frame=Settings { flags: (0x0), initial_window_size: 4194304, max_frame_size: 4194304, max_header_list_size: 16384 }
2024-11-06T04:50:18.877204Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=Settings { flags: (0x1: ACK) }
2024-11-06T04:50:18.877214Z DEBUG Connection{peer=Client}: h2::codec::framed_read: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_read.rs:405: received frame=WindowUpdate { stream_id: StreamId(0), size_increment: 4128769 }
2024-11-06T04:50:18.877243Z DEBUG Connection{peer=Client}: h2::codec::framed_read: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_read.rs:405: received frame=Settings { flags: (0x1: ACK) }
2024-11-06T04:50:18.877272Z DEBUG Connection{peer=Client}: h2::proto::settings: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/proto/settings.rs:52: received settings ACK; applying Settings { flags: (0x0), enable_push: 0, initial_window_size: 2097152, max_frame_size: 16384 }
2024-11-06T04:50:18.877305Z DEBUG clear_cache{batch_id=None}:clear_cache{batch_id=None}: tower::buffer::worker: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tower-0.4.13/src/buffer/worker.rs:197: service.ready=true processing request
2024-11-06T04:50:18.877708Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=Headers { stream_id: StreamId(1), flags: (0x4: END_HEADERS) }
2024-11-06T04:50:18.877739Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=Data { stream_id: StreamId(1) }
2024-11-06T04:50:18.877746Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=Data { stream_id: StreamId(1), flags: (0x1: END_STREAM) }
2024-11-06T04:50:18.877963Z DEBUG Connection{peer=Client}: h2::codec::framed_read: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_read.rs:405: received frame=Ping { ack: false, payload: [73, 124, 42, 86, 67, 10, 4, 118] }
2024-11-06T04:50:18.877997Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=Ping { ack: true, payload: [73, 124, 42, 86, 67, 10, 4, 118] }
2024-11-06T04:50:18.878161Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=GoAway { error_code: NO_ERROR, last_stream_id: StreamId(0) }
2024-11-06T04:50:18.878197Z DEBUG Connection{peer=Client}: h2::proto::connection: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/proto/connection.rs:432: Connection::poll; connection error error=GoAway(b"", NO_ERROR, Library)
2024-11-06T04:50:18.879979Z DEBUG Connection{peer=Client}: h2::codec::framed_read: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_read.rs:405: received frame=Headers { stream_id: StreamId(1), flags: (0x4: END_HEADERS) }
2024-11-06T04:50:18.880006Z DEBUG Connection{peer=Client}: h2::codec::framed_read: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_read.rs:405: received frame=Data { stream_id: StreamId(1) }
2024-11-06T04:50:18.880013Z DEBUG Connection{peer=Client}: h2::codec::framed_read: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_read.rs:405: received frame=Headers { stream_id: StreamId(1), flags: (0x5: END_HEADERS | END_STREAM) }
2024-11-06T04:50:18.880016Z DEBUG Connection{peer=Client}: h2::codec::framed_read: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_read.rs:405: received frame=WindowUpdate { stream_id: StreamId(0), size_increment: 5 }
2024-11-06T04:50:18.880154Z DEBUG info:info: tower::buffer::worker: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tower-0.4.13/src/buffer/worker.rs:197: service.ready=true processing request
2024-11-06T04:50:18.880252Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=Headers { stream_id: StreamId(3), flags: (0x4: END_HEADERS) }
2024-11-06T04:50:18.880287Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=Data { stream_id: StreamId(3) }
2024-11-06T04:50:18.880294Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=Data { stream_id: StreamId(3), flags: (0x1: END_STREAM) }
2024-11-06T04:50:18.880780Z DEBUG Connection{peer=Client}: h2::codec::framed_read: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_read.rs:405: received frame=Headers { stream_id: StreamId(3), flags: (0x4: END_HEADERS) }
2024-11-06T04:50:18.880812Z DEBUG Connection{peer=Client}: h2::codec::framed_read: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_read.rs:405: received frame=Data { stream_id: StreamId(3) }
2024-11-06T04:50:18.880821Z DEBUG Connection{peer=Client}: h2::codec::framed_read: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_read.rs:405: received frame=Headers { stream_id: StreamId(3), flags: (0x5: END_HEADERS | END_STREAM) }
2024-11-06T04:50:18.880826Z DEBUG Connection{peer=Client}: h2::codec::framed_read: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_read.rs:405: received frame=WindowUpdate { stream_id: StreamId(0), size_increment: 5 }
2024-11-06T04:50:18.880921Z  INFO text_generation_router_v3: backends/v3/src/lib.rs:125: Warming up model
2024-11-06T04:50:18.881864Z DEBUG warmup{max_input_length=Some(4999) max_prefill_tokens=5050 max_total_tokens=Some(5000) max_batch_size=Some(32)}:warmup: tower::buffer::worker: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tower-0.4.13/src/buffer/worker.rs:197: service.ready=true processing request
2024-11-06T04:50:18.881984Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=Headers { stream_id: StreamId(5), flags: (0x4: END_HEADERS) }
2024-11-06T04:50:18.882004Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=Data { stream_id: StreamId(5) }
2024-11-06T04:50:18.882035Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=Data { stream_id: StreamId(5), flags: (0x1: END_STREAM) }
2024-11-06T04:50:18.914255Z  INFO text_generation_launcher: Using optimized Triton indexing kernels.
2024-11-06T04:50:18.979524Z DEBUG Connection{peer=Client}: h2::codec::framed_read: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_read.rs:405: received frame=Ping { ack: false, payload: [227, 222, 58, 95, 83, 190, 149, 210] }
2024-11-06T04:50:18.979565Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=Ping { ack: true, payload: [227, 222, 58, 95, 83, 190, 149, 210] }
2024-11-06T04:50:20.793022Z DEBUG hyper::client::service: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/hyper-0.14.31/src/client/service.rs:79: connection error: hyper::Error(Io, Custom { kind: BrokenPipe, error: "connection closed because of a broken pipe" })
2024-11-06T04:50:20.793047Z DEBUG hyper::proto::h2::client: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/hyper-0.14.31/src/proto/h2/client.rs:326: client response error: stream closed because of a broken pipe
2024-11-06T04:50:20.793097Z ERROR warmup{max_input_length=Some(4999) max_prefill_tokens=5050 max_total_tokens=Some(5000) max_batch_size=Some(32)}:warmup: text_generation_router_v3::client: backends/v3/src/client/mod.rs:45: Server error: transport error
Error: Backend(Warmup(Generation("transport error")))
2024-11-06T04:50:20.843273Z ERROR text_generation_launcher: Webserver Crashed
2024-11-06T04:50:20.843305Z  INFO text_generation_launcher: Shutting down shards
2024-11-06T04:50:20.883122Z  INFO shard-manager: text_generation_launcher: Terminating shard rank=0
2024-11-06T04:50:20.883163Z  INFO shard-manager: text_generation_launcher: Waiting for shard to gracefully shutdown rank=0
2024-11-06T04:50:20.983296Z  INFO shard-manager: text_generation_launcher: shard terminated rank=0

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

I am using Docker Compose for my setup; the main args are:

    image: ghcr.io/huggingface/text-generation-inference
    container_name: llm-server
    command:
      - --model-id /data/Qwen/Qwen2-VL-7B-Instruct
      - --max-batch-prefill-tokens=5050
      - --max-total-tokens=5000
      - --max-input-tokens=4999
      - --validation-workers=2
      - --max-concurrent-requests=5
      - --max-batch-size=32
      - --port=5025
      - --env
      - --sharded=false

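For anyone reproducing this without Compose, the service above can be sketched as a plain `docker run` invocation. This is a hypothetical equivalent, not my exact setup: the `/path/to/models` volume mount and the `--gpus all` flag are assumptions inferred from the model path and the RTX 4090 in the logs, and the launcher args in the log additionally show `quantize: Some(Eetq)`, which is not in the command list above.

```
# Hypothetical docker run equivalent of the Compose service above.
# Assumptions: host model directory mounted at /data, single-GPU host.
docker run --rm --gpus all \
  -v /path/to/models:/data \
  -p 5025:5025 \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id /data/Qwen/Qwen2-VL-7B-Instruct \
  --max-batch-prefill-tokens=5050 \
  --max-total-tokens=5000 \
  --max-input-tokens=4999 \
  --validation-workers=2 \
  --max-concurrent-requests=5 \
  --max-batch-size=32 \
  --port=5025 \
  --env \
  --sharded=false
```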
As for the text-generation-inference Docker image, I am using the latest one, pulled yesterday (2024-11-05).

Expected behavior

The model should load without crashing.


ktobah commented Nov 8, 2024

@drbh How are you able to load it?

I believe you added support for the Qwen2-VL model. Did you face any similar issue?
