Can't build whisper engines with past two releases #2508

Open
MahmoudAshraf97 opened this issue Nov 27, 2024 · 6 comments
Labels: bug (Something isn't working), triaged (Issue has been triaged by maintainers)
Comments

@MahmoudAshraf97
Contributor

System Info

  • CPU architecture: x86_64
  • CPU/Host memory size: 512 GB
  • GPU properties
    • GPU name: 2× NVIDIA H100
  • Libraries
    • TensorRT-LLM branch or tag: v0.16.0.dev2024111900 or later
    • Versions of TensorRT, ModelOpt, CUDA, cuBLAS, etc. used: the versions pinned in the repository's requirements

Who can help?

@byshiue

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

trtllm-build  --checkpoint_dir trt/whisper_large-v2_weights_int8/encoder \
              --output_dir trt/whisper_large-v2_int8_2/encoder \
              --kv_cache_type paged \
              --moe_plugin disable \
              --enable_xqa enable \
              --max_batch_size 16 \
              --gemm_plugin disable \
              --bert_attention_plugin float16 \
              --max_input_len 3000 \
              --max_seq_len 3000 \
              --log_level debug \
              --enable_debug_output
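
For context, the int8 checkpoint directory above can be produced with the whisper example's convert_checkpoint.py using int8 weight-only quantization. A minimal sketch, assuming that example's flags and inferring the model name from the path:

# Assumed conversion step: int8 weight-only checkpoint for Whisper large-v2,
# using the flags from the examples/whisper workflow.
python3 convert_checkpoint.py --model_name large-v2 \
                              --use_weight_only \
                              --weight_only_precision int8 \
                              --output_dir trt/whisper_large-v2_weights_int8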

Expected behavior

The engine builds successfully and quickly.

Actual behavior

The Whisper large-v2 encoder build hangs. This is the last line printed before the hang:
[11/27/2024-23:00:09] [TRT] [V] Disabling unused tactic source: JIT_CONVOLUTIONS
I waited for around an hour, but the build still did not complete, and the process's memory usage in nvidia-smi stayed constant.

The tiny model builds fine, but the build is much slower than with older versions.
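
One way to confirm the hang (a sketch, not from the report): poll the builder process's GPU memory with nvidia-smi; a flat reading over a long stretch while trtllm-build produces no new log output matches the behavior described above.

# Poll GPU memory use of running compute processes every 10 seconds.
nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv -l 10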

Additional notes

None

MahmoudAshraf97 added the bug label on Nov 27, 2024
@hello-11
Collaborator

hello-11 commented Dec 2, 2024

@MahmoudAshraf97 Could you try the latest version of TensorRT-LLM?

@MahmoudAshraf97
Contributor Author

I tested 2024111900 and 2024112600; at the time of opening this issue, nothing newer had been released.

@yuekaizhang

@MahmoudAshraf97 Sorry, just confirmed that it's a bug on H100. Will update here later.

hello-11 added the triaged label on Dec 10, 2024
@MahmoudAshraf97
Contributor Author

I tried 0.16.0.dev2024121000 and the issue has not been solved.

@yuekaizhang

@MahmoudAshraf97 Sorry, we're working on the issue. Would you mind using fp16 rather than int8 weight-only quantization (woq) for now?
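
A minimal sketch of that workaround, assuming the whisper example's convert_checkpoint.py flags and mirroring the paths from the reproduction above: convert the checkpoint without weight-only quantization so the weights stay in float16, then rerun the trtllm-build command from the Reproduction section with --checkpoint_dir pointing at the fp16 directory.

# Assumed fp16 conversion: omit --use_weight_only so the checkpoint stays float16.
python3 convert_checkpoint.py --model_name large-v2 \
                              --output_dir trt/whisper_large-v2_weights_fp16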

@MahmoudAshraf97
Contributor Author

@yuekaizhang I'm sticking to v0.15 for now. In the meantime, the issue regarding fetching the encoder output from the executor response is also a priority for us, since we need it for word timestamps:
#2338 (comment)
