Can't build whisper engines with past two releases #2508

Open
MahmoudAshraf97 opened this issue Nov 27, 2024 · 6 comments
Labels: bug (Something isn't working), triaged (Issue has been triaged by maintainers)
Comments

@MahmoudAshraf97
Contributor

System Info

  • CPU architecture: x86_64
  • CPU/Host memory size: 512 GB
  • GPU properties
    • GPU name: 2× NVIDIA H100
  • Libraries
    • TensorRT-LLM branch or tag: v0.16.0.dev2024111900 or later
    • Versions of TensorRT, ModelOpt, CUDA, cuBLAS, etc. used: the versions pinned in the repository's requirements

Who can help?

@byshiue

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

trtllm-build  --checkpoint_dir trt/whisper_large-v2_weights_int8/encoder \
              --output_dir trt/whisper_large-v2_int8_2/encoder \
              --kv_cache_type paged \
              --moe_plugin disable \
              --enable_xqa enable \
              --max_batch_size 16 \
              --gemm_plugin disable \
              --bert_attention_plugin float16 \
              --max_input_len 3000 \
              --max_seq_len 3000 \
              --log_level debug \
              --enable_debug_output
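
For context, the int8 checkpoint directory above can be produced with the whisper example's convert_checkpoint.py using int8 weight-only quantization. A minimal sketch, assuming that example's flags and inferring the model name from the path:

# Assumed conversion step: int8 weight-only checkpoint for Whisper large-v2,
# using the flags from the examples/whisper workflow.
python3 convert_checkpoint.py --model_name large-v2 \
                              --use_weight_only \
                              --weight_only_precision int8 \
                              --output_dir trt/whisper_large-v2_weights_int8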

Expected behavior

The engine builds successfully and quickly.

Actual behavior

The Whisper large-v2 encoder build hangs. This is the last line printed before the hang:
[11/27/2024-23:00:09] [TRT] [V] Disabling unused tactic source: JIT_CONVOLUTIONS
I waited for around an hour, but the build still did not complete, and the process's memory usage in nvidia-smi stayed constant.

The tiny model builds fine, but the build is much slower than with older versions.
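
One way to confirm the hang (a sketch, not from the report): poll the builder process's GPU memory with nvidia-smi; a flat reading over a long stretch while trtllm-build produces no new log output matches the behavior described above.

# Poll GPU memory use of running compute processes every 10 seconds.
nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv -l 10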

Additional notes

None

MahmoudAshraf97 added the bug label on Nov 27, 2024
@hello-11
Collaborator

hello-11 commented Dec 2, 2024

@MahmoudAshraf97 Could you try the latest version of TensorRT-LLM?

@MahmoudAshraf97
Contributor Author

I tested 2024111900 and 2024112600; at the time of opening this issue, nothing newer had been released.

@yuekaizhang

@MahmoudAshraf97 Sorry, just confirmed that it's a bug on H100. Will update here later.

hello-11 added the triaged label on Dec 10, 2024
@MahmoudAshraf97
Contributor Author

I tried 0.16.0.dev2024121000 and the issue has not been solved.

@yuekaizhang

@MahmoudAshraf97 Sorry, we're working on the issue. Would you mind using fp16 rather than int8 weight-only quantization (woq) for now?
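
A minimal sketch of that workaround, assuming the whisper example's convert_checkpoint.py flags and mirroring the paths from the reproduction above: convert the checkpoint without weight-only quantization so the weights stay in float16, then rerun the trtllm-build command from the Reproduction section with --checkpoint_dir pointing at the fp16 directory.

# Assumed fp16 conversion: omit --use_weight_only so the checkpoint stays float16.
python3 convert_checkpoint.py --model_name large-v2 \
                              --output_dir trt/whisper_large-v2_weights_fp16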

@MahmoudAshraf97
Contributor Author

@yuekaizhang I'm sticking to v0.15 for now. In the meantime, the issue regarding fetching the encoder output from the executor response is also a priority for us, since we need it for word timestamps:
#2338 (comment)
