Issue with converting custom encoder model #2535
Comments
@yuekaizhang maybe you can help here?
Can you check the
I am a little bit confused. Could you explain more?
the shape is
You can see that we named the class
@AvivSham You're right. We are currently hard-coding WhisperEncoder in some closed-source C++ code. Would you mind renaming CustomEncoder to WhisperEncoder so that it overrides the existing class?
Ok, will do that. However, we think it should be flexible and support renaming. We have a follow-up question: we want to pass an additional tensor to the decoder in addition to
Yeah. However, we currently have no slot to change it. It would be changed in the future to support more multi-modal models.
Adding inputs or outputs to the encoder is easy. You can check https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/whisper/run.py#L188-L215. For the decoder, however, I think it is complicated. You may need to modify https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/runtime/generation.py if there is an extra input for an LLM/Whisper-based decoder. You may also check https://github.com/NVIDIA/TensorRT-LLM/blob/main/tests/model/test_llama.py to see how to make forward() work first.
Thanks for the tips @yuekaizhang. Given the questions above, how can we debug the shapes in the forward passes? During build, trt-llm uses dummy tensors with dynamic dims, which are not informative. Additionally, at run time we use the compiled computational graph, so we do not have access to tensor shapes.
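One way to get concrete reference shapes without touching the engine is to compute them by hand for the encoder and compare them against whatever the build or runtime reports. The sketch below is not TensorRT-LLM code. It assumes the standard Whisper encoder layout (two 1-D convolutions with kernel 3 and padding 1, the second with stride 2) followed by the extra linear layer; the function names and the `extra_out` parameter are hypothetical.

```python
def conv1d_out_len(length, kernel, stride=1, padding=0):
    """Output length of a 1-D convolution with dilation 1."""
    return (length + 2 * padding - kernel) // stride + 1


def encoder_shapes(batch, n_mels, n_frames, d_model, extra_out):
    """Expected tensor shapes through a Whisper-style encoder.

    `extra_out` is the output width of the added linear layer
    (hypothetical name, chosen for this sketch).
    """
    shapes = {"input_features": (batch, n_mels, n_frames)}
    t1 = conv1d_out_len(n_frames, kernel=3, stride=1, padding=1)
    shapes["conv1"] = (batch, d_model, t1)
    t2 = conv1d_out_len(t1, kernel=3, stride=2, padding=1)
    shapes["conv2"] = (batch, d_model, t2)
    # After the convolutions the tensor is transposed to [batch, time, hidden].
    shapes["transformer_blocks"] = (batch, t2, d_model)
    shapes["extra_linear"] = (batch, t2, extra_out)
    return shapes


if __name__ == "__main__":
    # Whisper large-v3 style dimensions: 128 mel bins, 3000 frames, width 1280.
    for name, shape in encoder_shapes(1, 128, 3000, 1280, 1280).items():
        print(f"{name:20s} {shape}")
```

If the engine complains about an input dimension, comparing the failing dimension against this table can help narrow down which layer the mismatch comes from.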
You may check https://nvidia.github.io/TensorRT-LLM/performance/perf-best-practices.html#remove-input-padding. See also https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/models/enc_dec/model.py#L1964 and https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/models/enc_dec/model.py#L2005
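For readers unfamiliar with the "remove input padding" idea in the linked page, the following pure-Python sketch illustrates the general concept only: a padded `[batch, max_len]` input is flattened into one packed `[num_tokens]` sequence plus a list of lengths. It is not the TensorRT-LLM implementation, and `pack_padded` is a hypothetical helper written for this illustration.

```python
def pack_padded(batch, lengths):
    """Flatten a padded [batch, max_len] list into a packed [num_tokens] list."""
    packed = []
    for row, n in zip(batch, lengths):
        packed.extend(row[:n])  # keep only the real tokens, drop the padding
    return packed


padded = [[11, 12, 13, 0],   # 3 real tokens, 1 pad
          [21, 22, 0, 0]]    # 2 real tokens, 2 pads
lengths = [3, 2]
packed = pack_padded(padded, lengths)
print(len(packed), "tokens instead of", len(padded) * len(padded[0]))
```

The practical consequence for shape debugging is that a packed tensor has one fewer leading dimension than its padded counterpart, so a layer written for `[batch, time, hidden]` may see `[num_tokens, hidden]` instead.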
What will be affected by setting
System Info
Who can help?
@byshiue
Information
Tasks
examples folder (such as GLUE/SQuAD, ...)
Reproduction
We are trying to convert, build, and run a custom encoder-decoder model that differs from Whisper by just a single linear layer.
We followed this guide and created a custom encoder model by adding a single linear layer:
We needed to incorporate a few additional changes.
convert_checkpoint.py: just for sanity, we added these lines to the convert_checkpoint.py file in the whisper example:
In line 246 we added the following lines, since the added linear layer is not included in the whisper-v3.pt file from the example:
config.py file (it does not exist as part of the original repo) to match the new model:
__init__.py file.
When trying to run the converted model we get the following error:
However, if we make the following changes in a new class named WhisperEncoder (by adding the linear layer), it works as expected!
Expected behavior
We expect the converted model to function as WhisperEncoder.
actual behavior
See above. The model has an issue with the input dimension.
additional notes
In the guide attached above, some of the steps we took are not documented, and it is not well explained how one should convert and run a custom model with architectural changes (not just different weight values).
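One of the undocumented steps mentioned above is supplying weights for a layer that the original checkpoint does not contain. A minimal sketch of one possible sanity approach follows, assuming an identity weight and zero bias so that the added layer is a no-op and the output can be compared against unmodified Whisper. This is not the code from the issue: the key prefix `encoder.extra_linear` and both helper names are hypothetical, and plain Python lists stand in for the tensors a real conversion script would use.

```python
def identity(n):
    """n x n identity matrix as nested lists (stand-in for a tensor)."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def add_missing_linear(weights, prefix, dim):
    """Return a copy of `weights` with identity/zero entries for `prefix`.

    Existing entries are left untouched; only missing keys are filled in.
    """
    out = dict(weights)
    out.setdefault(f"{prefix}.weight", identity(dim))
    out.setdefault(f"{prefix}.bias", [0.0] * dim)
    return out


# Hypothetical checkpoint that lacks the added layer.
checkpoint = {"encoder.conv1.weight": "..."}
patched = add_missing_linear(checkpoint, "encoder.extra_linear", dim=4)
print(sorted(patched))
```

Starting from a no-op layer separates conversion problems (names, shapes, registration) from numerical differences introduced by trained weights.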