[Question] Running custom Encoder Decoder model #2491
Thank you for your response @hello-11. Here is our custom encoder:

```python
# Import paths follow recent TensorRT-LLM releases and may differ between versions.
from tensorrt_llm._common import default_net
from tensorrt_llm.functional import Tensor, cast, gelu, transpose, unsqueeze
from tensorrt_llm.layers import Linear
from tensorrt_llm.models.enc_dec.model import WhisperEncoder
from tensorrt_llm.models.modeling_utils import PretrainedConfig


class CustomEncoder(WhisperEncoder):

    def __init__(self, config: PretrainedConfig):
        super().__init__(config)
        # Extra projection on top of the stock Whisper encoder
        # (1280 = encoder hidden size of Whisper large).
        self.lin = Linear(in_features=1280, out_features=1280)

    def forward(self,
                input_features: Tensor,
                input_lengths=None,
                position_ids=None):
        if default_net().plugin_config.remove_input_padding:
            # BXT,D -> 1,BxT,D -> 1,D,BxT
            input_features = unsqueeze(input_features, 0)
            input_features = transpose(input_features, 1, 2)
        # Encoder conv needs to run in fp32 on Volta/Turing
        x_type = input_features.dtype
        input_features = cast(input_features, self._conv_dtype)
        x = self.conv1(input_features)
        x = gelu(x)
        x = self.conv2(x)
        x = cast(x, x_type)
        x = gelu(x)
        x = transpose(x, 2, 1)
        x = x + cast(self.position_embedding(position_ids), x.dtype)

        if default_net().plugin_config.remove_input_padding:
            # B,T,D -> BxT,D
            x = x.view([-1, self.config.hidden_size])
        hidden_states = x
        input_lengths = input_lengths // self.downsample_factor
        for encoder_layer in self.encoder_layers:
            hidden_states = encoder_layer(hidden_states,
                                          input_lengths=input_lengths)

        x = hidden_states
        x = self.lin(x)  # new layer, applied before the final LayerNorm
        x = self.ln_post(x)
        x.mark_output('encoder_output', self._dtype)
        return x
```

We also added weights for the new layer:

```python
weights['lin.weight'] = torch.rand(1280, 1280).contiguous()
weights['lin.bias'] = torch.rand(1280).contiguous()
```

When running:

```bash
trtllm-build --checkpoint_dir ${checkpoint_dir}/encoder \
    --output_dir ${output_dir}/encoder \
    --moe_plugin disable \
    --enable_xqa disable \
    --max_batch_size ${MAX_BATCH_SIZE} \
    --gemm_plugin disable \
    --bert_attention_plugin ${INFERENCE_PRECISION} \
    --max_input_len 3000 --max_seq_len=3000
```

we receive the following error:
After a deep dive, it seems like the
I assume it relates to this:
Can you please advise how to solve this issue?
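The `remove_input_padding` comments in the encoder (`BXT,D`, `B,T,D -> BxT,D`) explain why `self.lin` can sit right after the encoder layers without any extra reshaping: with padding removed, the tokens of all sequences are packed into one `[B*T, D]` tensor, and a Linear layer acts on each row independently. A minimal pure-Python sketch of that property, with toy sizes; the `linear` and `pack` helpers are hypothetical stand-ins for the TensorRT-LLM `Linear` layer and the packing step, not library code:

```python
from typing import List

Matrix = List[List[float]]


def linear(rows: Matrix, weight: Matrix, bias: List[float]) -> Matrix:
    """Row-wise y = W x + b, with W stored as [out_features, in_features]
    (the same layout as 'lin.weight' above)."""
    return [
        [sum(w * x for w, x in zip(w_row, row)) + b
         for w_row, b in zip(weight, bias)]
        for row in rows
    ]


def pack(batch: List[Matrix], lengths: List[int]) -> Matrix:
    """B,T,D -> (sum of lengths),D: keep only the valid rows of each sequence."""
    return [row for seq, n in zip(batch, lengths) for row in seq[:n]]


# Toy sizes: B=2 sequences, D=2 features (1280 in the real encoder).
batch = [
    [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]],   # length 2, last row is padding
    [[5.0, 6.0], [7.0, 8.0], [9.0, 10.0]],  # length 3
]
lengths = [2, 3]
weight = [[1.0, 0.5], [-1.0, 2.0]]  # [out_features, in_features]
bias = [0.1, -0.2]

# Project the packed tensor once ...
packed_out = linear(pack(batch, lengths), weight, bias)
# ... or project each padded sequence and pack afterwards: same rows.
per_seq_out = pack([linear(seq, weight, bias) for seq in batch], lengths)
```

Because the projection never mixes rows, both orders give identical results; a layer that mixed tokens (for example pooling over time) would need to respect the packed layout explicitly.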
@AvivSham, did you convert the checkpoint first?
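One way to check whether the converted checkpoint actually contains the new layer is to list its tensors before running `trtllm-build`. A minimal stdlib-only sketch, assuming the converted encoder checkpoint stores its weights in a `rank0.safetensors` file (the usual TensorRT-LLM checkpoint layout) under the `lin.weight` / `lin.bias` names used above. It reads only the safetensors header (an 8-byte little-endian length followed by a JSON table of tensor names, dtypes and shapes); the helper names are hypothetical:

```python
import json
import struct
from typing import Dict, List, Tuple

Shapes = Dict[str, Tuple[str, List[int]]]


def read_safetensors_shapes(path: str) -> Shapes:
    """Return {tensor_name: (dtype, shape)} from a .safetensors header
    without loading any tensor data."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # u64, little-endian
        header = json.loads(f.read(header_len))
    header.pop("__metadata__", None)  # optional free-form metadata entry
    return {name: (info["dtype"], info["shape"]) for name, info in header.items()}


def check_extra_layer(shapes: Shapes, prefix: str = "lin",
                      hidden: int = 1280) -> List[str]:
    """List problems with the extra Linear's weights; an empty list means OK.
    Expected layout: weight [out, in], bias [out]."""
    expected = {f"{prefix}.weight": [hidden, hidden], f"{prefix}.bias": [hidden]}
    problems = []
    for name, shape in expected.items():
        if name not in shapes:
            problems.append(f"missing {name}")
        elif shapes[name][1] != shape:
            problems.append(f"{name}: expected {shape}, got {shapes[name][1]}")
    return problems
```

Usage: call `check_extra_layer(read_safetensors_shapes(path))` with `path` pointing at the `.safetensors` file under `${checkpoint_dir}/encoder`; a non-empty result means the new layer never reached the checkpoint that `trtllm-build` reads.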
@AvivSham Please use
@yuekaizhang Thanks
Hi All,
Thank you for your amazing work.
We have an encoder decoder model we want to run using TensorRT-LLM. We made an architectural modification by pooling the encoder's output dim using stacked MLP layers.
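To make the architecture change concrete: pooling the encoder's output dimension with stacked MLP layers means a chain of Linear layers with shrinking widths, and each one needs its own weight and bias in the converted checkpoint, stored as [out_features, in_features]. A small sketch that derives those names and shapes, assuming the stack is held in a list-style container named `pool` (so parameters are named `pool.0.weight`, `pool.1.weight`, ...); the prefix and the 1280 → 640 → 320 widths are hypothetical:

```python
from typing import Dict, List, Tuple


def mlp_stack_weight_shapes(dims: List[int],
                            prefix: str = "pool") -> Dict[str, Tuple[int, ...]]:
    """Map each Linear in the stack dims[0] -> dims[1] -> ... -> dims[-1]
    to the checkpoint shapes it needs: weight [out, in] and bias [out]."""
    if len(dims) < 2 or any(d <= 0 for d in dims):
        raise ValueError("need at least two positive widths")
    shapes: Dict[str, Tuple[int, ...]] = {}
    for i, (d_in, d_out) in enumerate(zip(dims, dims[1:])):
        shapes[f"{prefix}.{i}.weight"] = (d_out, d_in)
        shapes[f"{prefix}.{i}.bias"] = (d_out,)
    return shapes


plan = mlp_stack_weight_shapes([1280, 640, 320])
for name, shape in plan.items():
    print(name, shape)
# pool.0.weight (640, 1280)
# pool.0.bias (640,)
# pool.1.weight (320, 640)
# pool.1.bias (320,)
```

The encoder output width becomes `dims[-1]` (320 here), so whatever consumes the encoder output, such as the decoder's cross-attention, must also be built for that width.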
What is the recommended way of modifying the code to support the new architecture? We assume that we need to change the code to convert the model (to a static computation graph) and run it.
Please advise,