The inference code we are currently using is as follows:
The decode method belongs to the GenerationSession class.
We do not want the output_ids it returns.
Instead, we want the tensor that precedes them: the hidden_states output of the last layer of the LLM.
We tried to extract it, but while debugging we found that TensorRT-LLM does not provide an interface for accessing hidden_states.
So, how can we extract the hidden_states that precede output_ids during inference?
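To make clear which tensor we mean, here is a minimal toy sketch in plain Python. It is not TensorRT-LLM code and does not use any TensorRT-LLM API. The function names (matvec, toy_decode_step), the tiny lm_head matrix, and the greedy argmax sampling are all assumptions made up for illustration. It only shows the usual ordering in a decoder: last-layer hidden state, then logits, then the output id.

```python
# Toy illustration only: plain Python, not TensorRT-LLM code.
# All names and numbers below are hypothetical.

def matvec(matrix, vector):
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def toy_decode_step(hidden_state, lm_head):
    """Return (hidden_state, logits, output_id) for one decoding step.

    hidden_state: last-layer output for the current position (hidden_size floats)
    lm_head:      vocab_size x hidden_size projection matrix
    """
    # Project the hidden state onto the vocabulary to get one logit per token.
    logits = matvec(lm_head, hidden_state)
    # Greedy sampling (an assumption here): pick the index of the largest logit.
    output_id = max(range(len(logits)), key=logits.__getitem__)
    # A normal decode call surfaces only output_id; we also want hidden_state.
    return hidden_state, logits, output_id


hidden_state = [0.5, -1.0, 2.0]   # hidden_size = 3
lm_head = [                       # vocab_size = 4
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 1.0],
]

hidden, logits, output_id = toy_decode_step(hidden_state, lm_head)
print("hidden_state:", hidden)
print("logits:", logits)
print("output_id:", output_id)
```

In this sketch, the value we are asking for is the first element of the returned tuple, the input to the final projection, rather than the token id computed from it.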