
How to extract the hidden_states before output_ids during the inference process. #2499

Open
PCJin2 opened this issue Nov 26, 2024 · 1 comment
Labels
question Further information is requested triaged Issue has been triaged by maintainers

Comments

@PCJin2

PCJin2 commented Nov 26, 2024

The inference code we are currently using is as follows:

for _ in range(run_time):
	self.validate_inputs(input_ids, input_lengths, max_new_tokens)
	self.decoder.setup(
		batch_size=input_lengths.size(0),
		max_context_length=max_input_length,
		max_new_tokens=max_new_tokens,
		beam_width=num_beams,
	)
	output_ids = self.decoder.decode(
		input_ids,
		input_lengths,
		self.sampling_config,
		prompt_table,
		tasks,
		task_vocab_size,
	)
	torch.cuda.synchronize()

The decode method belongs to the GenerationSession class.
We do not want the output_ids themselves.
Instead, we want the output that precedes them: the hidden_states produced by the last layer of the LLM.
We tried to extract this tensor, but while debugging we found that TensorRT-LLM does not expose an interface for accessing the hidden_states variable.

So, how can we extract the hidden_states that precede output_ids during inference?
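For context on what is being asked for: in a decoder-only LLM, output_ids are derived from the final layer's hidden_states via the lm_head projection followed by argmax/sampling, so the hidden states are the tensor one step before the token ids. The NumPy sketch below is purely illustrative (toy shapes, random weights; it is not the TensorRT-LLM API):

```python
# Illustrative only -- NOT TensorRT-LLM code. Shows that output_ids are
# computed FROM the last layer's hidden_states; the hidden states are
# the tensor one step before the token ids.
import numpy as np

rng = np.random.default_rng(0)
batch, seq_len, hidden_size, vocab_size = 2, 4, 8, 16

# Stand-in for the final transformer layer's output.
hidden_states = rng.standard_normal((batch, seq_len, hidden_size))

# lm_head: linear projection from hidden states to vocabulary logits.
lm_head = rng.standard_normal((hidden_size, vocab_size))
logits = hidden_states @ lm_head            # (batch, seq_len, vocab_size)

# Greedy decoding: argmax over the vocabulary gives the token ids.
output_ids = logits.argmax(axis=-1)         # (batch, seq_len)

print(hidden_states.shape)
print(output_ids.shape)
```

In a deployed engine the question is therefore where to intercept hidden_states before the lm_head step, which TensorRT-LLM's GenerationSession does not expose by default.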

@hello-11 hello-11 added question Further information is requested triaged Issue has been triaged by maintainers labels Dec 2, 2024
@hello-11
Collaborator

hello-11 commented Dec 10, 2024

@PCJin2 you can mark these hidden_states as an engine output when building the model; TensorRT outputs are fixed at build time, so after rebuilding the engine the tensor will be readable alongside output_ids.
