
How to extract the hidden_states before output_ids during the inference process. #2499

Open
PCJin2 opened this issue Nov 26, 2024 · 1 comment
Labels
question Further information is requested triaged Issue has been triaged by maintainers

Comments

@PCJin2

PCJin2 commented Nov 26, 2024

The inference code we are currently using is as follows:

for _ in range(run_time):
	self.validate_inputs(input_ids, input_lengths, max_new_tokens)
	self.decoder.setup(
		batch_size=input_lengths.size(0),
		max_context_length=max_input_length,
		max_new_tokens=max_new_tokens,
		beam_width=num_beams,
	)
	output_ids = self.decoder.decode(
		input_ids,
		input_lengths,
		self.sampling_config,
		prompt_table,
		tasks,
		task_vocab_size,
	)
	torch.cuda.synchronize()

The decode method belongs to the GenerationSession class.
We do not want the output_ids themselves.
Instead, we want the output that precedes them: the hidden_states produced by the last layer of the LLM.
We tried to extract this tensor, but while debugging we found that TensorRT-LLM does not expose an interface for accessing the hidden_states variable.

So, how can we extract the hidden_states that precede output_ids during inference?
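For context on what is being asked for: in a decoder-only LLM, output_ids are derived from the final layer's hidden_states via the lm_head projection followed by argmax/sampling, so the hidden states are the tensor one step before the token ids. The NumPy sketch below is purely illustrative (toy shapes, random weights; it is not the TensorRT-LLM API):

```python
# Illustrative only -- NOT TensorRT-LLM code. Shows that output_ids are
# computed FROM the last layer's hidden_states; the hidden states are
# the tensor one step before the token ids.
import numpy as np

rng = np.random.default_rng(0)
batch, seq_len, hidden_size, vocab_size = 2, 4, 8, 16

# Stand-in for the final transformer layer's output.
hidden_states = rng.standard_normal((batch, seq_len, hidden_size))

# lm_head: linear projection from hidden states to vocabulary logits.
lm_head = rng.standard_normal((hidden_size, vocab_size))
logits = hidden_states @ lm_head            # (batch, seq_len, vocab_size)

# Greedy decoding: argmax over the vocabulary gives the token ids.
output_ids = logits.argmax(axis=-1)         # (batch, seq_len)

print(hidden_states.shape)
print(output_ids.shape)
```

In a deployed engine the question is therefore where to intercept hidden_states before the lm_head step, which TensorRT-LLM's GenerationSession does not expose by default.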

@hello-11 hello-11 added question Further information is requested triaged Issue has been triaged by maintainers labels Dec 2, 2024
@hello-11
Collaborator

hello-11 commented Dec 10, 2024

@PCJin2 you can mark these hidden_states as an engine output when building the model; TensorRT outputs are fixed at build time, so after rebuilding the engine the tensor will be readable alongside output_ids.
