Hi there!

I had a few questions & suggestions regarding HF_LLM.py. Sorry in advance for being naive.

For the decoder-only and pre_encode_input=True case, the tokenized_contexts=batch_inputs argument passed to custom module calls lacks the last token of the context, which has been prepended to the output (lamorel/lamorel/src/lamorel/server/llms/hf_llm.py, lines 346 to 352 in c82d1b1). Is this really desired? Couldn't it be confusing for the user?

Why don't we use bos_token here while creating the decoder input in the encoder-decoder setup? (lamorel/lamorel/src/lamorel/server/llms/hf_llm.py, line 167 in c82d1b1)

To batchify generation (lamorel/lamorel/src/lamorel/server/llms/hf_llm.py, lines 205 to 207 in c82d1b1).

We could add output_hidden_states not as a tensor (lamorel/lamorel/src/lamorel/server/llms/hf_llm.py, lines 327 to 337 in c82d1b1).

When pretrained=False and use_cpu=False, HF_LLM.__init__ raises an error in lamorel/lamorel/src/lamorel/server/llms/hf_llm.py, line 61 in c82d1b1, since it seems that we cannot pass ad hoc model config parameters to model_cls.from_config, only to model_cls.from_pretrained. We could instead have config = config_cls.from_pretrained(path, **adhoc); model_cls.from_config(config). To reproduce:
> For the decoder-only and pre_encode_input=True case, the tokenized_contexts=batch_inputs argument passed to custom module calls lacks the last token of the context, which has been prepended to the output. Is this really desired? Couldn't it be confusing for the user?
Yes, when using pre_encode_input=True with decoder-only models, the input is first given to the LLM and the past_key_values are obtained. However, transformers only returns hidden_states for the current inputs, not for the past_key_values. So to get the hidden states of the input's last token, this token must be removed from the inputs used when pre-encoding.

I agree that this may be confusing. This is among a long list of things that aren't documented. I unfortunately have very limited bandwidth to work on improving the documentation :/
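As an illustration of that behaviour, here is a minimal sketch using GPT-2 as a stand-in (this is not lamorel's actual code): hidden states are only returned for the tokens passed in the current forward call, not for tokens already cached in past_key_values.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Minimal sketch with GPT-2 as a stand-in; not lamorel's actual implementation.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

context_ids = tokenizer("The capital of France is", return_tensors="pt").input_ids
output_ids = tokenizer(" Paris", return_tensors="pt").input_ids

# Pre-encode the context *without* its last token: the cache stores keys/values
# for these tokens, but no hidden states will be returned for them later.
with torch.no_grad():
    pre = model(input_ids=context_ids[:, :-1], use_cache=True)

# The last context token is prepended to the output so that its hidden state
# (the one used to score the first output token) shows up in this second call.
decoder_ids = torch.cat([context_ids[:, -1:], output_ids], dim=1)
with torch.no_grad():
    out = model(
        input_ids=decoder_ids,
        past_key_values=pre.past_key_values,
        output_hidden_states=True,
    )

# hidden_states only cover the tokens of decoder_ids, not the cached context.
print(out.hidden_states[-1].shape)  # (1, decoder_ids.shape[1], hidden_size)
```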
> Why don't we use bos_token here while creating the decoder input in the encoder-decoder setup?
The bos_token is not provided by all tokenizers. When I first implemented hf_llm.py, I was mostly using T5 models, which do not have any bos_token. That said, I had to force the pad token to 0, as a pad_token is also not implemented for all models...

I don't know if there is any clean universal solution. Nevertheless, it seems the token 0 is often used for padding.
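To illustrate the inconsistency (a small sketch, unrelated to lamorel's internals): T5 tokenizers expose no bos_token and T5 starts decoding from its pad token (id 0), while GPT-2 has a bos_token but no pad_token.

```python
from transformers import AutoTokenizer, AutoConfig

# Sketch: special-token support varies across models (illustrative checks only).
t5_tok = AutoTokenizer.from_pretrained("t5-small")
gpt2_tok = AutoTokenizer.from_pretrained("gpt2")

print(t5_tok.bos_token)       # None -> nothing to start the decoder input with
print(t5_tok.pad_token_id)    # 0
print(gpt2_tok.bos_token_id)  # 50256
print(gpt2_tok.pad_token)     # None -> no universal pad_token either

# T5 starts decoding from its pad token, which is why token 0 works as the
# first decoder input token.
t5_config = AutoConfig.from_pretrained("t5-small")
print(t5_config.decoder_start_token_id)  # 0 (== pad_token_id)
```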
> To batchify generation:
Yes, this needs to be done :)
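For what it's worth, a rough sketch of what batched generation could look like with the plain transformers API (left padding for decoder-only models); this is only an assumption about one possible approach, not lamorel's implementation.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Sketch of batched generation with a decoder-only model (not lamorel code).
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token
tokenizer.padding_side = "left"            # left-pad so generation continues each prompt
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompts = ["The capital of France is", "My favourite colour is"]
batch = tokenizer(prompts, return_tensors="pt", padding=True)

with torch.no_grad():
    generated = model.generate(
        input_ids=batch.input_ids,
        attention_mask=batch.attention_mask,
        max_new_tokens=10,
        pad_token_id=tokenizer.pad_token_id,
    )

print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```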
> We could add output_hidden_states not as a tensor:
Right. We would have to check that the transformers API handles a boolean.
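For reference, the transformers forward API does accept output_hidden_states as a plain Python boolean; a small standalone check (not lamorel code):

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Sketch: transformers accepts output_hidden_states as a plain Python boolean.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("hello world", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# hidden_states is a tuple: embeddings output plus one tensor per layer.
print(len(out.hidden_states), out.hidden_states[-1].shape)
```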
> When pretrained=False and use_cpu=False, HF_LLM.__init__ raises an error in:
I need to test this. I am not sure when I can do it though.
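For reference, the workaround suggested in the issue could look roughly like this, assuming AutoConfig/AutoModelForCausalLM stand in for config_cls/model_cls and using a hypothetical adhoc_kwargs dict (a sketch, not the actual hf_llm.py code): build the config from the pretrained path with the ad hoc parameters, then instantiate the randomly initialized model from that config.

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Sketch of the suggested workaround; `adhoc_kwargs` is a hypothetical example,
# and this is not the actual hf_llm.py code. from_config does not accept ad hoc
# config parameters, so the config is built from the pretrained path first.
path = "gpt2"
adhoc_kwargs = {"n_layer": 2}  # illustrative ad hoc config override

config = AutoConfig.from_pretrained(path, **adhoc_kwargs)
model = AutoModelForCausalLM.from_config(config)  # randomly initialized weights

print(model.config.n_layer)  # 2
```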