
I set tokenizer.pad_token = tokenizer.eos_token and found tokenizer.pad_token_id==None, which leads to an error. #28

Open
dlutmlt opened this issue May 22, 2024 · 0 comments

dlutmlt commented May 22, 2024

```
  File "python3.7/site-packages/transformers/tokenization_utils_base.py", line 2387, in _get_padding_truncation_strategies
    if padding_strategy != PaddingStrategy.DO_NOT_PAD and (not self.pad_token or self.pad_token_id < 0):
TypeError: '<' not supported between instances of 'NoneType' and 'int'
```

When I debug the code, I find that `self.pad_token_id` is `None`, which causes the error above. However, `self.pad_token` is `"<|endoftext|>"`, which is correct in GPT-2 style.
It seems there is no `"<|endoftext|>"` entry in the vocab.json file. So I would like to know how BiomedLM controls the stopping of generation.
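For reference, the failing comparison can be reproduced without transformers at all. This is a minimal sketch of the simplified condition from the traceback above; the names `padding_check` and `result` are illustrative, not from the library:

```python
# The tokenizer state described in this issue: the pad token *string*
# is set, but its id was never resolved because the token is missing
# from vocab.json.
pad_token = "<|endoftext|>"
pad_token_id = None

def padding_check(pad_token, pad_token_id):
    # Mirrors the condition in _get_padding_truncation_strategies:
    # comparing None < 0 is what raises the TypeError.
    return not pad_token or pad_token_id < 0

try:
    padding_check(pad_token, pad_token_id)
    result = "no error"
except TypeError as exc:
    result = str(exc)

print(result)  # '<' not supported between instances of 'NoneType' and 'int'
```

A common workaround (an assumption on my part, not confirmed in this thread) is to assign the id explicitly, e.g. `tokenizer.pad_token_id = tokenizer.eos_token_id`, so the integer comparison has a real id to work with.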
