
Missing model_max_length in roberta config #73

Open

ThewBear opened this issue Jul 2, 2022 · 0 comments
Labels: bug (Something isn't working)


ThewBear commented Jul 2, 2022

When loaded with transformers.AutoTokenizer.from_pretrained, model_max_length is set to 1000000000000000019884624838656.
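
For reference, a minimal check (a sketch, using the same model id as below) shows the value:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('airesearch/wangchanberta-base-att-spm-uncased')
print(tokenizer.model_max_length)  # 1000000000000000019884624838656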

This results in IndexError: index out of range in self when the model is used with flair, as in the code below.

from flair.data import Sentence
from flair.embeddings import TransformerDocumentEmbeddings

sentence = Sentence("...")  # a sufficiently long input text reproduces the error
wangchanberta = TransformerDocumentEmbeddings('airesearch/wangchanberta-base-att-spm-uncased')
wangchanberta.embed(sentence)

After searching, I found huggingface/transformers#14315 (comment), which states that model_max_length is missing from the configuration file.

My current workaround is to manually run the following code to override the missing config.

wangchanberta.tokenizer.model_max_length = 510
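
A variant of the same workaround (a sketch, not verified here) is to pass the limit when loading the tokenizer directly:

from transformers import AutoTokenizer

# assumption: passing model_max_length at load time overrides the missing value in the config
tokenizer = AutoTokenizer.from_pretrained(
    'airesearch/wangchanberta-base-att-spm-uncased',
    model_max_length=510,
)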
ThewBear changed the title from "Add model_max_length to reberta config" to "Add model_max_length to roberta config" on Jul 2, 2022
ThewBear changed the title from "Add model_max_length to roberta config" to "Missing model_max_length in roberta config" on Jul 2, 2022
lalital added the bug (Something isn't working) label on Sep 24, 2022
lalital self-assigned this on Sep 24, 2022
lalital pinned this issue on Sep 24, 2022
2 participants