
Missing model_max_length in roberta config #73

Open

ThewBear opened this issue Jul 2, 2022 · 0 comments
Labels: bug (Something isn't working)


ThewBear commented Jul 2, 2022

When loaded with transformers.AutoTokenizer.from_pretrained, model_max_length is set to 1000000000000000019884624838656.
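
For reference, a minimal check (a sketch, using the same model id as below) shows the value:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('airesearch/wangchanberta-base-att-spm-uncased')
print(tokenizer.model_max_length)  # 1000000000000000019884624838656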

This results in IndexError: index out of range in self when the model is used with flair, as in the code below.

from flair.data import Sentence
from flair.embeddings import TransformerDocumentEmbeddings

sentence = Sentence("...")  # a sufficiently long input text reproduces the error
wangchanberta = TransformerDocumentEmbeddings('airesearch/wangchanberta-base-att-spm-uncased')
wangchanberta.embed(sentence)

After searching, I found huggingface/transformers#14315 (comment), which states that model_max_length is missing from the configuration file.

My current workaround is to manually run the following code to override the missing config.

wangchanberta.tokenizer.model_max_length = 510
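
A variant of the same workaround (a sketch, not verified here) is to pass the limit when loading the tokenizer directly:

from transformers import AutoTokenizer

# assumption: passing model_max_length at load time overrides the missing value in the config
tokenizer = AutoTokenizer.from_pretrained(
    'airesearch/wangchanberta-base-att-spm-uncased',
    model_max_length=510,
)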
ThewBear changed the title from "Add model_max_length to reberta config" to "Add model_max_length to roberta config" on Jul 2, 2022
ThewBear changed the title from "Add model_max_length to roberta config" to "Missing model_max_length in roberta config" on Jul 2, 2022
lalital added the bug (Something isn't working) label on Sep 24, 2022
lalital self-assigned this on Sep 24, 2022
lalital pinned this issue on Sep 24, 2022
2 participants