AraBERT output embeddings #50

You have to pass a text string to the encode function, so there is no need to tokenize it first; it will do that internally.

```python
from arabert.preprocess import ArabertPreprocessor
from transformers import AutoTokenizer, AutoModel

arabert_tokenizer = AutoTokenizer.from_pretrained("aubmindlab/bert-base-arabertv2", do_lower_case=False)
arabert_model = AutoModel.from_pretrained("aubmindlab/bert-base-arabertv2")

model_name = "bert-base-arabertv2"
arabert_prep = ArabertPreprocessor(model_name=model_name, keep_emojis=False)

text = "ولن نبالغ إذا قلنا إن هاتف أو كمبيوتر المكتب في زمننا هذا ضروري"
text_preprocessed = arabert_prep.preprocess(text)
print(text_preprocessed)
# >>> و+ لن نبالغ إذا قل +نا إن هاتف أو كمبيوتر ال+
```
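
To then get the output embeddings, the preprocessed string can be passed through the tokenizer and model loaded above. A minimal sketch, assuming standard Hugging Face `transformers` usage (the mean-pooling step is just one common way to get a single sentence vector, not something prescribed by AraBERT):

```python
import torch

# Tokenize the preprocessed text and run it through the model
inputs = arabert_tokenizer(text_preprocessed, return_tensors="pt")
with torch.no_grad():
    outputs = arabert_model(**inputs)

# Token-level embeddings: shape (batch_size, sequence_length, hidden_size)
token_embeddings = outputs.last_hidden_state

# Example sentence-level vector via mean pooling over tokens (assumption, not the only option)
sentence_embedding = token_embeddings.mean(dim=1)
print(token_embeddings.shape, sentence_embedding.shape)
```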
