
MeLLL-UFF/embeddings-tweets-pt-br-lrev


Sentiment analysis in Portuguese tweets: an evaluation of diverse word representation models

In recent years, the number of social networks worldwide has grown steadily. Among them, Twitter has consolidated its position as one of the most influential social platforms, and Brazilian Portuguese speakers rank fifth in number of users. Tweets are written in an informal linguistic style, so extracting information from them is challenging for Natural Language Processing (NLP) tasks such as sentiment analysis. In this work, we frame sentiment analysis of Portuguese-language tweets as a tweet-level classification task in two settings: binary (positive and negative) and multiclass (positive, negative, and neutral). Following a feature extraction approach, we first extract an embedding for each tweet and then use these embeddings as input to train a classifier.

This study evaluates how different word representations improve the predictive performance of sentiment classification. The representations range from the original pre-trained language models to continued pre-training strategies, and they are evaluated with three different classifier algorithms on eight Portuguese tweet datasets. Because no language model specific to Brazilian Portuguese tweets was available, we extended our evaluation to six embeddings: fastText, GloVe, word2vec, BERT-multilingual (mBERT), BERTweet, and BERTimbau. The experiments showed that BERTimbau, which was trained from scratch solely on the target language (Portuguese), outperforms both the static representations (fastText, GloVe, and word2vec) and the Transformer-based models mBERT and BERTweet. In addition, we show that extracting contextualized embeddings without any adjustment to the pre-trained language model is the best approach for most of the datasets.
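The feature extraction pipeline described above (embed each tweet, then train a classifier on those embeddings) can be sketched as follows for the static-embedding case. This is a minimal illustration under stated assumptions, not the repository's code. The 3-dimensional vectors in `EMBEDDINGS` are hypothetical stand-ins for pre-trained fastText, GloVe, or word2vec vectors. Averaging token vectors is an assumed pooling choice. Logistic regression is used only for illustration, since this README does not name the three classifier algorithms. Only the binary setting is shown. For the Transformer models (mBERT, BERTweet, BERTimbau), the tweet vector would instead be taken from the hidden states of the frozen pre-trained model, which this sketch does not cover.

```python
import math
from typing import Dict, List, Sequence, Tuple

# HYPOTHETICAL toy embedding table. In practice these vectors would come from a
# pre-trained static model such as fastText, GloVe, or word2vec.
EMBEDDINGS: Dict[str, List[float]] = {
    "adorei": [0.9, 0.1, 0.0],
    "ótimo": [0.8, 0.2, 0.1],
    "odiei": [-0.9, 0.1, 0.0],
    "péssimo": [-0.8, 0.0, 0.2],
    "filme": [0.0, 0.5, 0.5],
    "o": [0.0, 0.1, 0.1],
}
DIM = 3


def tweet_embedding(tweet: str) -> List[float]:
    """Feature extraction step: mean of the vectors of in-vocabulary tokens.

    Mean pooling is an ASSUMED choice here. Returns a zero vector when no
    token of the tweet is in the vocabulary.
    """
    tokens = [t for t in tweet.lower().split() if t in EMBEDDINGS]
    if not tokens:
        return [0.0] * DIM
    return [sum(EMBEDDINGS[t][i] for t in tokens) / len(tokens) for i in range(DIM)]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(
    X: Sequence[List[float]], y: Sequence[int], lr: float = 0.5, epochs: int = 200
) -> Tuple[List[float], float]:
    """Classifier step (ILLUSTRATIVE choice): binary logistic regression fitted
    with plain stochastic gradient descent on the log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # gradient of the log loss with respect to the logit
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b


def predict(w: List[float], b: float, x: List[float]) -> int:
    """Return 1 (positive) if the predicted probability is at least 0.5, else 0 (negative)."""
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)


if __name__ == "__main__":
    train_tweets = ["adorei o filme", "ótimo filme", "odiei o filme", "péssimo filme"]
    labels = [1, 1, 0, 0]  # 1 = positive, 0 = negative
    X = [tweet_embedding(t) for t in train_tweets]
    w, b = train_logistic_regression(X, labels)
    print(predict(w, b, tweet_embedding("filme ótimo")))  # 1
```

Mean pooling gives every tweet a fixed-size representation regardless of its length, which lets any standard classifier consume it. The zero-vector fallback matters for tweets, whose informal spelling may leave many tokens outside a static model's vocabulary.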
