
Sentiment Analysis

UIT-ViSFD: A Vietnamese Smartphone Feedback Dataset for Aspect-Based Sentiment Analysis

UIT-ViSFD consists of 11,122 human-annotated comments from mobile e-commerce and is freely available for research purposes.

UIT ABSA Datasets

Hotel Dataset: 7180 reviews (train), 795 reviews (development), 2030 reviews (test)

AIVIVN 2019: Sentiment Analysis Challenge

The data contains user reviews in two categories: "positive" and "negative".

27068 sentences

  • Train: 16087 sentences, Test: 10981 sentences (public: 5454 sentences, private: 5527 sentences)
  • Labels: 0 (positive), 1 (negative)

Leaderboard

Score: F1 score of negative labels
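
Because the score is the F1 of the negative class (label 1), not accuracy, it can help to see the computation spelled out. The following is a minimal sketch, not the challenge's official scorer: the function name `f1_for_label` and the toy labels are assumptions made up for illustration.

```python
def f1_for_label(y_true, y_pred, label=1):
    """F1 score computed for a single class, here the negative label (1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)  # true positives
    fp = sum(1 for t, p in pairs if t != label and p == label)  # false positives
    fn = sum(1 for t, p in pairs if t == label and p != label)  # false negatives
    if tp == 0:
        return 0.0  # avoids division by zero when nothing was matched
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (hypothetical): 0 = positive, 1 = negative
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1]

score = f1_for_label(y_true, y_pred, label=1)
print(f"F1 (negative class): {score:.4f}")
```

On real data, an equivalent result should be obtainable from a library scorer such as scikit-learn's `f1_score` with `pos_label=1`.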

| Author | Model | Score (Public Test / Private Test) | Paper/Source | Code |
|---|---|---|---|---|
| HoangNhat2 | Weighted Ensemble (TextCNN, VDCNN, HARNN, SARNN) | 0.90087 / 0.90012 | Write up | Official |
| iota | SVM | 0.8914 / 0.89688 | Write up | Official |
| Nal_AI | SVM (TF-IDF) | 0.89545 / 0.89574 | Write up | Official |
| nlpers | Ensemble (LinearSVC, SGD, RandomForest) | 0.88921 / 0.89559 | Write up | Official |
| ngxbac | LightGBM (TFIDF) | 0.867 | Write up | Official |

Vietnamese Students’ Feedback Corpus (UIT-VSFC)

Students’ feedback is a vital resource for interdisciplinary research that combines sentiment analysis and education. The Vietnamese Students’ Feedback Corpus (UIT-VSFC) consists of over 16,000 sentences that are human-annotated for two tasks: sentiment-based and topic-based classification. To assess the quality of our corpus, we measure annotator agreement and classification performance on the UIT-VSFC corpus.
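
The description above does not say which agreement measure was used. As an illustration only, the sketch below computes Cohen's kappa, one common chance-corrected agreement measure for two annotators. The function name `cohen_kappa` and the toy annotations are assumptions, not taken from the corpus.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if both annotators labelled independently
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Toy sentiment annotations (hypothetical)
annotator_1 = ["pos", "pos", "neg", "neu", "neg", "pos"]
annotator_2 = ["pos", "neg", "neg", "neu", "neg", "pos"]

kappa = cohen_kappa(annotator_1, annotator_2)
print(f"Cohen's kappa: {kappa:.4f}")
```

A kappa of 1 means perfect agreement, and 0 means agreement no better than chance.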

Leaderboard

| Model | Topic (F1) | Sentiment (F1) | Paper/Source | Code |
|---|---|---|---|---|
| Bi-LSTM - Word2Vec | 0.896 | 0.92 | Nguyen et al. NICS'18 | |
| Maximum Entropy classifier | 0.88 | 0.84 | Nguyen et al. KSE'18 | |

VLSP 2018 Shared Task: Aspect Based Sentiment Analysis

Leaderboard

Restaurant Dataset: 2961 reviews (train), 1290 reviews (development), 500 reviews (test)

| Model | Aspect (F1) | Aspect-Polarity (F1) | Paper/Source | Code |
|---|---|---|---|---|
| CNNs | 0.80 | | Dang et al. NICS'18 | |
| SVM | 0.77 | 0.61 | Dang et al. VLSP'18 | |
| SVM | 0.54 | 0.48 | Nguyen et al. VLSP'18 | |

Hotel Dataset: 3000 reviews (training), 2000 reviews (development), 600 reviews (test)

| Model | Aspect (F1) | Aspect-Polarity (F1) | Paper/Source | Code |
|---|---|---|---|---|
| SVM | 0.70 | 0.61 | Dang et al. VLSP'18 | |
| CNNs | 0.69 | | Dang et al. NICS'18 | |
| SVM | 0.56 | 0.53 | Nguyen et al. VLSP'18 | |
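
The two columns in these leaderboards measure different things: Aspect (F1) credits a prediction when the aspect is found, while Aspect-Polarity (F1) also requires the right sentiment for that aspect. The sketch below illustrates the difference. It assumes micro-averaged F1 over per-review label sets; the aspect names, the toy reviews, and the helper names `micro_f1` and `aspects_only` are hypothetical.

```python
def micro_f1(gold, pred):
    """Micro-averaged F1 over per-review sets of labels."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))  # labels matched exactly
    n_pred = sum(len(p) for p in pred)
    n_gold = sum(len(g) for g in gold)
    if tp == 0:
        return 0.0
    precision = tp / n_pred
    recall = tp / n_gold
    return 2 * precision * recall / (precision + recall)


def aspects_only(reviews):
    """Drop the polarity so that only the aspect has to match."""
    return [{aspect for aspect, _polarity in review} for review in reviews]


# Each review is a set of (aspect, polarity) pairs (made-up examples)
gold = [
    {("FOOD#QUALITY", "positive"), ("SERVICE#GENERAL", "negative")},
    {("ROOMS#CLEANLINESS", "positive")},
]
pred = [
    {("FOOD#QUALITY", "positive"), ("SERVICE#GENERAL", "positive")},
    {("ROOMS#CLEANLINESS", "positive")},
]

aspect_f1 = micro_f1(aspects_only(gold), aspects_only(pred))
aspect_polarity_f1 = micro_f1(gold, pred)
print(f"Aspect F1: {aspect_f1:.4f}")
print(f"Aspect-Polarity F1: {aspect_polarity_f1:.4f}")
```

In this toy case every aspect is detected, but one polarity is wrong, so the aspect-polarity score is lower than the aspect score. The same pattern appears in the leaderboard rows that report both numbers.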

VLSP 2016 Shared Task: Sentiment Analysis

The data contains user reviews of technological devices in three categories: "negative", "positive", and "neutral".

A review can be very complex with different sentiments on various objects. Therefore, we set some constraints on the dataset as follows:

  • The dataset only contains reviews having personal opinions.
  • The data are usually short comments, containing opinions on one object. There is no limitation on the number of the object's aspects mentioned in the comment.
  • Label (positive/negative/neutral) is the overall sentiment of the whole review.
  • The dataset contains only real data collected from social media, not artificially created by humans.

5100 sentences for training, 1050 sentences for testing

  • Train: 1700 positive, 1700 neutral, 1700 negative
  • Test: 350 positive, 350 neutral, 350 negative

Leaderboard

| Model | F1 | Paper/Source | Code |
|---|---|---|---|
| Perceptron/SVM/Maxent | 80.05 | Pham et al. VLSP'16 | |
| SVM/MLNN/LSTM | 71.44 | Nguyen et al. VLSP'16 | |
| Ensemble: Random forest, SVM, Naive Bayes | 71.22 | Pham et al. VLSP'16 | |
| Ensemble: SVM, LR, LSTM, CNN | 69.71 | Nguyen et al. NICS'18 | |
| SVM | 67.54 | Ngo et al. VLSP'16 | |
| SVM/MLNN | 67.23 | Tran et al. VLSP'16 | |
| Multi-channel LSTM-CNN | 59.61 | Vo et al. KSE'17 | Official |

Miscellaneous

📜 Papers

📁 Open sources