UIT-ViSFD consists of 11,122 human-annotated comments on mobile e-commerce and is freely available for research purposes:
- 📜 SA2SL: From Aspect-Based Sentiment Analysis to Social Listening System for Business Intelligence
- 🔗 Vietnamese Smartphone Feedback Dataset
Hotel Dataset: 7180 reviews (train), 795 reviews (development), 2030 reviews (test)
The data contain users' reviews, each labeled with one of two categories: "positive" or "negative".
27068 sentences
- Train: 16087 sentences, Test: 10981 sentences (public: 5454 sentences, private: 5527 sentences)
- Labels: 0 (positive), 1 (negative)
Score: F1 score of negative labels
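The metric above is an ordinary binary F1 in which label 1 (negative) plays the role of the positive class. A minimal sketch, assuming gold and predicted labels are plain lists of 0/1 integers (the function name `negative_f1` is hypothetical, not part of the competition kit):

```python
from typing import Sequence


def negative_f1(y_true: Sequence[int], y_pred: Sequence[int], negative_label: int = 1) -> float:
    """F1 score computed only for the class `negative_label` (1 = negative)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    # True positives: negative comments correctly predicted as negative.
    tp = sum(1 for t, p in pairs if t == negative_label and p == negative_label)
    # False positives: positive comments wrongly predicted as negative.
    fp = sum(1 for t, p in pairs if t != negative_label and p == negative_label)
    # False negatives: negative comments that the model missed.
    fn = sum(1 for t, p in pairs if t == negative_label and p != negative_label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For example, `negative_f1([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])` has 2 true positives, 1 false positive and 1 false negative, so precision and recall are both 2/3 and the score is 2/3. scikit-learn's `f1_score(y_true, y_pred, pos_label=1)` should give the same value, since it also scores only the class named by `pos_label`.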
Author | Model | Public Test | Private Test | Paper/Source | Code |
---|---|---|---|---|---|
HoangNhat2 | Weighted Ensemble (TextCNN, VDCNN, HARNN, SARNN) | 0.90087 | 0.90012 | Write up | Official |
iota | SVM | 0.8914 | 0.89688 | Write up | Official |
Nal_AI | SVM (TF-IDF) | 0.89545 | 0.89574 | Write up | Official |
nlpers | Ensemble (LinearSVC, SGD, RandomForest) | 0.88921 | 0.89559 | Write up | Official |
ngxbac | LightGBM (TF-IDF) | 0.867 | | Write up | Official |
Students' feedback is a vital resource for interdisciplinary research that combines two fields: sentiment analysis and education. The Vietnamese Students' Feedback Corpus (UIT-VSFC) is a resource of over 16,000 sentences, human-annotated for two tasks: sentiment-based and topic-based classification. To assess the quality of the corpus, the authors measure inter-annotator agreement and evaluate classification performance on UIT-VSFC.
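The description above does not say which agreement statistic is used. As an illustration only, the sketch below computes Cohen's kappa for two annotators, a common choice for this purpose; whether the UIT-VSFC paper uses this exact measure is an assumption, and `cohens_kappa` is a hypothetical helper name.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa between two annotators who labeled the same items."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("annotations must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: fraction of items given the same label by both annotators.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Expected chance agreement from each annotator's own label distribution.
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label everywhere; kappa is undefined,
        # so this sketch returns 1.0 by convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)
```

For example, with annotator A = `["pos", "pos", "neg", "neu", "neg", "pos"]` and annotator B = `["pos", "neg", "neg", "neu", "neg", "pos"]`, observed agreement is 5/6, chance agreement is 13/36, and kappa is 17/23 ≈ 0.739. Kappa is preferred over raw agreement because it discounts the matches two annotators would reach by chance given their label frequencies.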
Model | Topic (F1) | Sentiment (F1) | Paper/Source | Code |
---|---|---|---|---|
Bi-LSTM - Word2Vec | 0.896 | 0.92 | Nguyen et al. NICS'18 | |
Maximum Entropy classifier | 0.88 | 0.84 | Nguyen et al. KSE'18 | |
Restaurant Dataset: 2961 reviews (train), 1290 reviews (development), 500 reviews (test)
Model | Aspect (F1) | Aspect-Polarity (F1) | Paper/Source | Code |
---|---|---|---|---|
CNNs | 0.80 | | Dang et al. NICS'18 | |
SVM | 0.77 | 0.61 | Dang et al. VLSP'18 | |
SVM | 0.54 | 0.48 | Nguyen et al. VLSP'18 | |
Hotel Dataset: 3000 reviews (training), 2000 reviews (development), 600 reviews (test)
Model | Aspect (F1) | Aspect-Polarity (F1) | Paper/Source | Code |
---|---|---|---|---|
SVM | 0.70 | 0.61 | Dang et al. VLSP'18 | |
CNNs | 0.69 | | Dang et al. NICS'18 | |
SVM | 0.56 | 0.53 | Nguyen et al. VLSP'18 | |
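The two ABSA tables above report an aspect score and an aspect-polarity score per system. The sketch below shows one plausible way to compute both, assuming each review is annotated with a set of (aspect category, polarity) pairs and that both scores are micro-averaged F1 over all reviews; the official VLSP 2018 scoring script may differ, and the category names in the example are hypothetical.

```python
from typing import Hashable, List, Set, Tuple

Pair = Tuple[str, str]  # (aspect category, polarity); format assumed for illustration


def micro_f1(gold: List[Set[Hashable]], pred: List[Set[Hashable]]) -> float:
    """Micro-averaged F1 over per-review label sets."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must cover the same reviews")
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)  # labels predicted and present in gold
        fp += len(p - g)  # labels predicted but not in gold
        fn += len(g - p)  # gold labels that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def absa_scores(gold: List[Set[Pair]], pred: List[Set[Pair]]) -> Tuple[float, float]:
    """Return (aspect F1, aspect-polarity F1)."""
    # Aspect F1 ignores polarity: compare only the sets of detected categories.
    gold_aspects = [{aspect for aspect, _ in g} for g in gold]
    pred_aspects = [{aspect for aspect, _ in p} for p in pred]
    # Aspect-polarity F1 counts a hit only when category and polarity both match.
    return micro_f1(gold_aspects, pred_aspects), micro_f1(gold, pred)


if __name__ == "__main__":
    gold = [{("FOOD#QUALITY", "positive"), ("SERVICE#GENERAL", "negative")},
            {("AMBIENCE#GENERAL", "neutral")}]
    pred = [{("FOOD#QUALITY", "positive"), ("SERVICE#GENERAL", "positive")},
            {("AMBIENCE#GENERAL", "neutral"), ("PRICES#GENERAL", "negative")}]
    aspect_f1, pair_f1 = absa_scores(gold, pred)
    print(f"aspect F1 = {aspect_f1:.3f}, aspect-polarity F1 = {pair_f1:.3f}")
```

On this toy input the script prints `aspect F1 = 0.857, aspect-polarity F1 = 0.571`. The wrong polarity on SERVICE#GENERAL still counts as a correct aspect, but at the pair level it is both a false positive and a false negative, which is why the aspect-polarity score in each table is lower than the aspect score.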
The data contain users' reviews about technological devices, each labeled with one of three categories: "negative", "positive" or "neutral".
A review can be very complex, expressing different sentiments about various objects. Therefore, we set the following constraints on the dataset:
- The dataset only contains reviews expressing personal opinions.
- The data are usually short comments containing opinions on a single object. There is no limit on the number of the object's aspects mentioned in a comment.
- The label (positive/negative/neutral) is the overall sentiment of the whole review.
- The dataset contains only real data collected from social media, not text artificially created by humans.
5100 sentences for training, 1050 sentences for testing
- Train: 1700 positive, 1700 neutral, 1700 negative
- Test: 350 positive, 350 neutral, 350 negative
Model | F1 | Paper/Source | Code |
---|---|---|---|
Perceptron/SVM/Maxent | 80.05 | Pham et al. VLSP'16 | |
SVM/MLNN/LSTM | 71.44 | Nguyen et al. VLSP'16 | |
Ensemble: Random forest, SVM, Naive Bayes | 71.22 | Pham et al. VLSP'16 | |
Ensemble: SVM, LR, LSTM, CNN | 69.71 | Nguyen et al. NICS'18 | |
SVM | 67.54 | Ngo et al. VLSP'16 | |
SVM/MLNN | 67.23 | Tran et al. VLSP'16 | |
Multi-channel LSTM-CNN | 59.61 | Vo et al. KSE'17 | Official |
📜 Papers
- Huynh et al. NICS'18. Integrating Grammatical Features into CNN Model for Emotion Classification
- Pham et al. 2016, Ngo et al. SoICT'16, Pham et al. KSE'16, Tran et al. 2016
- Kieu et al. KSE'10
📁 Open sources
- VnEmoLex (2017): data
- polyglot (2014-2017): C++, Java, Python
- pyurgent (2016): Python, data
- VietSentiWordNet (2014): data