In this project, we perform multimodal sentiment analysis on Twitter data consisting of tweets that contain both text and images (Mohammed & Aleqabie, 2022), with the goal of predicting the sentiment behind each tweet. The sentiment is classified into three categories: Positive, Neutral, and Negative.
- Overview
- Table of Contents
- Datasets
- Model Architecture
- Preprocessing
- Training
- Evaluation
- Usage
- Dependencies
The following dataset has been used for this project: Mohammed, D. J., & Aleqabie, H. J. (2022, September). The Enrichment Of MVSA Twitter Data Via Caption-Generated Label Using Sentiment Analysis. In 2022 Iraqi International Conference on Communication and Information Technologies (IICCIT) (pp. 322-327). IEEE. The dataset can be found here.
The captions and their corresponding labels are available in `LabeledText.xlsx`. Feature engineering has been done to add the following feature columns:
- Caption Length : Indicating length of captions
- Hashtags : Extracting and collecting all the hashtags used in each tweet
- Total Hashtags : Showing the total number of hashtags in each tweet
The code to do this is available to run in `Scripts/Text/FeatureEng.py`. The engineered data is then saved as a CSV file to `Data/Text/Engineered.csv`.
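The three feature columns listed above could be derived roughly as follows. This is a minimal sketch, not the contents of `FeatureEng.py`: the function name `engineer_features` and the sample captions are hypothetical, caption length is assumed to be measured in characters, and a hashtag is assumed to be a `#` followed by word characters.

```python
import re

# Assumption: a hashtag is "#" followed by one or more word characters.
HASHTAG_PATTERN = re.compile(r"#\w+")


def engineer_features(caption: str) -> dict:
    """Build the three engineered columns for a single caption (hypothetical helper)."""
    hashtags = HASHTAG_PATTERN.findall(caption)
    return {
        "Caption": caption,
        "Caption Length": len(caption),   # assumption: length in characters
        "Hashtags": hashtags,             # all hashtags found in the caption
        "Total Hashtags": len(hashtags),  # how many hashtags were found
    }


# Made-up captions, used only to illustrate the output structure.
sample_captions = [
    "Loving the sunset #beach #summer",
    "Stuck in traffic again",
]
rows = [engineer_features(caption) for caption in sample_captions]
```

Each dictionary in `rows` corresponds to one row of the engineered table and could be written out with `csv.DictWriter` or loaded into a DataFrame.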
Afterwards, embeddings are generated for the captions and hashtags using two approaches: TF-IDF and BERT.
The code to do this is in `Scripts/Text/Preprocess.py`. The function `tfidf_preprocessing()` and the class `BERT_Embeddings` are defined in `CustomFunctions.py`. The embeddings are then saved in `Data/Text/TF-IDF` and `Data/Text/BERT`, along with the target labels and the number of captions.
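To illustrate the TF-IDF side of this step, here is a minimal sketch using scikit-learn's `TfidfVectorizer`. It does not reproduce `tfidf_preprocessing()`, whose parameters are not described here. The sample captions are made up and the vectorizer is left at its default settings, both of which are assumptions. The BERT path is omitted because it requires downloading a pretrained model.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up captions standing in for the engineered caption column.
captions = [
    "great day at the beach",
    "terrible traffic today",
    "the beach was great",
]

# Default settings (assumption): lowercasing, word tokens of 2+ characters,
# and L2-normalised rows.
vectorizer = TfidfVectorizer()

# Sparse matrix with one row per caption and one column per vocabulary term.
tfidf_matrix = vectorizer.fit_transform(captions)

# Mapping from each term to its column index in the matrix.
vocabulary = vectorizer.vocabulary_
```

The resulting sparse matrix can be stored with `scipy.sparse.save_npz` or converted to a dense array with `tfidf_matrix.toarray()` before saving.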
All dependencies of the project are listed in the `requirements.txt` file. To install them, run the following command in your terminal:
pip install -r requirements.txt