
Sentiment Analysis on a Clothing Review Dataset Using Transformer Models (BERT)

Dataset information:

  • Clothing ID: A unique identifier for each clothing item.
  • Age: The age of the reviewer.
  • Title: The title of the review provided by the reviewer.
  • Review Text: The detailed text of the review where the reviewer expresses their thoughts and opinions about the product.
  • Rating: The rating given by the reviewer, typically on a scale from 1 to 5 stars.
  • Recommended IND: An indicator (usually binary) that suggests whether the reviewer recommended the product (e.g., 1 for recommended, 0 for not recommended).
  • Positive Feedback Count: The count of positive feedback or "likes" received on the review by other users.
  • Division Name: The name of the product division or category.
  • Department Name: The name of the department within the division to which the product belongs.
  • Class Name: The specific class or category to which the product belongs.

Project Goals

  1. Perform data cleaning and exploratory data analysis (EDA) on the dataset to uncover insights from product reviews.
  2. Test different types of BERT models from Hugging Face with varying output classes. This step involves experimenting with pretrained models to evaluate their performance on the dataset without fine-tuning.
  3. Decide on the number of output classes (2, 3, or 5) and the type of BERT model (BERT, roBERTa, or distilBERT) to use for the final sentiment analysis model.
  4. Fine-tune the chosen model on the dataset once the number of output classes and the model type have been decided.

Data cleaning

  • Removed null values
  • Removed duplicated rows
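
Both steps are one-liners in pandas. A minimal sketch, assuming the dataset is loaded from a CSV (the filename here is a placeholder):

```python
import pandas as pd

# Placeholder filename; substitute the actual dataset file.
df = pd.read_csv("womens_clothing_reviews.csv")

df = df.dropna()              # remove rows containing null values
df = df.drop_duplicates()     # remove duplicated rows
df = df.reset_index(drop=True)
```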

EDA (Exploratory Data Analysis)

Answering the following questions

1. How many product reviews per rating?

*(figure: number of reviews per rating)*

  • The graph shows that review counts increase steadily with rating; 5-star reviews are the most common.
  • While there is some imbalance, it is not extreme: there is a reasonable spread of reviews across all ratings from 1 to 5 stars.
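
For reference, this count can be reproduced in a few lines with pandas, assuming the cleaned `df` from the sketch above:

```python
import matplotlib.pyplot as plt

# Number of reviews per star rating, ordered 1..5.
rating_counts = df["Rating"].value_counts().sort_index()
print(rating_counts)

rating_counts.plot(kind="bar")
plt.xlabel("Rating")
plt.ylabel("Number of reviews")
plt.show()
```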

2. What is the number of reviews per product category?

*(figure: number of reviews per product category)*

  • In this dataset, dresses, knits, and blouses are the most-reviewed product categories.

3. What is the distribution of reviews per rating per category?

*(figure: rating distribution per product category)*

  • The plot shows that each category has a fairly similar, balanced distribution of ratings.

4. Plotting a word cloud of the product reviews

*(figure: word cloud of the review text)*
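
A word cloud like this can be generated with the wordcloud package; a minimal sketch, assuming the `df` from the cleaning step:

```python
from wordcloud import WordCloud
import matplotlib.pyplot as plt

# Concatenate all review texts into one string and render the cloud.
text = " ".join(df["Review Text"].astype(str))
cloud = WordCloud(width=800, height=400, background_color="white").generate(text)

plt.imshow(cloud, interpolation="bilinear")
plt.axis("off")
plt.show()
```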

5. What is the distribution of words per review?

*(figure: distribution of words per review)*

  • The longest review is 115 words. This value informs the max_length used when tokenizing the reviews, which is crucial for the fine-tuning process (see the sketch below).
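
A minimal sketch of how this distribution and the maximum can be computed (whitespace word counts; the tokenizer's subword count may differ slightly):

```python
import matplotlib.pyplot as plt

# Words per review, using a simple whitespace split.
word_counts = df["Review Text"].astype(str).str.split().str.len()

print("max words in a review:", word_counts.max())  # 115 for this dataset

word_counts.plot(kind="hist", bins=50)
plt.xlabel("Words per review")
plt.ylabel("Number of reviews")
plt.show()
```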

Types of BERT models used in this project:

| Feature | BERT | RoBERTa | DistilBERT |
|---|---|---|---|
| Training objectives | MLM (masked language modeling), NSP | MLM (masked language modeling) | MLM (masked language modeling) |
| Data preprocessing | Random masking | Dynamic masking, no NSP | Random masking, pruned attention |
| Next Sentence Prediction (NSP) | Yes | No | No |
| Training duration | Extended | Longer, larger dataset | Shorter, pruned layers |
| Sentence embeddings | [CLS] token | No [CLS] token for sentence tasks | [CLS] token |
| Batch training | Fixed batch size | Dynamic batch size | Smaller model size |
| Model size | Large | Larger | Smaller |
| Number of layers | Configurable, typically 12 or 24 | Configurable, typically 12 or 24 | Reduced (distilled), typically 6 |
| Performance | Benchmark model | Improved performance on tasks | Trade-off between size and quality |

Loading pretrained BERT models from Hugging Face and performing inference with them

Testing different types of pretrained BERT models with different numbers of output classes, without fine-tuning them on our dataset:

  • BERT model: Pretrained on 5 output classes (1 star to 5 star) - Link to the model
  • roBERTa model: Pretrained on 3 output classes (0 : Negative, 1 : Neutral, 2 : Positive) - Link to the model
  • distilBERT model: Pretrained on 2 output classes (0 : Negative, 1 : Positive) - Link to the model
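
Running one of these checkpoints takes only a few lines with the transformers pipeline API. A sketch using the 3-class roBERTa checkpoint named later in this README (the other two checkpoints are linked above):

```python
from transformers import pipeline

# Off-the-shelf 3-class sentiment model (negative / neutral / positive).
classifier = pipeline(
    "sentiment-analysis",
    model="cardiffnlp/twitter-roberta-base-sentiment-latest",
)

print(classifier("Love this dress, it fits perfectly!"))
# e.g. [{'label': 'positive', 'score': 0.98}]
```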

Results of the pretrained models

| Model | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|
| Pretrained BERT (5 output classes) | 0.566 | 0.653 | 0.566 | 0.592 |
| Pretrained roBERTa (3 output classes) | 0.793 | 0.771 | 0.793 | 0.776 |
| Pretrained distilBERT (2 output classes) | 0.837 | 0.850 | 0.837 | 0.842 |

  • As expected, models with fewer output classes face an easier classification task, hence the higher accuracy.
  • The BERT model with 5 output classes performed the worst, since it has to separate fine-grained sentiment levels (e.g. 4 stars vs. 5 stars).
  • Even without fine-tuning, distilBERT already achieved a solid accuracy of 83.7% with 2 output classes.
  • The roBERTa model also reached a relatively high accuracy of 79.3% even though it predicts 3 classes, trailing distilBERT by only about 4 percentage points. This is expected, since roBERTa has the largest number of parameters of the three.
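
The reported scores are consistent with weighted averaging (note that recall equals accuracy in each row, which holds for the weighted average). A sketch of how they can be computed with scikit-learn:

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def evaluate(y_true, y_pred):
    """Accuracy plus weighted-average precision, recall, and F1."""
    accuracy = accuracy_score(y_true, y_pred)
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="weighted", zero_division=0
    )
    return {"Accuracy": accuracy, "Precision": precision,
            "Recall": recall, "F1 Score": f1}
```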

Decision on how many output classes to use

Reasons for choosing 3 output classes:

  • Multi-class classification (5 classes): avoided because adjacent star ratings express only subtly different sentiment, and capturing such fine-grained distinctions would require a larger dataset.
  • Binary classification (2 classes): not chosen because the dataset's rating distribution is relatively balanced, and collapsing it to two classes risks oversimplifying and losing information.
  • Three output classes were chosen to distinguish positive, negative, and neutral sentiment, providing richer insight (see the mapping sketch below).
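
With 3 classes, the 1-5 star ratings have to be collapsed into sentiment labels. One plausible mapping (the exact thresholds are an assumption; the fine_tuning notebook defines the one actually used):

```python
def rating_to_sentiment(rating: int) -> int:
    """Collapse a 1-5 star rating into 3 sentiment classes (assumed thresholds)."""
    if rating <= 2:
        return 0  # negative
    if rating == 3:
        return 1  # neutral
    return 2      # positive

df["label"] = df["Rating"].apply(rating_to_sentiment)
```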

Decision on which type of BERT model to use

  • For this project, distilBERT was chosen over BERT and roBERTa because it is faster in both training and inference. Its smaller size and streamlined architecture make for quicker computations, ensuring computational efficiency throughout the model's lifecycle.
  • Link to the base-distilBERT model

Refer to the fine_tuning notebook for all the steps of the fine-tuning process

  • The model was fine-tuned in a Google Colab environment (utilizing a GPU)
  • The fine-tuned model was trained on the clothing dataset
  • The fine-tuned model was evaluated on a held-out test set produced by train_test_split
  • The fine-tuned model was compared with the pretrained roBERTa model with 3 output classes

Training arguments used for fine-tuning:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",          # where checkpoints are written
    num_train_epochs=10,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=64,
    warmup_steps=500,                # linear warmup before the learning rate decays
    learning_rate=5e-5,
    weight_decay=0.01,
    logging_dir="./logs",
    logging_steps=10,
)
```
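
A sketch of how these arguments plug into the Trainer API, assuming `train_df` and `eval_df` are the pandas splits from the train_test_split step with "text" and "label" columns (the full steps are in the fine_tuning notebook):

```python
from datasets import Dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification, Trainer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=3,  # negative / neutral / positive
)

# The longest review observed in the EDA is 115 words.
def tokenize(batch):
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=115)

train_ds = Dataset.from_pandas(train_df).map(tokenize, batched=True)
eval_ds = Dataset.from_pandas(eval_df).map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=training_args,  # the TrainingArguments defined above
    train_dataset=train_ds,
    eval_dataset=eval_ds,
)
trainer.train()
```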

Result comparison between fine-tuned distilBERT model and pretrained models

| Model | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|
| Pretrained roBERTa (3 classes) | 0.789 | 0.772 | 0.789 | 0.773 |
| Pretrained distilBERT (2 classes) | 0.837 | 0.850 | 0.837 | 0.842 |
| Fine-tuned distilBERT (3 classes) | 0.849 | 0.860 | 0.849 | 0.853 |

  • Pretrained model used: cardiffnlp/twitter-roberta-base-sentiment-latest
  • Fine-tuned base model: distilbert-base-uncased
  • From the table, the fine-tuned distilBERT model shows a slight improvement over the pretrained distilBERT with 2 output classes. This is noteworthy, since moving to 3 output classes could have been expected to lower accuracy; fine-tuning lets the model adapt to the nuances of this specific sentiment analysis task, which more than compensates.
  • The pretrained roBERTa model remains competitive, closely trailing the fine-tuned distilBERT even without any fine-tuning. This aligns with the expectation that roBERTa, with its larger parameter count and refined training recipe, performs strongly out of the box.
  • Fine-tuning the roBERTa model could therefore be an opportunity to surpass the fine-tuned distilBERT.

Confusion matrix for the fine-tuned model:

*(figure: confusion matrix for the fine-tuned model)*

Future improvements:

  • Continue fine-tuning the distilBERT model with different hyperparameters to achieve higher accuracy
  • Fine-tune a roBERTa base model on the dataset and compare it with distilBERT

If you are interested in the fine-tuned distilBERT model:

1) Try out the model by inputting a sentence

*(figure: example prediction for a sample input)*

  • Example input: "this dress is kinda okay"

The output labels are as follows:

  • 0 - Negative
  • 1 - Neutral
  • 2 - Positive

2) Perform inference on the model; more details can be found in the Hugging Face repository

*(figure: inference example)*
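
A minimal inference sketch outside the pipeline API, assuming the fine-tuned checkpoint has been pushed to the Hub (the model ID below is a placeholder; see the linked repository for the real one):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_ID = "your-username/fine-tuned-distilbert"  # placeholder model ID

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)

inputs = tokenizer("this dress is kinda okay", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

labels = {0: "Negative", 1: "Neutral", 2: "Positive"}
print(labels[logits.argmax(dim=-1).item()])  # a lukewarm review like this would likely be Neutral
```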