Sentiment analysis of a clothing reviews dataset, using NLP techniques and ML algorithms to classify reviews as positive or negative. A Flask web app lets users enter a review and receive a sentiment prediction from a trained model.

Fatha27/review-classification-and-sentiment-analysis

Review classification and sentiment analysis

Table of Contents

  1. Introduction
  2. Dataset
  3. Data Cleaning
  4. Exploratory Data Analysis (EDA)
  5. Data Preprocessing
  6. Modeling and Results
  7. Web App screenshots

1. Introduction

The Clothing Review Sentiment Analysis project is aimed at predicting the sentiment of customer reviews for various clothing items. The goal is to classify reviews as positive or negative based on the content of the review text.

The project leverages Natural Language Processing (NLP) techniques and machine learning models to achieve high accuracy in sentiment classification. This project is valuable for retailers who wish to gain insights into customer satisfaction and improve their products based on feedback.

2. Dataset

The project uses the Clothing Reviews dataset from Kaggle. It includes the following columns:

  • Clothing ID: Unique identifier for each clothing item.
  • Age: Age of the reviewer.
  • Title: Title of the review.
  • Review Text: The actual text of the review.
  • Rating: Rating given by the reviewer (1-5 scale).
  • Recommended IND: Indicator of whether the reviewer recommends the item.
  • Positive Feedback Count: Number of positive feedback votes the review received.
  • Division Name, Department Name, Class Name: Metadata about the clothing item.

3. Data Cleaning

Prior to conducting exploratory data analysis (EDA), a comprehensive data cleaning process was undertaken to ensure the dataset's quality and integrity. The following steps were applied:

  • Handling Missing Values: All instances of missing data were systematically addressed. Missing entries in critical columns were either imputed with suitable values or removed, depending on the context and impact on downstream analysis.

  • Class Imbalance Management: The dataset exhibited a significant imbalance between the two sentiment classes. To mitigate this, undersampling was employed on the majority class, bringing the dataset to a more balanced state. This step was crucial in ensuring that the models trained on the data were not biased towards the more prevalent class, thereby improving the robustness of the sentiment classification.

  • image

  • Removal of Anomalous Entries: Certain entries were identified as biased or inconsistent, such as instances where a rating of 5 was given, yet the recommendation indicator was 0. These entries were removed to prevent any distortions in the model's learning process, ensuring that the training data accurately reflected the true sentiment of the reviewers.
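The cleaning steps above can be sketched in a few lines. This is a minimal illustration and not the project's actual code: the rows are made up, `Recommended IND` is assumed to serve as the sentiment label, missing review text is dropped (not imputed), and the helper names (`drop_missing`, `drop_inconsistent`, `undersample`) are hypothetical. Plain Python is used so the sketch runs without dependencies; the same steps map directly onto a DataFrame library.

```python
import random

# Hypothetical toy rows; the keys mirror the dataset columns listed in section 2.
rows = [
    {"Review Text": "Love this dress, fits perfectly", "Rating": 5, "Recommended IND": 1},
    {"Review Text": "Great fabric and color", "Rating": 4, "Recommended IND": 1},
    {"Review Text": "Nice top, very comfortable", "Rating": 5, "Recommended IND": 1},
    {"Review Text": "Beautiful but runs small", "Rating": 4, "Recommended IND": 1},
    {"Review Text": None, "Rating": 5, "Recommended IND": 1},  # missing text
    {"Review Text": "Five stars but would not recommend", "Rating": 5, "Recommended IND": 0},  # inconsistent
    {"Review Text": "Cheap material, returned it", "Rating": 1, "Recommended IND": 0},
    {"Review Text": "Poor fit and itchy", "Rating": 2, "Recommended IND": 0},
]


def drop_missing(rows):
    """Drop rows whose review text is missing or blank."""
    return [r for r in rows if r["Review Text"] and r["Review Text"].strip()]


def drop_inconsistent(rows):
    """Drop rows where a 5-star rating is paired with 'not recommended'."""
    return [r for r in rows if not (r["Rating"] == 5 and r["Recommended IND"] == 0)]


def undersample(rows, label_key="Recommended IND", seed=42):
    """Randomly shrink every class to the size of the smallest class."""
    by_class = {}
    for r in rows:
        by_class.setdefault(r[label_key], []).append(r)
    n = min(len(group) for group in by_class.values())
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, n))
    rng.shuffle(balanced)
    return balanced


cleaned = undersample(drop_inconsistent(drop_missing(rows)))

# Count how many rows remain per class after balancing.
counts = {}
for r in cleaned:
    counts[r["Recommended IND"]] = counts.get(r["Recommended IND"], 0) + 1
print(counts)
```

Undersampling discards majority-class rows, so it trades data volume for balance; the anomaly filter runs first so that inconsistent rows never count toward either class size.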

4. Exploratory Data Analysis (EDA)

Before model building, exploratory data analysis was carried out with Plotly, seaborn, and Matplotlib. This included:

  • Distribution of Ratings: Visualizations of how ratings and the recommendation indicator are distributed across the dataset. Here are some of the visualizations:

    • image

    • image

  • Word Clouds: Created word clouds to visualize the most frequent words in positive and negative reviews.

    • Positive reviews image
    • Negative reviews image
  • Correlation Analysis: Checked for correlations between different features and review sentiments.
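The numbers behind the first two items above can be computed without any plotting library. The sketch below is an assumption-laden illustration: the reviews, the tiny `STOPWORDS` set, and the `top_words` helper are all hypothetical. It tallies the rating distribution and the most frequent words per class, which is the information a bar chart or word cloud would then display.

```python
import re
from collections import Counter

# Hypothetical toy data: (review text, rating, recommended indicator).
reviews = [
    ("Love this dress, the fabric is soft and the fit is perfect", 5, 1),
    ("Great color and perfect fit", 4, 1),
    ("The fabric is cheap and the fit is poor", 2, 0),
    ("Poor quality, cheap fabric", 1, 0),
]

# Tiny illustrative stopword set; real stopword lists are much longer.
STOPWORDS = {"the", "is", "and", "this", "a", "it"}


def top_words(texts, k=3):
    """Return the k most frequent non-stopword tokens across the given texts."""
    counter = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())
        counter.update(t for t in tokens if t not in STOPWORDS)
    return counter.most_common(k)


# How many reviews gave each star rating.
rating_distribution = Counter(rating for _, rating, _ in reviews)

# Word frequencies split by class, i.e. the input to a word cloud.
positive_words = top_words(text for text, _, rec in reviews if rec == 1)
negative_words = top_words(text for text, _, rec in reviews if rec == 0)

print(dict(rating_distribution))
print(positive_words)
print(negative_words)
```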

5. Data Preprocessing

To prepare the data for modeling, the following preprocessing steps were undertaken:

  • Text Cleaning: Removed HTML tags, special characters, and numbers from the review text.
  • Stopword Removal: Common stopwords were removed to reduce noise in the data.
  • Lemmatization: Converted words to their base form using lemmatization to standardize the text.
  • TF-IDF Vectorization: Transformed the text data into numerical features using TF-IDF (Term Frequency-Inverse Document Frequency).
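The four steps above can be sketched end to end as follows. This is a dependency-free illustration under stated assumptions, not the project's pipeline: `STOPWORDS` is a tiny made-up list, `LEMMAS` is a toy lookup table standing in for a real lemmatizer, and `tfidf` uses the unsmoothed textbook weighting (term frequency times `log(N / df)`), which may differ from the variant used in the project.

```python
import math
import re
from collections import Counter

# Small illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"the", "a", "an", "is", "it", "this", "and", "i", "was", "were"}

# Toy lookup table standing in for a real lemmatizer (hypothetical, illustration only).
LEMMAS = {"dresses": "dress", "fits": "fit", "loved": "love", "sizes": "size"}


def clean_text(text):
    """Strip HTML tags, then replace every run of non-letters with a space."""
    text = re.sub(r"<[^>]+>", " ", text)
    text = re.sub(r"[^A-Za-z]+", " ", text)
    return text.lower().strip()


def preprocess(text):
    """Clean, tokenize, remove stopwords, and map tokens to their base form."""
    tokens = clean_text(text).split()
    tokens = [t for t in tokens if t not in STOPWORDS]
    return [LEMMAS.get(t, t) for t in tokens]


def tfidf(docs):
    """Weight each term by (count / doc length) * log(N / document frequency)."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


raw_reviews = [
    "<p>I loved this dress!!! It fits perfectly.</p>",
    "The dress was cheap, it fits badly: 2 sizes too small.",
]
docs = [preprocess(r) for r in raw_reviews]
vectors = tfidf(docs)
print(docs[0])
print(vectors[0])
```

With this unsmoothed formula, a term that occurs in every document (here "dress" and "fit") gets weight zero, which is why TF-IDF down-weights words that do not discriminate between reviews. Library implementations often apply a smoothed variant, so their values will differ.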

6. Modeling and Results

Several machine learning models were tested for classifying the reviews. Logistic Regression, SVM, and Decision Tree stood out across different metrics:

image
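A comparison of the three named models might look like the sketch below, assuming scikit-learn is available. Everything specific here is an assumption made for illustration: the toy corpus, the use of `LinearSVC` as the SVM, the default hyperparameters, and the choice of accuracy and F1 as metrics. The scores it prints say nothing about the project's reported results.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy corpus (1 = positive, 0 = negative); far too small for real conclusions.
texts = [
    "love this dress perfect fit", "great quality soft fabric",
    "beautiful color very comfortable", "fits well and looks great",
    "perfect length love the fabric", "comfortable and flattering top",
    "cheap fabric poor quality", "terrible fit returned it",
    "itchy material very disappointed", "runs small and looks cheap",
    "poor stitching fell apart", "disappointed with the color",
]
labels = [1] * 6 + [0] * 6

# Stratified split keeps both classes represented in train and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.33, stratify=labels, random_state=0
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": LinearSVC(),  # assumption: a linear SVM; the kernel used in the project is not stated
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}

results = {}
for name, model in models.items():
    # Each pipeline fits its own TF-IDF vocabulary on the training split only.
    pipeline = make_pipeline(TfidfVectorizer(), model)
    pipeline.fit(X_train, y_train)
    predictions = pipeline.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, predictions),
        "f1": f1_score(y_test, predictions, zero_division=0),
    }

for name, scores in results.items():
    print(f"{name}: accuracy={scores['accuracy']:.2f}, f1={scores['f1']:.2f}")
```

Wrapping the vectorizer and the classifier in one pipeline keeps the test reviews out of the TF-IDF vocabulary, which avoids leaking test data into training.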

7. Web App screenshots

Negative review

image

Positive review

image
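For readers who want to see how such a Flask app can hand a review to a model, here is a minimal sketch. It is not the repository's code: the `/predict` JSON endpoint, the `review` field, and the keyword-based `predict_sentiment` stub are all hypothetical. A real app would load the trained vectorizer and classifier and would likely render an HTML form like the one in the screenshots.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict_sentiment(review_text):
    """Hypothetical stand-in for the trained model; a real app would call a fitted pipeline."""
    negative_cues = {"poor", "cheap", "terrible", "disappointed", "bad"}
    tokens = set(review_text.lower().split())
    return "negative" if tokens & negative_cues else "positive"


@app.route("/predict", methods=["POST"])
def predict():
    # Accept a JSON body such as {"review": "..."}; tolerate missing or malformed input.
    payload = request.get_json(silent=True)
    if not isinstance(payload, dict):
        payload = {}
    review = str(payload.get("review") or "").strip()
    if not review:
        return jsonify({"error": "Field 'review' is required."}), 400
    return jsonify({"review": review, "sentiment": predict_sentiment(review)})
```

The endpoint can be exercised without starting a server through Flask's test client, for example `app.test_client().post("/predict", json={"review": "cheap fabric"})`; calling `app.run()` starts the development server.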
