Review classification and sentiment analysis

2. Dataset

The project uses the Clothing Reviews dataset from Kaggle. It includes the following columns:

Clothing ID: Unique identifier for each clothing item.
Age: Age of the reviewer.
Title: Title of the review.
Review Text: The actual text of the review.
Rating: Rating given by the reviewer (1-5 scale).
Recommended IND: Indicator of whether the reviewer recommends the item.
Positive Feedback Count: Number of positive feedback votes the review received.
Division Name, Department Name, Class Name: Metadata about the clothing item.

3. Data Cleaning

Prior to conducting exploratory data analysis (EDA), a comprehensive data cleaning process was undertaken to ensure the dataset's quality and integrity. The following steps were applied:

Handling Missing Values: All instances of missing data were systematically addressed. Missing entries in critical columns were either imputed with suitable values or removed, depending on the context and impact on downstream analysis.
Class Imbalance Management: The dataset exhibited a significant imbalance between the two sentiment classes. To mitigate this, undersampling was employed on the majority class, bringing the dataset to a more balanced state. This step was crucial in ensuring that the models trained on the data were not biased towards the more prevalent class, thereby improving the robustness of the sentiment classification.
Removal of Anomalous Entries: Certain entries were identified as biased or inconsistent, such as instances where a rating of 5 was given, yet the recommendation indicator was 0. These entries were removed to prevent any distortions in the model's learning process, ensuring that the training data accurately reflected the true sentiment of the reviewers.
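The three cleaning steps above can be sketched with pandas roughly as follows. This is a minimal illustration, not the project's actual code: it assumes that "Recommended IND" serves as the sentiment label, that "Review Text" and the label are the critical columns, and that the only anomaly rule is the one named above (rating 5 with recommendation 0). The function name clean_reviews and the toy rows are hypothetical.

```python
import pandas as pd

LABEL = "Recommended IND"  # assumption: this column is the sentiment label


def clean_reviews(df: pd.DataFrame, seed: int = 0) -> pd.DataFrame:
    """Drop incomplete rows, remove contradictory rows, then undersample."""
    # 1. Missing values: drop rows lacking review text or a label
    #    (assumed here to be the "critical" columns).
    df = df.dropna(subset=["Review Text", LABEL])

    # 2. Anomalies: remove the contradictory case named in the text,
    #    a rating of 5 combined with a recommendation indicator of 0.
    contradictory = (df["Rating"] == 5) & (df[LABEL] == 0)
    df = df[~contradictory]

    # 3. Class imbalance: undersample each class to the smallest class size.
    n_min = df[LABEL].value_counts().min()
    parts = [group.sample(n=n_min, random_state=seed) for _, group in df.groupby(LABEL)]
    return pd.concat(parts).reset_index(drop=True)


# Toy rows for illustration only; these are not taken from the dataset.
toy = pd.DataFrame({
    "Review Text": ["great", "love it", None, "nice", "bad", "fits well", "awful"],
    "Rating": [5, 4, 3, 5, 1, 5, 5],
    "Recommended IND": [1, 1, 1, 1, 0, 1, 0],
})
balanced = clean_reviews(toy)
```

Undersampling discards majority-class rows, so it is usually applied after the other filters, as here, to avoid throwing away rows that would have survived cleaning.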

4. Exploratory Data Analysis (EDA)

Before model building, an exploratory data analysis was conducted using Plotly, seaborn, and Matplotlib. This included:

Distribution of Ratings: Visualized how ratings and the recommended class are distributed across the dataset.
Word Clouds: Created separate word clouds for positive and negative reviews to visualize the most frequent words in each.
Correlation Analysis: Checked for correlations between different features and review sentiments.
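The numbers behind the word clouds and the correlation check can be computed as in the sketch below. It is only an illustration under assumptions: "Recommended IND" is taken as the sentiment label, the helper top_words is hypothetical, and the toy rows are invented. A word cloud renders the same per-word frequencies graphically.

```python
import re
from collections import Counter

import pandas as pd


def top_words(texts, n=3):
    """Return the n most frequent lowercase words across the given texts."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts.most_common(n)


# Toy rows for illustration only; these are not taken from the dataset.
toy = pd.DataFrame({
    "Review Text": ["love the fit love it", "soft fabric love",
                    "poor fit", "poor quality poor"],
    "Rating": [5, 4, 2, 1],
    "Recommended IND": [1, 1, 0, 0],
})

# Word frequencies per class (the input a word cloud would visualize).
positive_top = top_words(toy.loc[toy["Recommended IND"] == 1, "Review Text"], n=1)
negative_top = top_words(toy.loc[toy["Recommended IND"] == 0, "Review Text"], n=1)

# Pearson correlation between numeric features and the label.
corr = toy[["Rating", "Recommended IND"]].corr()
```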

5. Data Preprocessing

To prepare the data for modeling, the following preprocessing steps were undertaken:

Text Cleaning: Removed HTML tags, special characters, and numbers from the review text.
Stopword Removal: Common stopwords were removed to reduce noise in the data.
Lemmatization: Converted words to their base form using lemmatization to standardize the text.
TF-IDF Vectorization: Transformed the text data into numerical features using TF-IDF (Term Frequency-Inverse Document Frequency).
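The preprocessing steps above could look roughly like the following sketch. Several choices here are assumptions rather than the project's confirmed setup: the stopword list is scikit-learn's built-in English list, the regular expressions are illustrative, and lemmatization is left as a pluggable function that defaults to a no-op, since a real lemmatizer (for example NLTK's WordNetLemmatizer) needs lexicon data downloaded separately. The name clean_text is hypothetical.

```python
import re

from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS, TfidfVectorizer

TAG_RE = re.compile(r"<[^>]+>")         # HTML tags
NON_LETTER_RE = re.compile(r"[^a-z\s]")  # digits, punctuation, other symbols


def clean_text(text, lemmatize=lambda word: word):
    """Strip tags and non-letters, drop stopwords, then lemmatize each token."""
    text = TAG_RE.sub(" ", text)
    text = NON_LETTER_RE.sub(" ", text.lower())
    tokens = [w for w in text.split() if w not in ENGLISH_STOP_WORDS]
    # Pass a real lemmatizer here, e.g. WordNetLemmatizer().lemmatize.
    return " ".join(lemmatize(w) for w in tokens)


# Toy reviews for illustration only.
docs = [
    "<p>I bought 2 dresses & LOVED them!</p>",
    "The fabric is soft, fits well.",
    "Poor quality; returned it.",
]
cleaned = [clean_text(d) for d in docs]

# TF-IDF turns each cleaned review into a sparse numeric feature vector.
vectorizer = TfidfVectorizer()
features = vectorizer.fit_transform(cleaned)
```

In a train/test setup, the vectorizer should be fitted on the training reviews only and then applied to the test reviews with transform, so that test vocabulary and document frequencies do not leak into training.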

6. Modeling and Results

A variety of machine learning models were tested to classify the reviews, with logistic regression, SVM, and decision tree standing out across different metrics.
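A comparison of the three model families named above could be set up along these lines with scikit-learn. This is a sketch under assumptions, not the project's training code: the SVM variant (LinearSVC), the hyperparameters, the helper build_models, and the toy reviews are all illustrative, and no result here reflects the project's reported metrics.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier


def build_models(seed=0):
    """Return one TF-IDF + classifier pipeline per model family."""
    return {
        "logistic_regression": make_pipeline(
            TfidfVectorizer(), LogisticRegression(max_iter=1000)),
        "svm": make_pipeline(TfidfVectorizer(), LinearSVC()),
        "decision_tree": make_pipeline(
            TfidfVectorizer(), DecisionTreeClassifier(random_state=seed)),
    }


# Toy reviews for illustration only (1 = positive, 0 = negative).
texts = [
    "love this dress, fits perfectly",
    "beautiful fabric and great fit",
    "so comfortable, highly recommend",
    "poor quality, fell apart",
    "runs small and looks cheap",
    "itchy fabric, returned it",
]
labels = [1, 1, 1, 0, 0, 0]

fitted = {name: model.fit(texts, labels) for name, model in build_models().items()}
new_review = ["great dress, love the fit"]
predictions = {name: int(model.predict(new_review)[0]) for name, model in fitted.items()}
```

Wrapping the vectorizer and classifier in one pipeline keeps the TF-IDF vocabulary tied to the training data of each fit, which matters once cross-validation or a held-out test set is used to compare metrics.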

7. Web app screenshots

Negative review

Positive review
