forked from Aayusi/SihSrm

This is the official code repository for 'TotallyNotBots'. This ML backed sentiment analysis platform on customer reviews on Amazon products was developed during SIH SRM AP Internal Hackathon.

7uhinn/SIH-ISRO

 
 

Sentiment Analysis from Text Feedback

TotallyNotBots

  • Aayusi Biswas
  • Tuhin Sarkar
  • Vatsal Rathod
  • Sarvesh Shroff
  • Naveen Edala
  • Khushboo Maheshwari

Overview

Problem Statement: NM396-ISRO

Sentiment Analysis from text feedback:

Web portals like Bhuvan get a vast amount of feedback from users. Going through all of this feedback can be a tedious job. Develop software to categorize the opinions expressed in feedback forums. This can be used in a feedback management system. The software must classify individual comments/reviews.

Dataset:

The Multi-Domain Sentiment Dataset contains product reviews taken from Amazon.com from many product types (domains).

Solution:

A web-based application that classifies reviews in real time as either a 'Positive' or a 'Negative' review of the product.

  • Data Collection: The data file is provided as a JSON file by the dataset website itself. Since Bhuvan is a software service, we chose reviews of Amazon Android apps. The data contains approximately 750,000 data points and has the following columns:
  1. reviewerID - ID of the reviewer, e.g. A2SUAM1J3GNN3B
  2. asin - ID of the product, e.g. 0000013714
  3. reviewerName - name of the reviewer
  4. helpful - helpfulness rating of the review, e.g. 2/3
  5. reviewText - text of the review
  6. overall - rating of the product
  7. summary - summary of the review
  8. unixReviewTime - time of the review (unix time)
  9. reviewTime - time of the review (raw)
  • Data Preparation: We load the dataset into a pandas DataFrame through a JSON parser. Each review is then labeled as either positive or negative based on its rating, and the label is stored in a new column named 'Sentiment', which acts as the target for training later. We then clean the text by removing stop words, lowercasing, and stripping unnecessary symbols. The data is now ready for training.

  • Model Training: The text is vectorized with scikit-learn's TfidfVectorizer, which is chained to the model through the Pipeline module. After that, the pipeline is tuned with an extensive grid search (GridSearch), and the resulting model is pickled to a file named 'model.pkl'. When tested on around 100,000 data points, the model achieves an accuracy of 94%.

  • Prediction over live feed: The webapp takes in reviews and reports whether each one is positive or negative feedback.
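
The preparation and training steps above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual notebook code: the toy records, the rating threshold of 4 for a 'Positive' label, the LogisticRegression classifier, the parameter grid, and scikit-learn's built-in English stop-word list (used here in place of NLTK) are all assumptions made for the example. Only the column names, the 'Sentiment' target, the TfidfVectorizer/Pipeline/grid-search combination, and the 'model.pkl' file name come from the description above.

```python
import json
import pickle
import re

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Toy stand-in for the JSON review file (one JSON object per line).
RAW_LINES = [
    '{"reviewText": "Great app, works perfectly!", "overall": 5.0}',
    '{"reviewText": "Crashes every time I open it.", "overall": 1.0}',
    '{"reviewText": "Love it, very useful and fast.", "overall": 5.0}',
    '{"reviewText": "Terrible, waste of money.", "overall": 2.0}',
    '{"reviewText": "Excellent maps and smooth navigation.", "overall": 4.0}',
    '{"reviewText": "Keeps freezing, totally useless.", "overall": 1.0}',
    '{"reviewText": "Simple, clean and reliable.", "overall": 4.0}',
    '{"reviewText": "Horrible update, lost all my data.", "overall": 2.0}',
]


def load_reviews(lines):
    """Parse JSON lines into a pandas DataFrame."""
    return pd.DataFrame([json.loads(line) for line in lines])


def label_sentiment(df, threshold=4.0):
    """Add the 'Sentiment' target column (threshold is an assumption)."""
    df = df.copy()
    is_positive = df["overall"] >= threshold
    df["Sentiment"] = is_positive.map({True: "Positive", False: "Negative"})
    return df


def clean_text(text):
    """Lowercase the text and replace everything except letters with spaces."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return " ".join(text.split())


def train(df):
    """Grid-search a TF-IDF + classifier pipeline and return the best model."""
    pipeline = Pipeline([
        # stop_words="english" is scikit-learn's list, standing in for NLTK.
        ("tfidf", TfidfVectorizer(stop_words="english")),
        # The classifier is not named in the README; this one is assumed.
        ("clf", LogisticRegression(max_iter=1000)),
    ])
    param_grid = {
        "tfidf__ngram_range": [(1, 1), (1, 2)],
        "clf__C": [0.1, 1.0, 10.0],
    }
    search = GridSearchCV(pipeline, param_grid, cv=2, scoring="accuracy")
    search.fit(df["reviewText"].map(clean_text), df["Sentiment"])
    return search.best_estimator_


if __name__ == "__main__":
    reviews = label_sentiment(load_reviews(RAW_LINES))
    model = train(reviews)
    # Persist the fitted pipeline under the file name used by the project.
    with open("model.pkl", "wb") as fh:
        pickle.dump(model, fh)
    print(model.predict([clean_text("I really love this app")])[0])
```

Because the vectorizer and the classifier live in one pipeline, the pickled object accepts raw (cleaned) text directly, so the web application does not need to repeat the vectorization step by hand.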

Technology used

Backend Dependencies:

  • Python
  • NLTK
  • Scikit-learn
  • Numpy/Pandas
  • Python Pickle

Frontend Dependencies:

  • Flask
  • HTML/CSS

Domains:

  • Artificial Intelligence [Natural Language Processing]
  • Real-time package handling
  • Webapp development

Screenshots/Demo Video

Nice

Have a look at the YouTube video.

Usage

  1. Clone the repository

git clone https://github.com/Aayusi/SihSrm

  2. Open folder 'webapps' and install the dependencies

cd webapps

pip install -r requirements.txt

cd model

  3. Download 'model'

  4. Run the application

flask run
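
For orientation, a Flask application that serves a pickled model could look something like the sketch below. It is not the code in 'webapps': the `/predict` route, the `review` JSON field, and the `create_app` and `load_model` helpers are hypothetical names chosen for illustration. It assumes only that 'model.pkl' holds an object with a scikit-learn style `predict` method that accepts a list of raw review strings.

```python
import pickle

from flask import Flask, jsonify, request


def load_model(path="model.pkl"):
    """Load the pickled model from disk (path is an assumption)."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def create_app(model):
    """Build a Flask app around any object exposing predict(list_of_texts)."""
    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])  # hypothetical route name
    def predict():
        payload = request.get_json(silent=True)
        review = payload.get("review", "") if isinstance(payload, dict) else ""
        review = str(review).strip()
        if not review:
            return jsonify(error="missing 'review' text"), 400
        label = model.predict([review])[0]
        return jsonify(sentiment=str(label))

    return app


if __name__ == "__main__":
    # Requires a model.pkl in the working directory.
    create_app(load_model()).run(debug=True)
```

Passing the model into `create_app` keeps the route logic independent of the file on disk, which makes it possible to exercise the endpoint with Flask's test client and a stand-in model.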

Languages

  • Jupyter Notebook 93.8%
  • Python 4.4%
  • HTML 1.8%