This is the official code repository for 'TotallyNotBots'. This ML backed sentiment analysis platform on customer reviews on Amazon products was developed during SIH SRM AP Internal Hackathon.
- Aayusi Biswas
- Tuhin Sarkar
- Vatsal Rathod
- Sarvesh Shroff
- Naveen Edala
- Khushboo Maheshwari
Webportals like Bhuvan get vast amount of feedback from the users. To go through all the feedbacks can be a tedious job. Develop software to categorize opinions expressed in feedback forums. This can be utilized for feedback management system. The software must provide the classification of individual comments/reviews.
The Multi-Domain Sentiment Dataset contains product reviews taken from Amazon.com from many product types (domains).
A web based software that classifies reviews in real time as either a 'Positive' or a 'Negative' review of the product.
- Data Collection The data file is provided as a JSON file from the website itself. Since Bhuvan is a software service, we chose reviews for Amazon Android apps. The data contains approximately 750,000 data points and has the following data columns:
- reviewerID - ID of the reviewer, e.g. A2SUAM1J3GNN3B
- asin - ID of the product, e.g. 0000013714
- reviewerName - name of the reviewer
- helpful - helpfulness rating of the review, e.g. 2/3
- reviewText - text of the review
- overall - rating of the product
- summary - summary of the review
- unixReviewTime - time of the review (unix time)
- reviewTime - time of the review (raw)
-
Data Preparation We load the dataset onto a pandas dataset through a JSON parser. Then, the reviews are characterized as either positive or negative based on the rating and a column named 'Sentiment' is added, this will act as the target for training later. We then clean the text by removing stop words and any unnecessary uppercasing or symbols. This data is now ready for training.
-
Model Training The text is vectorized through the TfdifVectorizer module after being pipelined with the help of Pipeline module. After than an extensive GridSearch model is trained and the model then is pickled to a file named 'model.pkl'. On testing the data on around 100,000 data points, an accuracy of 94% is achieved.
-
Prediction over live-feed The webapp takes in reviews and results if it is a positive or a negative feedback.
Backend Dependencies:
- Python
- NLTK
- Scikit-learn
- Numpy/Pandas
- Python Pickle
Frontend Dependancies:
- Flask
- HTML/CSS
Domains:
- Artificial Intelligence [Natural Language Processing]
- Real-time package handling
- Webapp development
Have a look at the Youtube video
- Clone the repository
git clone https://github.com/Aayusi/SihSrm
- Open folder 'webapps'
cd webapps
pip install -r requirements.txt
cd model
-
Run the application
flask run