In this repository, I perform sentiment analysis on a food reviews dataset, step by step.
- python (https://www.python.org/downloads/)
- scikit-learn (pip install scikit-learn)
- pyttsx3 (pip install pyttsx3)
- numpy (pip install numpy)
- matplotlib (pip install matplotlib)
- Download the code from GitHub.
- Install all the dependencies listed above.
- Open the downloaded folder in Jupyter Notebook.
- Run the cells as required.
First, import all the required libraries. I have used pyttsx3 for text-to-speech conversion. You can skip that import if you only want to show the prediction as text.
Note: You can find the dataset in the GitHub repository (link given above). In the repository, you will find a file named "Restaurant_Reviews.tsv".
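A minimal sketch of loading the reviews, assuming the file is tab-separated with a header row and two columns named `Review` and `Liked` (these column names are an assumption, so check the header of your copy). It uses only the standard library, and a tiny in-memory sample stands in for the real file so that the snippet runs on its own.

```python
import csv
import io


def load_reviews(source):
    """Read tab-separated rows from an open file and return (texts, labels)."""
    # QUOTE_NONE keeps quotation marks inside reviews as literal characters.
    reader = csv.DictReader(source, delimiter="\t", quoting=csv.QUOTE_NONE)
    texts, labels = [], []
    for row in reader:
        texts.append(row["Review"])       # assumed column name
        labels.append(int(row["Liked"]))  # assumed column name; 1 = liked, 0 = not liked
    return texts, labels


# In-memory stand-in for the real file, invented for illustration.
sample = io.StringIO(
    "Review\tLiked\n"
    "Loved this place.\t1\n"
    "The crust was not good.\t0\n"
)
texts, labels = load_reviews(sample)
print(texts, labels)
```

To read the real file, pass an open handle instead, for example `with open("Restaurant_Reviews.tsv", encoding="utf-8", newline="") as f: texts, labels = load_reviews(f)`.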
Now we have to split the data into training and testing sets, so that we can train the model on the training data and check its accuracy on unseen data.
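The split can be sketched with scikit-learn's `train_test_split`. The toy reviews, the 25% test size, the `random_state` value, and the use of `stratify` are all illustrative assumptions, not necessarily the settings used in the notebook.

```python
from sklearn.model_selection import train_test_split

# Toy data invented for illustration; 1 = positive, 0 = negative.
texts = [
    "great food", "terrible service", "loved it", "would not return",
    "tasty and fresh", "cold and bland", "friendly staff", "overpriced",
]
labels = [1, 0, 1, 0, 1, 0, 1, 0]

# Hold out 25% of the reviews for testing; stratify keeps the class balance
# the same in both parts, and random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=42, stratify=labels
)
print(len(X_train), "training reviews,", len(X_test), "test reviews")
```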
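Now we have to convert the text data into numeric form. I have used CountVectorizer() for this, which builds a vocabulary from the training text and represents each review as a vector of word counts.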
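As a small illustration of what the vectorizer produces, here is a sketch on three invented sentences with default `CountVectorizer` settings. Note that the vocabulary is learned from the training text only and then reused for the test text.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Invented sentences for illustration.
train_texts = ["the food was great", "the service was slow"]
test_texts = ["great service"]

vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_texts)  # learn vocabulary, then count
X_test = vectorizer.transform(test_texts)        # reuse the learned vocabulary

# Columns follow the alphabetically sorted vocabulary.
vocabulary = sorted(vectorizer.vocabulary_)
print(vocabulary)
print(X_test.toarray())
```

Words in the test text that were never seen during fitting are simply ignored, which is why the vectorizer must be fitted on the training data and saved along with the model.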
With the required preprocessing done, it is time to train the model. Here, I will use two different models and compare their performance. Eventually, I will pick the better-fitting one.
Now it is time to compare the performance of the Logistic Regression and Naive Bayes algorithms on the given dataset.
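The training and comparison steps can be sketched as follows. This assumes `MultinomialNB` as the Naive Bayes variant and accuracy as the metric (both are assumptions, since the text does not name them), and it uses invented toy reviews, so the printed scores say nothing about the real dataset.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB

# Toy data invented for illustration; 1 = positive, 0 = negative.
train_texts = [
    "great food", "terrible service", "loved the food", "would not return",
    "tasty and fresh", "cold and bland", "friendly staff", "rude staff",
]
train_labels = [1, 0, 1, 0, 1, 0, 1, 0]
test_texts = ["great staff", "terrible food", "fresh and tasty", "bland and cold"]
test_labels = [1, 0, 1, 0]

vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_texts)
X_test = vectorizer.transform(test_texts)

# Fit both candidates on the same features and score them on the same test set.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Naive Bayes": MultinomialNB(),  # assumed variant
}
scores = {}
for name, model in models.items():
    model.fit(X_train, train_labels)
    scores[name] = accuracy_score(test_labels, model.predict(X_test))
    print(f"{name}: accuracy = {scores[name]:.2f}")
```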
As the results above show, Logistic Regression performed better than Naive Bayes on this dataset, so we continue with the Logistic Regression model.
You can find the best values for the model's parameters using grid search with cross-validation (GridSearchCV).
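A minimal sketch of such a search, assuming the regularization strength `C` of Logistic Regression is the parameter being tuned. The candidate values, the 2-fold setting, and the toy reviews are illustrative choices only; a real run would use the full training set and more folds.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Toy data invented for illustration; 1 = positive, 0 = negative.
texts = [
    "great food", "terrible service", "loved the food", "would not return",
    "tasty and fresh", "cold and bland", "friendly staff", "rude staff",
]
labels = [1, 0, 1, 0, 1, 0, 1, 0]

X = CountVectorizer().fit_transform(texts)

# Try each candidate C with cross-validation and keep the best one.
param_grid = {"C": [0.1, 1.0, 10.0]}
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid=param_grid,
    cv=2,                # small only because the toy data is tiny
    scoring="accuracy",
)
search.fit(X, labels)
print("Best parameters:", search.best_params_)
print("Best cross-validated accuracy:", search.best_score_)
```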
It is now clear that the LR model is the better fit for this particular data. However, parameter tuning can be done to improve its score. Now we will export the model and the tokenizer to pkl files and then deploy the model in a small desktop application.
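The export step can be sketched with the standard `pickle` module. The file names `model.pkl` and `vectorizer.pkl` are hypothetical, and a temporary directory is used here only so the snippet leaves nothing behind. Replace it with a real folder in practice.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Toy data invented for illustration; 1 = positive, 0 = negative.
texts = ["great food", "terrible service", "loved it", "cold and bland"]
labels = [1, 0, 1, 0]

vectorizer = CountVectorizer()
model = LogisticRegression(max_iter=1000)
model.fit(vectorizer.fit_transform(texts), labels)

out_dir = Path(tempfile.mkdtemp())            # stand-in for a real output folder
model_path = out_dir / "model.pkl"            # hypothetical file name
vectorizer_path = out_dir / "vectorizer.pkl"  # hypothetical file name

# Save both objects: the model is useless without the matching vocabulary.
with open(model_path, "wb") as f:
    pickle.dump(model, f)
with open(vectorizer_path, "wb") as f:
    pickle.dump(vectorizer, f)

# Load them back, as the desktop application would do at start-up.
with open(model_path, "rb") as f:
    loaded_model = pickle.load(f)
with open(vectorizer_path, "rb") as f:
    loaded_vectorizer = pickle.load(f)

sample = ["great food"]
original = model.predict(vectorizer.transform(sample))
restored = loaded_model.predict(loaded_vectorizer.transform(sample))
print(original, restored)
```

Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.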
In this step, I have created a simple desktop application to predict the sentiment of food reviews. I have used the pyttsx3 module to convert the text prediction into voice.
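The prediction logic behind such an application might look like the sketch below. It leaves out the GUI, and the helper names `predict_review` and `announce`, the "Positive"/"Negative" wording, and the toy training data are all hypothetical. Speech is off by default and falls back to text if pyttsx3 is not installed.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression


def predict_review(text, model, vectorizer):
    """Return a readable label for one review (assumes 1 = positive)."""
    prediction = model.predict(vectorizer.transform([text]))[0]
    return "Positive" if prediction == 1 else "Negative"


def announce(message, use_voice=False):
    """Print the message, and optionally read it aloud with pyttsx3."""
    print(message)
    if not use_voice:
        return
    try:
        import pyttsx3  # optional dependency
    except ImportError:
        return  # text output only
    engine = pyttsx3.init()
    engine.say(message)
    engine.runAndWait()


# Toy model invented for illustration; the real application would load
# the exported model and vectorizer from their pkl files instead.
texts = ["great food", "terrible service", "loved the food", "cold and bland"]
labels = [1, 0, 1, 0]
vectorizer = CountVectorizer()
model = LogisticRegression(max_iter=1000)
model.fit(vectorizer.fit_transform(texts), labels)

result = predict_review("loved the great food", model, vectorizer)
announce(f"Prediction: {result}")
```

Calling `announce(..., use_voice=True)` on a machine with pyttsx3 and a working audio driver would also speak the prediction.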