This is a mini-project for SC1015 (Introduction to Data Science and Artificial Intelligence) that evaluates how effectively machine learning models classify fake news headlines.
- @hyunsunryu2020: Hyunsun Ryu - Data Cleaning and EDA
- @pravindkk: Pravind Kumar - Model Building
- @indicium15: Chaitanya Jadhav - Model Evaluation
- Can we apply natural language processing (NLP) to classify a news article's headline as fake or real?
- Based on the model we have, how can we improve its accuracy and effectiveness?
- LSTM (Long Short-Term Memory) Model
- Random Forest Tree Classifier
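To make the LSTM idea concrete, here is a minimal sketch of a single LSTM cell in plain Python, assuming a hidden size of 1 so the gate arithmetic stays readable. The function names, the weights and the 2-dimensional "embeddings" are hypothetical illustrations, not the project's Keras model: the cell reads a headline one token at a time, uses input, forget and output gates to decide what to write to, keep in, and expose from its cell state, and a sigmoid output layer turns the final hidden state into a probability for one class.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic function, used for the three LSTM gates and the output head."""
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One time step of an LSTM cell with hidden size 1 (for readability).

    x       -- input vector for this token (e.g. a word embedding), list of floats
    h_prev  -- previous hidden state (float)
    c_prev  -- previous cell state (float)
    W, U, b -- dicts keyed by gate: 'i' input, 'f' forget, 'o' output, 'g' candidate
    """
    # Pre-activation for each gate: z = W.x + U*h_prev + b
    z = {k: sum(w * xi for w, xi in zip(W[k], x)) + U[k] * h_prev + b[k] for k in "ifog"}
    i = sigmoid(z["i"])      # how much new information to write
    f = sigmoid(z["f"])      # how much of the old cell state to keep
    o = sigmoid(z["o"])      # how much of the cell state to expose
    g = math.tanh(z["g"])    # candidate values to write
    c = f * c_prev + i * g   # new cell state
    h = o * math.tanh(c)     # new hidden state
    return h, c

def classify_sequence(embeddings, W, U, b, w_out, b_out):
    """Run the cell over a headline's token embeddings, then map the final
    hidden state to a probability with a sigmoid output layer."""
    h, c = 0.0, 0.0
    for x in embeddings:
        h, c = lstm_step(x, h, c, W, U, b)
    return sigmoid(w_out * h + b_out)

# Toy usage with hand-picked (hypothetical) weights and 2-d embeddings.
W = {"i": [0.5, -0.2], "f": [0.3, 0.1], "o": [0.4, 0.4], "g": [0.9, -0.5]}
U = {"i": 0.1, "f": 0.2, "o": -0.1, "g": 0.3}
b = {"i": 0.0, "f": 1.0, "o": 0.0, "g": 0.0}
headline = [[0.2, 0.7], [0.9, -0.1], [0.4, 0.4]]
print(classify_sequence(headline, W, U, b, w_out=2.0, b_out=-0.5))
```

In a real model the weights are learned and the hidden state is a vector, but the gate equations per step are the same; the forget gate is what lets the cell carry information from early words of a headline to the final prediction.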
- Data Cleaning (Backup Link)
- EDA - Generating a WordCloud (Backup Link)
- EDA - Using Sentiment Analysis (Backup Link)
- Building the Model - RFTC (Backup Link)
- Building the Model - LSTM (Backup Link)
- Model Evaluation (Backup Link)
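The data cleaning and WordCloud steps both start from tokenised, normalised headlines. A minimal sketch of that preprocessing, assuming lowercase conversion, URL and punctuation removal, and a deliberately tiny hypothetical stopword list (the notebooks' actual cleaning rules may differ), followed by word counts of the kind a word cloud is drawn from:

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"a", "an", "the", "of", "to", "in", "on", "and", "is", "for"}

def clean_headline(text: str) -> list[str]:
    """Lowercase a headline, drop URLs and non-letters, and remove stopwords."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # drop URLs
    text = re.sub(r"[^a-z\s]", " ", text)      # keep letters and whitespace only
    return [tok for tok in text.split() if tok not in STOPWORDS]

def word_frequencies(headlines) -> Counter:
    """Count cleaned tokens across all headlines (input for a word cloud)."""
    counts = Counter()
    for h in headlines:
        counts.update(clean_headline(h))
    return counts

headlines = [
    "BREAKING: The President signs a new bill!",
    "President to visit Japan in 2024",
]
print(clean_headline(headlines[0]))
print(word_frequencies(headlines).most_common(3))
```

Note that this simple regex splits contractions ("don't" becomes "don" and "t"), which is one reason text-cleaning choices matter. A frequency dictionary like this could be passed to the `wordcloud` package's `WordCloud.generate_from_frequencies`.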
- The LSTM model is a good starting point for classifying headlines.
- Our model is good at classifying fake news but poor at classifying real news, because of the nature of our data and the linguistic overlap between some fake and real headlines.
- Our presentation proposes several improvements that could increase the current model's accuracy.
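A model can be "good at fake, bad at real" while still showing a respectable overall accuracy, which is why per-class metrics are useful here. The sketch below computes precision, recall and F1 for each class; the labels are made up purely for illustration and are not results from this project.

```python
def per_class_report(y_true, y_pred, labels=("fake", "real")):
    """Precision, recall and F1 computed separately for each class."""
    pairs = list(zip(y_true, y_pred))
    report = {}
    for cls in labels:
        tp = sum(1 for t, p in pairs if t == cls and p == cls)  # correctly predicted cls
        fp = sum(1 for t, p in pairs if t != cls and p == cls)  # wrongly predicted as cls
        fn = sum(1 for t, p in pairs if t == cls and p != cls)  # cls that was missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[cls] = {"precision": precision, "recall": recall, "f1": f1}
    return report

# Made-up labels for illustration only (not results from this project).
y_true = ["fake"] * 6 + ["real"] * 4
y_pred = ["fake"] * 6 + ["fake", "fake", "fake", "real"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy = {accuracy:.2f}")  # prints "accuracy = 0.70"
for cls, metrics in per_class_report(y_true, y_pred).items():
    print(cls, {k: round(v, 2) for k, v in metrics.items()})
```

Here every fake headline is caught (fake recall 1.0), but only one of four real headlines is recognised (real recall 0.25), a gap that the 70% accuracy figure alone hides.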
- Collaboration using GitHub and Google Colab
- How to clean text data
- How to draw insights from text data
- New evaluation metrics for binary classification models
- Understanding how the LSTM model works
- Understanding why our model is good at classifying fake news and why it is bad at classifying real news
- Understanding the shortcomings in our training data and how to improve model accuracy
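As one example of an evaluation metric for binary classifiers beyond accuracy (this README does not say which metrics the evaluation notebook used), ROC AUC measures how well a model's scores rank positives above negatives, independent of any decision threshold. A minimal sketch using its rank interpretation: the probability that a randomly chosen positive example, assumed here to be "fake", scores higher than a randomly chosen negative one, with ties counted as half.

```python
def roc_auc(y_true, scores, positive="fake"):
    """ROC AUC via pairwise comparison of positive and negative scores.

    O(n_pos * n_neg), which is fine for small illustrative examples.
    """
    pos = [s for y, s in zip(y_true, scores) if y == positive]
    neg = [s for y, s in zip(y_true, scores) if y != positive]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))

# Made-up model scores (probability of "fake") for illustration only.
y_true = ["fake", "fake", "real", "real"]
scores = [0.9, 0.4, 0.6, 0.1]
print(roc_auc(y_true, scores))
```

A score of 0.5 means the ranking is no better than chance and 1.0 means every fake headline outranks every real one. Libraries such as scikit-learn provide an efficient equivalent in `sklearn.metrics.roc_auc_score`.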