indicium15/sc1015-project

sc1015-mini-project

About

This is a mini-project for SC1015 (Introduction to Data Science and Artificial Intelligence) that evaluates how effectively machine learning models classify news headlines as fake or real.

Contributors

Problem Definition

  • Can we apply Natural Language Processing (NLP) to classify the headline of a news article as fake or real?
  • Given the model we have, how can we improve its accuracy and effectiveness?

Models Used

  • LSTM (Long Short-Term Memory) Model
  • Random Forest Tree Classifier (RFTC)
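
As a rough illustration of the second model, here is a minimal sketch of a Random Forest headline classifier built with scikit-learn. The TF-IDF features, the hyperparameters, and the toy headlines are all assumptions made for this example; the project's notebooks may extract features and tune the model differently.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Toy headlines invented for illustration only; 1 = fake, 0 = real.
headlines = [
    "Aliens endorse local mayor in surprise press conference",
    "Man teaches goldfish to file his tax returns",
    "Central bank holds interest rates steady",
    "City council approves new bus routes",
]
labels = [1, 1, 0, 0]

# Assumed pipeline: TF-IDF word features feeding a Random Forest.
model = make_pipeline(
    TfidfVectorizer(),
    RandomForestClassifier(n_estimators=50, random_state=0),
)
model.fit(headlines, labels)

# Predict the label of an unseen (also invented) headline.
predictions = model.predict(["Council approves goldfish tax"])
print(predictions)
```

With only four training examples the prediction itself is meaningless; the sketch only shows how the pieces fit together.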

Stages of Analysis

  1. Data Cleaning (Backup Link)
  2. EDA - Generating a WordCloud (Backup Link)
  3. EDA - Using Sentiment Analysis (Backup Link)
  4. Building the Model - RFTC (Backup Link)
  5. Building the Model - LSTM (Backup Link)
  6. Model Evaluation (Backup Link)
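
The data cleaning stage can be sketched roughly as follows. This is a minimal example using only the Python standard library, and it assumes a typical pipeline of lowercasing, stripping non-letter characters, and removing stop words. The `clean_headline` function and the `STOP_WORDS` set are hypothetical names for illustration; the actual steps in the Data Cleaning notebook may differ.

```python
import re

# Hypothetical stop-word list for illustration; the project's actual list is not shown.
STOP_WORDS = {"a", "an", "the", "to", "of", "in", "on", "for", "and", "is"}


def clean_headline(text: str) -> str:
    """Lowercase a headline, drop non-letter characters, and remove stop words."""
    text = text.lower()
    # Replace anything that is not a lowercase letter or whitespace with a space.
    text = re.sub(r"[^a-z\s]", " ", text)
    # Splitting on whitespace also collapses repeated spaces.
    tokens = [tok for tok in text.split() if tok not in STOP_WORDS]
    return " ".join(tokens)


print(clean_headline("BREAKING: Scientists Discover the Secret to 100% Clean Energy!"))
# breaking scientists discover secret clean energy
```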

Conclusions

  • The LSTM model is a good base for classifying headlines.
  • Our model classifies fake news well but real news poorly, owing to the nature of our data and the linguistic overlap between some fake and real headlines.
  • We propose improvements to the current model's accuracy in our presentation.
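
To make the gap between the two classes concrete, one option is to compute recall separately for fake and real headlines instead of relying on overall accuracy. The sketch below does this in plain Python; the `per_class_recall` helper and the toy labels are invented for illustration and are not the project's results.

```python
def per_class_recall(y_true, y_pred, positive=1):
    """Return (recall on the positive class, recall on the negative class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    pos_recall = tp / (tp + fn) if (tp + fn) else 0.0
    neg_recall = tn / (tn + fp) if (tn + fp) else 0.0
    return pos_recall, neg_recall


# Toy labels invented for illustration; 1 = fake, 0 = real.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 1, 0, 0]

fake_recall, real_recall = per_class_recall(y_true, y_pred)
print(fake_recall, real_recall)  # 1.0 0.5
```

In this toy case overall accuracy is 0.75, yet half of the real headlines are mislabelled as fake, which is the kind of imbalance a single accuracy figure hides.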

Skills Learnt

  • Collaboration using GitHub and Google Colab
  • How to clean text data
  • How to draw insights from text data
  • New evaluation metrics for Binary Classification Models
  • Understanding how the LSTM model works
  • Understanding why our model is good at classifying fake news and bad at classifying real news
  • Understanding the shortcomings in our training data and how to improve model accuracy

References

Datasets:

  1. Fake and Real News Dataset
  2. Onion or Not?
  3. Buzzfeed Political News Data
  4. ABC News Headlines

EDA:

  1. Word Cloud Generation
  2. Sentiment Analysis

Evaluation Metrics:

  1. ROC Curve
  2. Advanced Evaluation Metrics for Binary Classification
  3. Matthews Correlation Coefficient
  4. Introduction to Transfer Learning
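
For reference, the Matthews Correlation Coefficient listed above can be computed directly from the four confusion-matrix counts. The sketch below is a minimal plain-Python version; the `mcc` helper name and the example counts are assumptions for illustration, not figures from the project. scikit-learn offers an equivalent in `sklearn.metrics.matthews_corrcoef`, which takes label arrays rather than counts.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient from confusion-matrix counts.

    Ranges from -1 (total disagreement) through 0 (no better than chance)
    to +1 (perfect prediction). Returns 0.0 when the denominator is zero,
    a common convention for degenerate cases.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


# Invented counts: every fake headline caught, half of the real ones mislabelled.
score = mcc(tp=4, tn=2, fp=2, fn=0)
print(round(score, 3))  # 0.577
```

Unlike accuracy, MCC uses all four counts, so it stays low when a model does well on one class only.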
