sc1015-mini-project

About

This is a mini-project for SC1015 (Introduction to Data Science and Artificial Intelligence) that evaluates how effectively machine learning models can classify fake news headlines.

Contributors

@hyunsunryu2020: Hyunsun Ryu - Data Cleaning and EDA
@pravindkk: Pavind Kumar - Model Building
@indicium15: Chaitanya Jadhav - Model Evaluation

Problem Definition

Are we able to apply Natural Language Processing to classify the headline of a news article as being fake or real?
Based on the model we have, how can we improve its accuracy and effectiveness?

Models Used

LSTM (Long Short-Term Memory) Model
Random Forest Classifier

Stages of Analysis

Conclusions

The LSTM model is a good base for classifying headlines.
Our model is good at classifying fake news but poor at classifying real news, owing to the nature of our data and the linguistic overlap between some fake and real headlines.
Our presentation proposes several improvements that could raise the accuracy of the current model.
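
The gap between fake and real headlines described above is easier to see in per-class recall than in overall accuracy. The following is a minimal sketch, assuming labels are encoded as 1 = fake and 0 = real. The function name `per_class_report`, the label encoding, and the toy label lists are illustrative assumptions and are not taken from the project's notebooks.

```python
def per_class_report(y_true, y_pred, positive=1):
    """Return confusion-matrix counts and per-class metrics for binary labels.

    Assumes `positive` marks the "fake" class; everything else is "real".
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    def ratio(num, den):
        # Guard against empty classes instead of raising ZeroDivisionError.
        return num / den if den else 0.0

    return {
        "tp": tp, "fn": fn, "fp": fp, "tn": tn,
        "accuracy": ratio(tp + tn, len(pairs)),
        "recall_fake": ratio(tp, tp + fn),      # share of fake headlines caught
        "recall_real": ratio(tn, tn + fp),      # share of real headlines kept as real
        "precision_fake": ratio(tp, tp + fp),   # share of "fake" predictions that are right
    }


# Toy labels (made up): every fake headline is caught,
# but half of the real headlines are flagged as fake.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 1, 0, 0]
report = per_class_report(y_true, y_pred)
print(report)
```

In this toy case the accuracy looks acceptable while recall on the real class is much lower than recall on the fake class, which is the pattern a single accuracy figure would hide.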

Skills Learnt

Collaboration using GitHub and Google Colab
How to clean text data
How to draw insights from text data
New evaluation metrics for Binary Classification Models
Understanding how an LSTM model works
Understanding why our model is good at classifying fake news and why it is bad at classifying real news
Understanding the shortcomings in our train data and how to improve model accuracy
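
As an illustration of the text-cleaning skill listed above, here is a minimal sketch of headline cleaning. The specific steps (lowercasing, replacing punctuation and digits with spaces, dropping a small stop-word list) and the names `STOP_WORDS` and `clean_headline` are assumptions for illustration; the project's own cleaning notebook may do this differently.

```python
import re

# A deliberately tiny, hypothetical stop-word list; a real pipeline
# would normally use a fuller list from an NLP library.
STOP_WORDS = {"a", "an", "the", "of", "to", "in", "on", "for", "and", "is"}


def clean_headline(text):
    """Lowercase a headline, strip non-letters, and drop stop words."""
    text = text.lower()
    # Replace anything that is not a lowercase letter or whitespace with a space.
    text = re.sub(r"[^a-z\s]", " ", text)
    tokens = [tok for tok in text.split() if tok not in STOP_WORDS]
    return " ".join(tokens)


# Made-up example headline.
cleaned = clean_headline("BREAKING: Senate Votes 52-48 on the New Bill!")
print(cleaned)
```

Whether to drop digits and stop words is a design choice: it shrinks the vocabulary, but numbers and function words can themselves be signals of writing style, so it is worth comparing model performance with and without these steps.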

