This repository contains code and links to data files created for a project on fake news and misinformation detection. The files and scripts here were mostly used to scrape fact-checking websites. This is an updated version of the MisInfoText collection.
More detail is available in the following papers:
- Fatemeh Torabi Asr, Mehrdad Mokhtari and Maite Taboada, 2023. "Misinformation detection in news text: automatic methods and data limitations". In Maci, S., M. Demata, M. McGlashan and P. Seargeant (eds.) The Routledge Handbook of Discourse and Disinformation, pp. 79-102.
- Fatemeh Torabi Asr and Maite Taboada, 2018. "The Data Challenge in Misinformation Detection: Source Reputation vs. Content Veracity". In Proceedings of The First Workshop on Fact Extraction and Verification, EMNLP 2018.
- Fatemeh Torabi Asr and Maite Taboada, 2019. "Big Data and Quality Data for Fake News and Misinformation Detection". Big Data & Society. January-June 2019: 1-14.

To cite the data, please use: Fatemeh Torabi Asr and Maite Taboada (2019) MisInfoText. A collection of news articles, with false and true labels. Dataset.
The CNN/Daily Mail stories used for pretraining were obtained from https://cs.nyu.edu/~kcho/DMQA/