- Apache Spark 2.3.0
- Jupyter Notebook
The datasets used in this project were obtained manually from the following sources:
- Phishtank - https://www.phishtank.com/developer_info.php
- OpenPhish - https://openphish.com/
- JWSPAMSPY - http://www.joewein.de/sw/blacklist.htm
- DNS-BH - http://www.malwaredomains.com/wordpress/?page_id=66
- Malware Patrol - https://www.malwarepatrol.net/my-account/
- Malware Domain List - http://www.malwaredomainlist.com/
The Dataset.csv file used in this project combines the sources above. A data pre-processing program was used to clean and filter the data, so the dataset is already labelled and ready to use in the project.
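
The pre-processing program itself is not reproduced here. As a rough illustration only, the sketch below shows one way several URL feeds could be cleaned, de-duplicated, and written out as a labelled CSV. The function names, the `url` and `label` column names, the label value `1`, and the sample URLs are all assumptions made for this example, not the project's actual schema or code.

```python
import csv
import io


def clean_url(raw):
    """Trim whitespace; return None for blank lines and '#' comment lines."""
    url = raw.strip()
    if not url or url.startswith("#"):
        return None
    return url


def combine_sources(sources, label):
    """Merge several iterables of raw URL lines into unique (url, label) rows."""
    seen = set()
    rows = []
    for lines in sources:
        for raw in lines:
            url = clean_url(raw)
            if url is None or url in seen:
                continue  # skip noise and URLs already collected
            seen.add(url)
            rows.append((url, label))
    return rows


def write_dataset(rows, handle):
    """Write a header plus the labelled rows to an open text handle."""
    writer = csv.writer(handle)
    writer.writerow(["url", "label"])  # hypothetical column names
    writer.writerows(rows)


# Hypothetical sample feeds standing in for the downloaded blacklists.
feed_a = ["http://example.com/login \n", "# comment line\n", "\n"]
feed_b = ["http://example.com/login\n", "http://example.org/pay\n"]

rows = combine_sources([feed_a, feed_b], label=1)  # 1 = malicious (assumed)
buffer = io.StringIO()
write_dataset(rows, buffer)
```

In practice the `io.StringIO` buffer would be replaced with a file opened via `open("Dataset.csv", "w", newline="")`, and each feed would be read from its downloaded file.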