🐟 PhishNet

DISCLAIMER: The content provided by PhishNet is exclusively for educational and research purposes. The training data for our GPT-2-derived model has been carefully cleaned to remove any private or personally identifiable information (PII) to ensure ethical compliance and privacy. The views and opinions expressed are solely those of the authors and do not necessarily reflect those of any associated organization. No warranty is provided regarding the accuracy or reliability of the information. Use of PhishNet and its outputs is at your own risk, with no liability for any resulting damages. This project does not endorse illegal activities and should be used responsibly.

TL;DR

PhishNet is a research project that uses Reinforced Self-Training (ReST) and a fine-tuned GPT-2 model to create a high-quality synthetic dataset of phishing emails. The model is trained on several public email datasets (see Citations), and the project aims to explore adversarial AI and expand our understanding of AI safety.

Citations

Radford, A., Wu, J., Child, R., et al. (2019). Language Models are Unsupervised Multitask Learners. Link

@article{radford2019language,
  title={Language Models are Unsupervised Multitask Learners},
  author={Radford, Alec and Wu, Jeff and Child, Rewon and Luan, David and Amodei, Dario and Sutskever, Ilya},
  year={2019}
}

Gulcehre, C., Le Paine, T., Srinivasan, S., et al. (2023). Reinforced Self-Training (ReST) for Language Modeling. arXiv preprint arXiv:2308.08998. Link

@misc{gulcehre2023reinforced,
  title={Reinforced Self-Training (ReST) for Language Modeling},
  author={Caglar Gulcehre and Tom Le Paine and Srivatsan Srinivasan and Ksenia Konyushkova and Lotte Weerts and Abhishek Sharma and Aditya Siddhant and Alex Ahern and Miaosen Wang and Chenjie Gu and Wolfgang Macherey and Arnaud Doucet and Orhan Firat and Nando de Freitas},
  year={2023},
  eprint={2308.08998},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}

The Enron Email Dataset. Carnegie Mellon University. Link
The Enron Email Dataset. Kaggle. Link
Fraudulent Email Corpus. Kaggle. Link
Spam Mails Database. Kaggle. Link
Phishing Email Detection. Kaggle. Link
Customer Support Ticket Dataset. Kaggle. Link
Spam or Not Spam Dataset. Kaggle. Link

Table of Contents

Getting Started
- Installation
- Usage
Methodology
Results and Evaluation
Contributing
License