🐟 PhishNet

DISCLAIMER: The content provided by PhishNet is intended exclusively for educational and research purposes. The training data for our GPT-2-derived model has been carefully cleaned of private and personally identifiable information (PII) to ensure ethical compliance and privacy. The views and opinions expressed are solely those of the authors and do not reflect those of any associated organization. No warranty is provided regarding the accuracy or reliability of the information. Use of PhishNet and its outputs is at your own risk, and no liability is accepted for any resulting damages. This project does not endorse illegal activities and should be used responsibly.

TL;DR

PhishNet is a research project that uses Reinforced Self-Training (ReST) and a fine-tuned GPT-2 model to create a high-quality synthetic dataset of phishing emails. The model is trained on several public email datasets (see Citations). The project's aim is to explore adversarial AI and to expand our understanding of AI safety.

Citations

  • Radford, A., Wu, J., Child, R., et al. (2019). Language Models are Unsupervised Multitask Learners.
@article{radford2019language,
  title={Language Models are Unsupervised Multitask Learners},
  author={Radford, Alec and Wu, Jeff and Child, Rewon and Luan, David and Amodei, Dario and Sutskever, Ilya},
  year={2019}
}
  • Gulcehre, C., Le Paine, T., Srinivasan, S., et al. (2023). Reinforced Self-Training (ReST) for Language Modeling. arXiv preprint arXiv:2308.08998.
@misc{gulcehre2023reinforced,
  title={Reinforced Self-Training (ReST) for Language Modeling},
  author={Caglar Gulcehre and Tom Le Paine and Srivatsan Srinivasan and Ksenia Konyushkova and Lotte Weerts and Abhishek Sharma and Aditya Siddhant and Alex Ahern and Miaosen Wang and Chenjie Gu and Wolfgang Macherey and Arnaud Doucet and Orhan Firat and Nando de Freitas},
  year={2023},
  eprint={2308.08998},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
  • The Enron Email Dataset. Carnegie Mellon University.
  • The Enron Email Dataset. Kaggle.
  • Fraudulent Email Corpus. Kaggle.
  • Spam Mails Database. Kaggle.
  • Phishing Email Detection. Kaggle.
  • Customer Support Ticket Dataset. Kaggle.
  • Spam or Not Spam Dataset. Kaggle.

Table of Contents