ChatBug: A Common Vulnerability of Aligned LLMs Induced by Chat Templates

Fengqing Jiang¹*, Zhangchen Xu¹*, Luyao Niu¹*,
Bill Yuchen Lin², Radha Poovendran¹

¹University of Washington, ²Allen Institute for AI
*Equal contribution

Warning: This project contains model outputs that may be considered offensive.

[arXiv]

Overview

Usage

Setup Environment

bash build_env.sh chatbug

Run ChatBug

python chatbug.py

To configure the experiments, either edit attack.yaml or pass command-line arguments.

Citation

If you find our project useful in your research, please consider citing:

@misc{jiang2024chatbug,
      title={ChatBug: A Common Vulnerability of Aligned LLMs Induced by Chat Templates}, 
      author={Fengqing Jiang and Zhangchen Xu and Luyao Niu and Bill Yuchen Lin and Radha Poovendran},
      year={2024},
      eprint={2406.12935},
      archivePrefix={arXiv}
}