ChatBug: A Common Vulnerability of Aligned LLMs Induced by Chat Templates

Fengqing Jiang¹*, Zhangchen Xu¹*, Luyao Niu¹*,
Bill Yuchen Lin², Radha Poovendran¹

¹University of Washington ²Allen Institute for AI
*Equal contribution

Warning: This project contains model outputs that may be considered offensive.

[arXiv]

Overview

Usage

Setup Environment

bash build_env.sh chatbug

Run with ChatBug

python chatbug.py

To configure the experiments, either edit attack.yaml or pass command-line arguments.
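
The general pattern of file-based defaults overridden by command-line flags can be sketched as follows. This is only an illustration and is not taken from chatbug.py: the keys `model` and `num_samples`, the `YAML_DEFAULTS` dict (standing in for a parsed attack.yaml), and the `merge_config` helper are all hypothetical. Check attack.yaml and the script itself for the real option names.

```python
import argparse

# Hypothetical defaults standing in for values loaded from attack.yaml;
# the real file's keys may differ.
YAML_DEFAULTS = {"model": "example-model", "num_samples": 10}


def merge_config(defaults, argv):
    """Return a copy of defaults, overridden by any flags present in argv."""
    parser = argparse.ArgumentParser()
    for key, value in defaults.items():
        # default=None lets us distinguish "flag omitted" from "flag given".
        parser.add_argument(f"--{key}", type=type(value), default=None)
    args = vars(parser.parse_args(argv))
    return {k: (args[k] if args[k] is not None else v) for k, v in defaults.items()}


if __name__ == "__main__":
    # Only num_samples is overridden; model keeps its file default.
    print(merge_config(YAML_DEFAULTS, ["--num_samples", "5"]))
```

With this layout the file holds a reproducible baseline, while flags cover one-off variations without editing the file.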

Citation

If you find our project useful in your research, please consider citing:

@misc{jiang2024chatbug,
      title={ChatBug: A Common Vulnerability of Aligned LLMs Induced by Chat Templates}, 
      author={Fengqing Jiang and Zhangchen Xu and Luyao Niu and Bill Yuchen Lin and Radha Poovendran},
      year={2024},
      eprint={2406.12935},
      archivePrefix={arXiv}
}
