
Tutorial Materials: Defending Against Generative AI Threats in NLP

Materials and paper list for the SBP-BRiMS 2024 tutorial: "Defending Against Generative AI Threats in NLP".

Tutorial authors and organizers:

  1. Amrita Bhattacharjee, Arizona State University
  2. Raha Moraffah, Worcester Polytechnic Institute
  3. Christopher Parisien, NVIDIA
  4. Huan Liu, Arizona State University

Paper and Resource List

❗ (Will be periodically updated. Please star or watch this repo to get notified of updates!)

📚 Introduction to Generative AI

  1. Overview paper: Generative AI [paper link]
  2. Generative AI vs. Discriminative AI [blogpost link]
  3. A Comprehensive Overview of Large Language Models [paper link]
  4. Examples of AI Image Generators [list + blogpost]
  5. Generative AI models for Audio and Music: text-to-music and text-to-audio via [AudioCraft by Meta AI], text-to-symbolic-music via [MuseCoco by Microsoft]
  6. Examples of AI Video Generators [list + blogpost]
  7. Examples of open-source AI Video Generators: [CogVideo] , [Text2Video-Zero] , [Open-Sora]

📚 Language Modeling and LLMs

History

  1. Class-Based n-gram Models of Natural Language [paper link]
  2. n-gram models [lecture notes] (see the short bigram sketch after this list)
  3. LSTM paper [paper link]
  4. LSTM Neural Networks for Language Modeling [paper link]
  5. Attention Is All You Need [paper link]
  6. An Overview and History of Large Language Models [blogpost link]
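As light background for items 1 and 2 above: an n-gram language model is essentially a table of conditional word counts. Below is a minimal bigram sketch; the toy corpus and function name are illustrative only, not taken from any of the listed papers.

```python
from collections import Counter, defaultdict

def train_bigram_lm(corpus):
    """Estimate P(next word | previous word) by maximum-likelihood counting."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    # Normalize counts into conditional probabilities.
    return {
        prev: {word: c / sum(nexts.values()) for word, c in nexts.items()}
        for prev, nexts in counts.items()
    }

lm = train_bigram_lm(["the cat sat", "the dog sat", "the cat ran"])
print(lm["the"])  # {'cat': 0.666..., 'dog': 0.333...}
print(lm["cat"])  # {'sat': 0.5, 'ran': 0.5}
```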

Training-related Steps

  1. Cross-Task Generalization via Natural Language Crowdsourcing Instructions [paper link]
  2. Fine-tuned Language Models are Zero-shot Learners [paper link]
  3. Training language models to follow instructions with human feedback [paper link]
  4. Direct Preference Optimization: Your Language Model is Secretly a Reward Model [paper link] (objective shown below)
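For quick reference, the DPO objective from item 4 can be written in the paper's notation as follows, where $\pi_\theta$ is the policy being trained, $\pi_{\text{ref}}$ is a frozen reference model, $\beta$ is a scaling coefficient, and $(x, y_w, y_l)$ is a prompt with a preferred and a dispreferred response:

$$
\mathcal{L}_{\text{DPO}}(\pi_\theta; \pi_{\text{ref}}) = -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}} \left[ \log \sigma\!\left( \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\text{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\text{ref}}(y_l \mid x)} \right) \right]
$$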

Evaluation

  1. Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena [paper link]
  2. MMLU: Measuring Massive Multitask Language Understanding [paper link]
  3. MMLU Leaderboard [leaderboard]
  4. LMSYS Chatbot Arena Leaderboard on Hugging Face [leaderboard]
  5. Cool GitHub repo with LLM evaluation benchmark resources [github repo]

📚 LLM Threats

Part 1: Attacks on LLMs

  1. Not what you’ve signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection [paper link]
  2. Universal and Transferable Adversarial Attacks on Aligned Language Models [paper link]
  3. ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs [paper link]
  4. Stealing Part of a Production Language Model [paper link]
  5. Bing Chat: Data Exfiltration Exploit Explained [blogpost link]
  6. FakeToxicityPrompts: Automatic Red Teaming [blogpost link]

Part 2: Misuse of LLMs

  1. Defending Against Social Engineering Attacks in the Age of LLMs [paper link]
  2. Exploring the Deceptive Power of LLM-Generated Fake News: A Study of Real-World Detection Challenges [paper link]
  3. Can LLM-generated Misinformation be Detected? [paper link]
  4. The Looming Threat of Fake and LLM-generated LinkedIn Profiles: Challenges and Opportunities for Detection and Prevention [paper link]

📚 LLM Defenses and Safety

Tools

  1. garak: LLM vulnerability scanner
  2. NVIDIA NeMo Guardrails
  3. AEGIS from NVIDIA

Standard Defenses

  1. Improving Neural Language Modeling via Adversarial Training [paper link]
  2. Certifying LLM Safety against Adversarial Prompting [paper link]
  3. Towards Improving Adversarial Training of NLP Models [paper link]
  4. Adversarial Training for Large Neural Language Models [paper link]
  5. Adversarial Text Purification: A Large Language Model Approach for Defense [paper link]

Model Editing and Parameter-efficient methods

  1. DeTox: Toxic Subspace Projection for Model Editing [paper link]
  2. Editing Models with Task Arithmetic [paper link]
  3. Safe Unlearning: A Surprisingly Effective and Generalizable Solution to Defend Against Jailbreak Attacks [paper link]
  4. Model Surgery: Modulating LLM’s Behavior Via Simple Parameter Editing [paper link]
  5. Defending Large Language Models Against Jailbreak Attacks via Layer-specific Editing [paper link]
  6. Activation Addition: Steering Language Models Without Optimization [paper link]
  7. Steering Llama 2 via Contrastive Activation Addition [paper link]
  8. Safe LoRA: the Silver Lining of Reducing Safety Risks when Fine-tuning Large Language Models [paper link]
  9. What Makes and Breaks Safety Fine-tuning? A Mechanistic Study [paper link]

Decoding-time methods

  1. RAIN: Your Language Models Can Align Themselves without Finetuning [paper link]
  2. Parameter-Efficient Detoxification with Contrastive Decoding [paper link]
  3. Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates [paper link]

Contact

For any questions, feedback, comments, or just to say hi, contact Amrita at [email protected].
