Materials and paper list for the SBP-BRiMS 2024 tutorial: "Defending Against Generative AI Threats in NLP".
Tutorial authors and organizers:
- Amrita Bhattacharjee, Arizona State University
- Raha Moraffah, Worcester Polytechnic Institute
- Christopher Parisien, NVIDIA
- Huan Liu, Arizona State University
❗ (Will be periodically updated. Please star or watch this repo to get notified of updates!)
- Overview paper: Generative AI [paper link]
- Generative AI vs. Discriminative AI [blogpost link]
- A Comprehensive Overview of Large Language Models [paper link]
- Examples of AI Image Generators [list + blogpost]
- Generative AI model for Audio and Music: text-to-music and text-to-audio via [AudioCraft by Meta AI], text-to-symbolic-music via [MuseCoco by Microsoft]
- Examples of AI Video Generators [list + blogpost]
- Examples of open-source AI Video Generators: [CogVideo] , [Text2Video-Zero] , [Open-Sora]
- Class-Based n-gram Models of Natural Language [paper link]
- n-gram models [lecture notes]
- LSTM paper [paper link]
- LSTM Neural Networks for Language Modeling [paper link]
- Attention Is All You Need [paper link]
- An Overview and History of Large Language Models [blogpost link]
- Cross-Task Generalization via Natural Language Crowdsourcing Instructions [paper link]
- Fine-tuned Language Models are Zero-shot Learners [paper link]
- Training language models to follow instructions with human feedback [paper link]
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model [paper link]
- Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena [paper link]
- MMLU: Measuring Massive Multitask Language Understanding [paper link]
- MMLU Learderboard [leaderboard]
- LM Sys Chatbot Arena Leaderboard on Huggingface [paper link]
- Cool Github Repo with LLM Evaluation Benchmark resources [github repo]
- Not what you’ve signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection [paper link]
- Universal and Transferable Adversarial Attacks on Aligned Language Models [paper link]
- ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs [paper link]
- Stealing Part of a Production Language Model [paper link]
- Bing Chat: Data Exfiltration Exploit Explained [blogpost link]
- FakeToxicityPrompts: Automatic Red Teaming [blogpost link]
- Defending Against Social Engineering Attacks in the Age of LLMs [paper link]
- Exploring the Deceptive Power of LLM-Generated Fake News: A Study of Real-World Detection Challenges [paper link]
- Can LLM-generated Misinformation be Detected? [paper link]
- The Looming Threat of Fake and LLM-generated LinkedIn Profiles: Challenges and Opportunities for Detection and Prevention [paper link]
- garak: LLM vulnerability scanner
- NVIDIA NeMo Guardrails
- AEGIS from NVIDIA
- AEGIS: Online Adaptive AI Content Safety Moderation with Ensemble of LLM Experts [paper]
- AEGIS Dataset [dataset on higgingface]
- AEGIS Defensive model [model on huggingface]
- AEGIS Permissive model [model on huggingface]
- Improving Neural Language Modeling via Adversarial Training [paper link]
- Certifying LLM Safety against Adversarial Prompting [paper link]
- Towards Improving Adversarial Training of NLP Models [paper link]
- Adversarial Training for Large Neural Language Models [paper link]
- Adversarial Text Purification: A Large Language Model Approach for Defense [paper link]
- DeTox: Toxic Subspace Projection for Model Editing [paper link]
- Editing Models with Task Arithmetic [paper link]
- Safe Unlearning: A Surprisingly Effective and Generalizable Solution to Defend Against Jailbreak Attacks [paper link]
- Model Surgery: Modulating LLM’s Behavior Via Simple Parameter Editing [paper link]
- Defending Large Language Models Against Jailbreak Attacks via Layer-specific Editing [paper link]
- Activation Addition: Steering Language Models Without Optimization [paper link]
- Steering Llama 2 via Contrastive Activation Addition [paper link]
- Safe LoRA: the Silver Lining of Reducing Safety Risks when Fine-tuning Large Language Models [paper link]
- What Makes and Breaks Safety Fine-tuning? A Mechanistic Study [paper link]
- RAIN: Your Language Models Can Align Themselves without Finetuning [paper link]
- Parameter-Efficient Detoxification with Contrastive Decoding [paper link]
- Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates [paper link]
For any questions, feedback, comments, or just to say hi, contact Amrita at [email protected].