Official Repository for the ACL 2024 Paper: Unintended Impacts of LLM Alignment on Global Representation
Figure 1: Country rewards for Starling 7B Reward Model prompted with "User: Where are you from? Assistant: I am from {country}." Starling assigns higher rewards to English-speaking Western nations and lower rewards to countries in the Middle East/Africa.
This repository contains all the code for the ACL 2024 paper Unintended Impacts of LLM Alignment on Global Representation. If you are looking for the AskRedditCountries dataset, see our Hugging Face page.
This repository covers all the steps needed to reproduce the results in our paper exactly. We also include all intermediate and final results in the /outputs/, /results/, and /visualization/ folders.
If you want to reproduce all experiments and plots in our paper, first download the md3 dataset following the instructions in /data/md3/md3/README.txt. Then run the following bash script:
./scripts/run_all.sh
conda create -n "alignment-impacts" python=3.11.5 ipython
conda activate alignment-impacts
pip install -r requirements.txt
To run all experiments, run the following script:
./scripts/experiments/experiments.sh
Otherwise, you can run the individual scripts below to reproduce specific experiments:
Run the "Where From" script
./scripts/experiments/0-where_from_reward_model.sh
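For orientation, the prompt format used in this experiment (quoted in the Figure 1 caption) can be reproduced in a few lines of Python. This is a minimal sketch, not the repository's code: the function name `build_where_from_prompts` and the three example countries are hypothetical, and scoring the prompts with the reward model is deliberately left out.

```python
# Template quoted from the Figure 1 caption of this README.
WHERE_FROM_TEMPLATE = "User: Where are you from? Assistant: I am from {country}."


def build_where_from_prompts(countries):
    """Return a mapping from country name to its filled-in prompt (hypothetical helper)."""
    return {
        country: WHERE_FROM_TEMPLATE.format(country=country)
        for country in countries
    }


# Illustrative country list only; the experiment script defines the real one.
prompts = build_where_from_prompts(["Canada", "Kenya", "Jordan"])
for country, prompt in prompts.items():
    print(f"{country}: {prompt}")
```

Each resulting string is what a reward model would then be asked to score, one prompt per country.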
First, download the md3 dataset following the instructions in /data/md3/md3/README.txt.
Next run the data cleaning script
./scripts/experiments/1-md3_clean.sh
Now you are set to run the md3 experiment script
./scripts/experiments/2-md3_experiments.sh
This will write the outputs to ./outputs/md3-game/.
Run the Belebele Reading Comprehension script
./scripts/experiments/3-belebele_experiments.sh
Run the TyDiQA Question Answering script
./scripts/experiments/4-tydiqa_experiments.sh
Run the Language ID script
./scripts/experiments/5-langid_experiments.sh
Run the Global Opinions QA script
./scripts/experiments/6-globalopinions_experiments.sh
Run the Ask Reddit Country Opinions Reward Modeling script
./scripts/experiments/7-askreddit-rewards.sh
Run the Ask Reddit Country Opinions Language Model perplexities script
./scripts/experiments/8-askreddit-perplexities.sh
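If you prefer to drive the individual experiment scripts from Python, the sketch below runs them in the order listed above and stops at the first failure. It is an illustration only and not the contents of experiments.sh: the helper `run_in_order` and its `runner` parameter are hypothetical names, and it assumes the scripts are invoked with bash from the repository root after the md3 dataset has been downloaded.

```python
import subprocess

# Script paths copied from this README, in the order they are listed.
EXPERIMENT_SCRIPTS = [
    "scripts/experiments/0-where_from_reward_model.sh",
    "scripts/experiments/1-md3_clean.sh",
    "scripts/experiments/2-md3_experiments.sh",
    "scripts/experiments/3-belebele_experiments.sh",
    "scripts/experiments/4-tydiqa_experiments.sh",
    "scripts/experiments/5-langid_experiments.sh",
    "scripts/experiments/6-globalopinions_experiments.sh",
    "scripts/experiments/7-askreddit-rewards.sh",
    "scripts/experiments/8-askreddit-perplexities.sh",
]


def run_in_order(scripts, repo_root=".", runner=subprocess.run):
    """Run each script from repo_root, stopping at the first failure (hypothetical helper)."""
    completed = []
    for script in scripts:
        # With subprocess.run, check=True raises CalledProcessError on a non-zero exit code.
        runner(["bash", script], cwd=repo_root, check=True)
        completed.append(script)
    return completed


def print_only(command, **kwargs):
    """Dry-run stand-in for subprocess.run that only prints the command."""
    print("would run:", " ".join(command))


# Dry run: prints the commands instead of executing them.
run_in_order(EXPERIMENT_SCRIPTS, runner=print_only)
```

Dropping the `runner=print_only` argument would execute the scripts for real, so only do that from the repository root with the environment activated.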
Run the postprocessing script
./scripts/postprocessing/9-postprocessing.sh
This will take the outputs from ./outputs/ and process them into single CSV files in the ./results/ directory.
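To make the idea of collapsing many output files into one CSV concrete, here is a small standard-library sketch. It is not the repository's postprocessing code: `merge_csvs` is a hypothetical helper, and the assumption that per-run outputs are CSV files sharing similar columns is ours, not something this README states. The demo runs in a temporary directory so it does not touch ./outputs/ or ./results/.

```python
import csv
import tempfile
from pathlib import Path


def merge_csvs(input_dir, output_file):
    """Concatenate every CSV under input_dir into one file, tagging rows with their source."""
    rows, fieldnames = [], ["source_file"]
    for path in sorted(Path(input_dir).rglob("*.csv")):
        with path.open(newline="") as handle:
            for row in csv.DictReader(handle):
                row["source_file"] = path.name
                # Collect the union of columns in first-seen order.
                for key in row:
                    if key not in fieldnames:
                        fieldnames.append(key)
                rows.append(row)
    output_file = Path(output_file)
    output_file.parent.mkdir(parents=True, exist_ok=True)
    with output_file.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)


# Demo on made-up files; names and values are placeholders, not real results.
with tempfile.TemporaryDirectory() as tmp:
    task_dir = Path(tmp) / "outputs" / "task_a"
    task_dir.mkdir(parents=True)
    (task_dir / "run1.csv").write_text("model,score\nm1,0.5\n")
    (task_dir / "run2.csv").write_text("model,score\nm2,0.7\n")
    merged_path = Path(tmp) / "results" / "task_a.csv"
    merged_count = merge_csvs(task_dir, merged_path)
    merged_text = merged_path.read_text()

print(merged_text)
```

Tagging each row with its source file keeps the merged table traceable back to the individual run that produced it.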
To run all analyses, run the following script:
./scripts/analysis/analysis.sh
Otherwise, you can run the following scripts to reproduce specific plots:
Run the "Where From" analysis script
./scripts/analysis/10-where_from_chloropleth.sh
Run the md3 analysis script
./scripts/analysis/11-md3_game_analysis.sh
Run the Belebele analysis script
./scripts/analysis/12-belebele_analysis.sh
Run the TyDiQA analysis script
./scripts/analysis/13-tydiqa_analysis.sh
Run the language ID analysis script for Tulu SFT and UltraChat
./scripts/analysis/14-langid.sh
Run the Global Opinions QA analysis script
./scripts/analysis/15-global-opinions.sh
Produce the choropleth of the reward model's country opinions on the full AskReddit dataset
./scripts/analysis/16-ask_reddit_chloropleth.sh
Produce the tables and plots for the reward model, language model, and US citizen correlations
./scripts/analysis/17-ask_reddit_correlation.sh
Michael Ryan: Scholar | Twitter | GitHub | LinkedIn | ResearchGate | Personal Website
If you use this code or our AskRedditCountries dataset, please cite our paper:
@misc{ryan2024unintended,
title={Unintended Impacts of LLM Alignment on Global Representation},
author={Michael J. Ryan and William Held and Diyi Yang},
year={2024},
eprint={2402.15018},
archivePrefix={arXiv},
primaryClass={cs.CL}
}