This repo contains training scripts used in
Noise Contrastive Alignment of Language Models with Explicit Rewards
Huayu Chen, Guande He, Lifan Yuan, Ganqu Cui, Hang Su, and Jun Zhu
Tsinghua University
We enable aligning a pretrained language model with datasets annotated by explicit rewards, instead of just binary preferences, by introducing Noise Contrastive Alignment (Figure 1). This framework includes two general algorithms, NCA and InfoNCA, which can handle both preference data and reward data. Notably, InfoNCA subsumes the DPO loss as a special case in the binary-preference setting. Compared with DPO/InfoNCA, the main advantage of NCA is that it effectively prevents the chosen likelihood from decreasing, a phenomenon commonly observed when applying the DPO/InfoNCA loss (Figure 2).
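For intuition, below is a minimal PyTorch sketch of how the two objectives could be computed from per-response log-probabilities under the policy and the reference model. The tensor layout, function names, and the alpha (reward temperature) / beta hyperparameters are illustrative assumptions, not the repo's exact implementation; please refer to the paper and the training scripts for the precise formulation.

```python
import torch
import torch.nn.functional as F

def infonca_loss(policy_logps, ref_logps, rewards, beta=0.01, alpha=1.0):
    """InfoNCA (sketch): cross-entropy between reward-induced soft targets
    softmax(r / alpha) and implicit rewards beta * log(pi / pi_ref),
    computed with a softmax over the K responses of each prompt.
    All inputs have shape [batch, K]."""
    logits = beta * (policy_logps - ref_logps)       # implicit reward per response
    targets = F.softmax(rewards / alpha, dim=-1)     # soft labels from explicit rewards
    return -(targets * F.log_softmax(logits, dim=-1)).sum(dim=-1).mean()

def nca_loss(policy_logps, ref_logps, rewards, beta=0.01, alpha=1.0):
    """NCA (sketch): per-response sigmoid terms instead of a softmax across
    responses, which is what discourages the chosen likelihood from dropping."""
    logits = beta * (policy_logps - ref_logps)
    targets = F.softmax(rewards / alpha, dim=-1)
    pos = -(targets * F.logsigmoid(logits)).sum(dim=-1)   # pull responses up in proportion to reward
    neg = -F.logsigmoid(-logits).mean(dim=-1)             # per-response regularization term
    return (pos + neg).mean()

# Toy usage: 2 prompts, K=4 scored responses each.
policy_logps, ref_logps, rewards = torch.randn(2, 4), torch.randn(2, 4), torch.randn(2, 4)
print(infonca_loss(policy_logps, ref_logps, rewards), nca_loss(policy_logps, ref_logps, rewards))
```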
In this repo, we release:
- The training scripts of NCA/InfoNCA for aligning the Mistral-7B model on the UltraFeedback dataset.
- Pretrained model weights.
News:
- [2024.09] Our paper is accepted at NeurIPS 2024.
- [2024.06] Dataset and training code are released.
- [2024.05] The pairwise preference version of NCA is now supported by the trl library.
- [2024.04] The NCA algorithm helps empower the Eurus-70B and Eurus-8*7B models, demonstrating significant advantages over DPO in complex reasoning tasks. Eurus-70B outperforms GPT-3.5-Turbo in a comprehensive benchmark across 12 tests covering five different tasks.
- [2024.03] Pretrained model weights are released.
To set up the training environment, install the local alignment-handbook and trl packages:
cd alignment-handbook; pip install -e .
cd trl; pip install -e .
Before running, check the number of available training devices and adjust gradient_accumulation_steps to reach an appropriate global batch size. We use 8 A40 GPUs and a global batch size of 32 by default.
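As a sanity check on the arithmetic, here is a small sketch for picking gradient_accumulation_steps. The per-device batch size of 1 is an assumption inferred from the defaults above (8 GPUs with 4 accumulation steps giving a global batch of 32); adjust it to match your YAML config.

```python
# Hypothetical helper for choosing gradient_accumulation_steps.
num_gpus = 8                      # number of training devices you actually have
per_device_train_batch_size = 1   # assumption; check your YAML config
target_global_batch_size = 32     # default global batch size used in the paper

gradient_accumulation_steps = target_global_batch_size // (
    num_gpus * per_device_train_batch_size
)
print(gradient_accumulation_steps)  # -> 4, matching --gradient_accumulation_steps=4 below
```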
For aligning with reward datasets, run
NCCL_P2P_DISABLE=1 ACCELERATE_LOG_LEVEL=info accelerate launch --config_file alignment-handbook/recipes/accelerate_configs/multi_gpu.yaml --num_processes=8 --main_process_port=7000 run_reward.py yamls/reward_qlora.yaml --gradient_accumulation_steps=4 --beta=0.01 --loss_type=[NCA/InfoNCA] --output_dir=data/test_run
For aligning with preference datasets (e.g., Binarized UltraFeedback), run
NCCL_P2P_DISABLE=1 ACCELERATE_LOG_LEVEL=info accelerate launch --config_file alignment-handbook/recipes/accelerate_configs/multi_gpu.yaml --num_processes=8 --main_process_port=7000 run_preference.py yamls/preference_qlora.yaml --gradient_accumulation_steps=4 --beta=0.01 --loss_type=[NCA/DPO] --output_dir=data/test_run
Check out the alignment-handbook instructions for evaluating models on MT-Bench and AlpacaEval.
This repo is released under the MIT License.
If you find this work useful, please cite:
@article{chen2024noise,
title={Noise contrastive alignment of language models with explicit rewards},
author={Chen, Huayu and He, Guande and Yuan, Lifan and Cui, Ganqu and Su, Hang and Zhu, Jun},
journal={arXiv preprint arXiv:2402.05369},
year={2024}
}