Unable to reproduce SimPO training results #81

Open
dox012 opened this issue Jan 20, 2025 · 0 comments
dox012 commented Jan 20, 2025

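I am trying to reproduce the results from the SimPO paper. When I evaluate the released weights on MT-Bench with FastChat, the results roughly match the paper. I used GPT-4o as the judge; it is stricter, so scores are somewhat lower overall than in the paper, but SimPO still ranks near the top and every method scores higher than the baseline model.

However, the models I trained with mistral-7b-instruct-simpo.yaml and llama-3-8b-instruct-simpo.yaml differ substantially from the released weights. The training curves look fairly normal and the reward is increasing, but the MT-Bench scores are lower than the baseline model's. Training with mistral-7b-base-simpo.yaml does seem to improve over the baseline (following another issue, I changed the model ID to zephyr).

I trained on 8 GPUs with 80 GB of memory each and halved gradient accumulation to keep batch size = 128; all other settings are the same as in the original configs. In this setup, how can I accurately reproduce the results of the instruct versions of the SimPO models? One more question: how were the weights for the other models obtained, and could you provide the training scripts?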

Image
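In the figure, simpo is the downloaded model; simpo-1, simpo-2, and simpo-3 are the models I trained.
My configuration is below. My machine has no internet access, so I downloaded the dataset and model to local storage for training.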
model_name_or_path: /mnt/username/cache/Mistral-7B-Instruct-v0.2
torch_dtype: null
attn_implementation: null #flash_attention_2

dataset_mixer:
  /mnt/username/cache/mistral-instruct-ultrafeedback: 1.0

dataset_splits:
- train
- test
preprocessing_num_workers: 12

bf16: true
beta: 2.5
gamma_beta_ratio: 0.1
do_eval: true
evaluation_strategy: steps
eval_steps: 400
gradient_accumulation_steps: 8 #16
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: False
hub_model_id: simpo-exps
learning_rate: 5.0e-7
log_level: info
logging_steps: 5
lr_scheduler_type: cosine
max_length: 2048
max_prompt_length: 1800
num_train_epochs: 1
optim: adamw_torch
output_dir: outputs/mistral-7b-instruct-simpo
run_name: mistral-7b-instruct-simpo
per_device_train_batch_size: 2
per_device_eval_batch_size: 4
push_to_hub: false
save_strategy: "steps"
save_steps: 1000000
report_to:
- tensorboard
save_total_limit: 20
seed: 42
warmup_ratio: 0.1
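
As a sanity check on the batch-size claim above, here is a minimal sketch of the arithmetic. It assumes plain data parallelism, where the effective batch size is per-device batch size × number of GPUs × gradient accumulation steps. The 4-GPU count for the original recipe is inferred from the commented-out `#16`, not something I have confirmed. The margin calculation assumes `gamma_beta_ratio` is the ratio of the target reward margin gamma to `beta`, as the parameter name suggests. The function names are made up for illustration.

```python
def effective_batch_size(per_device_batch: int, num_gpus: int, grad_accum_steps: int) -> int:
    """Samples contributing to one optimizer step under data parallelism."""
    return per_device_batch * num_gpus * grad_accum_steps


def target_reward_margin(beta: float, gamma_beta_ratio: float) -> float:
    """Assumed relation: gamma = beta * gamma_beta_ratio."""
    return beta * gamma_beta_ratio


# Values from my config: per_device_train_batch_size=2, 8 GPUs, gradient_accumulation_steps=8.
mine = effective_batch_size(per_device_batch=2, num_gpus=8, grad_accum_steps=8)

# Assumption: the original recipe (gradient_accumulation_steps=16) targeted 4 GPUs.
original = effective_batch_size(per_device_batch=2, num_gpus=4, grad_accum_steps=16)

# Values from my config: beta=2.5, gamma_beta_ratio=0.1.
gamma = target_reward_margin(beta=2.5, gamma_beta_ratio=0.1)

print(f"effective batch size (mine): {mine}")
print(f"effective batch size (assumed original): {original}")
print(f"target reward margin gamma: {gamma:.2f}")
```

If both batch sizes come out equal, the number of optimizer steps per epoch and the learning-rate schedule should match the original recipe, so the gap in MT-Bench scores would have to come from something else (for example the data, the chat template, or library versions).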