I tried to reproduce the results from the SimPO paper. Evaluating the provided checkpoints on MT-Bench with FastChat gives results roughly in line with the paper (I use gpt-4o as the judge, which is stricter, so scores are somewhat lower overall, but SimPO still ranks near the top and every method scores above the baseline models). However, the models I train myself with mistral-7b-instruct-simpo.yaml and llama-3-8b-instruct-simpo.yaml are far from the provided checkpoints: the training curves look normal (the reward keeps increasing), yet the MT-Bench scores end up below the baseline models. With mistral-7b-base-simpo.yaml I do seem to get an improvement over the baseline (I changed the model id to zephyr following another issue). I train on 8 GPUs with 80 GB memory each, halving gradient accumulation to keep the effective batch size at 128 (a quick sanity check of this arithmetic is sketched below); all other settings follow the original config. Under this setup, how can I accurately reproduce the Instruct-version SimPO results? A second question: how were the other released checkpoints obtained, and could you provide their training scripts?
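As a minimal sketch of the batch-size arithmetic (the GPU count is from my setup and the other two values are from the config further down; this is not the authors' original recipe):

```python
# Effective global batch size = per-device batch size x number of GPUs x grad accumulation.
num_gpus = 8                      # my setup: 8 x 80GB GPUs
per_device_train_batch_size = 2   # from the config below
gradient_accumulation_steps = 8   # halved from 16

effective_batch_size = per_device_train_batch_size * num_gpus * gradient_accumulation_steps
print(effective_batch_size)       # 128, matching the original effective batch size
```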
In the figure, simpo is the downloaded checkpoint; simpo-1, simpo-2, and simpo-3 are the models I trained myself.
My config is below. My machine has no internet access, so I downloaded the dataset and model to local paths for training:
model_name_or_path: /mnt/username/cache/Mistral-7B-Instruct-v0.2
torch_dtype: null
attn_implementation: null #flash_attention_2
dataset_mixer:
/mnt/username/cache/mistral-instruct-ultrafeedback: 1.0
dataset_splits:
preprocessing_num_workers: 12
bf16: true
beta: 2.5
gamma_beta_ratio: 0.1
do_eval: true
evaluation_strategy: steps
eval_steps: 400
gradient_accumulation_steps: 8 #16
gradient_checkpointing: true
gradient_checkpointing_kwargs:
use_reentrant: False
hub_model_id: simpo-exps
learning_rate: 5.0e-7
log_level: info
logging_steps: 5
lr_scheduler_type: cosine
max_length: 2048
max_prompt_length: 1800
num_train_epochs: 1
optim: adamw_torch
output_dir: outputs/mistral-7b-instruct-simpo
run_name: mistral-7b-instruct-simpo
per_device_train_batch_size: 2
per_device_eval_batch_size: 4
push_to_hub: false
save_strategy: "steps"
save_steps: 1000000
report_to:
save_total_limit: 20
seed: 42
warmup_ratio: 0.1
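Since the machine is offline and dataset_mixer points at a local copy, here is a minimal sketch for checking that the local dataset exposes the expected splits (assuming it was exported with save_to_disk; if it is a cloned Hub repository, datasets.load_dataset(local_path) would be the loader to use instead):

```python
from datasets import load_from_disk

# Local dataset path taken from the config above.
local_path = "/mnt/username/cache/mistral-instruct-ultrafeedback"

# Assumption: the dataset was saved with Dataset.save_to_disk.
ds = load_from_disk(local_path)

# Prints the available splits and column names, which should match the
# dataset_splits entry in the YAML (its value is truncated above).
print(ds)
```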