I tried to reproduce the results from the SimPO paper. Evaluating the provided checkpoints on MT-Bench with FastChat gives results roughly in line with the paper (I use gpt-4o as the judge, which is stricter, so scores are somewhat lower overall, but SimPO still ranks near the top and every method scores above the baseline models). However, the models I train myself with mistral-7b-instruct-simpo.yaml and llama-3-8b-instruct-simpo.yaml are far from the provided checkpoints: the training curves look normal (the reward keeps increasing), yet the MT-Bench scores end up below the baseline models. With mistral-7b-base-simpo.yaml I do seem to get an improvement over the baseline (I changed the model id to zephyr following another issue). I train on 8 GPUs with 80 GB memory each, halving gradient accumulation to keep the effective batch size at 128 (a quick sanity check of this arithmetic is sketched below); all other settings follow the original config. Under this setup, how can I accurately reproduce the Instruct-version SimPO results? A second question: how were the other released checkpoints obtained, and could you provide their training scripts?
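As a minimal sketch of the batch-size arithmetic (the GPU count is from my setup and the other two values are from the config further down; this is not the authors' original recipe):

```python
# Effective global batch size = per-device batch size x number of GPUs x grad accumulation.
num_gpus = 8                      # my setup: 8 x 80GB GPUs
per_device_train_batch_size = 2   # from the config below
gradient_accumulation_steps = 8   # halved from 16

effective_batch_size = per_device_train_batch_size * num_gpus * gradient_accumulation_steps
print(effective_batch_size)       # 128, matching the original effective batch size
```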
In the figure, simpo is the downloaded checkpoint; simpo-1, simpo-2, and simpo-3 are the models I trained myself.
My config is below. My machine has no internet access, so I downloaded the dataset and model to local paths for training:
model_name_or_path: /mnt/username/cache/Mistral-7B-Instruct-v0.2
torch_dtype: null
attn_implementation: null #flash_attention_2
dataset_mixer:
/mnt/username/cache/mistral-instruct-ultrafeedback: 1.0
dataset_splits:
preprocessing_num_workers: 12
bf16: true
beta: 2.5
gamma_beta_ratio: 0.1
do_eval: true
evaluation_strategy: steps
eval_steps: 400
gradient_accumulation_steps: 8 #16
gradient_checkpointing: true
gradient_checkpointing_kwargs:
use_reentrant: False
hub_model_id: simpo-exps
learning_rate: 5.0e-7
log_level: info
logging_steps: 5
lr_scheduler_type: cosine
max_length: 2048
max_prompt_length: 1800
num_train_epochs: 1
optim: adamw_torch
output_dir: outputs/mistral-7b-instruct-simpo
run_name: mistral-7b-instruct-simpo
per_device_train_batch_size: 2
per_device_eval_batch_size: 4
push_to_hub: false
save_strategy: "steps"
save_steps: 1000000
report_to:
save_total_limit: 20
seed: 42
warmup_ratio: 0.1
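Since the machine is offline and dataset_mixer points at a local copy, here is a minimal sketch for checking that the local dataset exposes the expected splits (assuming it was exported with save_to_disk; if it is a cloned Hub repository, datasets.load_dataset(local_path) would be the loader to use instead):

```python
from datasets import load_from_disk

# Local dataset path taken from the config above.
local_path = "/mnt/username/cache/mistral-instruct-ultrafeedback"

# Assumption: the dataset was saved with Dataset.save_to_disk.
ds = load_from_disk(local_path)

# Prints the available splits and column names, which should match the
# dataset_splits entry in the YAML (its value is truncated above).
print(ds)
```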