PPO data format question #6538

Closed
1 task done
cdhx opened this issue Jan 6, 2025 · 2 comments
Labels
solved This problem has been already solved

Comments

@cdhx

cdhx commented Jan 6, 2025

Reminder

  • I have read the README and searched the existing issues.

System Info

Reproduction

Others

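Hello, I would like to ask some questions about PPO fine-tuning for dialogue tasks.

Regarding the data format: does a session need to be split into multiple samples (similar to DPO, where only the last turn is optimized each time, so a five-turn dialogue is split into five samples containing 1, 2, 3, 4, and 5 turns respectively), or is it enough to put the complete history of a session into one sample (similar to SFT, where all turns are optimized, so only a single five-turn sample is constructed)?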

@github-actions github-actions bot added the pending This problem is yet to be addressed label Jan 6, 2025
@hiyouga
Owner

hiyouga commented Jan 6, 2025

Yes, the session needs to be split.
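
For readers who want to see what such a split could look like, here is a minimal sketch in Python. It only illustrates the idea from the question (one five-turn session becomes five samples containing 1, 2, 3, 4, and 5 turns). The function name `split_session` and the field names `history`, `prompt`, and `response` are assumptions made for illustration, not a confirmed schema of the project; check the project's own data documentation for the exact keys it expects.

```python
from typing import Dict, List, Tuple

# One turn is a (user message, assistant reply) pair.
Turn = Tuple[str, str]


def split_session(turns: List[Turn]) -> List[Dict]:
    """Expand one n-turn session into n samples, one per turn.

    Sample k holds the first k-1 turns as history and the k-th turn
    as the prompt/response pair. Field names are hypothetical.
    """
    samples = []
    for k, (user_msg, assistant_msg) in enumerate(turns):
        samples.append({
            "history": [list(t) for t in turns[:k]],  # all earlier turns
            "prompt": user_msg,                       # current user message
            "response": assistant_msg,                # current assistant reply
        })
    return samples


# Usage: a toy five-turn session yields five samples.
session = [(f"user {i}", f"assistant {i}") for i in range(1, 6)]
samples = split_session(session)
```

Each resulting sample ends at a different turn, so the last turn of every sample is the one being optimized, while earlier turns serve only as context.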

@hiyouga hiyouga closed this as completed Jan 6, 2025
@hiyouga hiyouga added solved This problem has been already solved and removed pending This problem is yet to be addressed labels Jan 6, 2025
@zhangguoxin1

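Hello! Could you share the PPO data format you used for your dialogue task? I keep getting errors and have been stuck on this for a long time. I originally used the SFT format, but the results were unsatisfactory. Thank you very much! @cdhx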
