We read every piece of feedback, and take your input very seriously.
To see all available qualifiers, see our documentation.
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
无
作者您好,想请教一些PPO微调对话任务的问题
数据格式上,需要把一个session拆成多条数据吗(类似DPO每次只优化最后一个turn,需要把五轮的对话拆成有1,2,3,4,5轮数据的五个data),还是一个session放一个完整历史就可以(类似SFT,全都优化,就只构造一个五轮的数据)
The text was updated successfully, but these errors were encountered:
需要拆
Sorry, something went wrong.
您好!可以看下您的对话任务的ppo数据格式吗?我一直报错,困扰了很久,十分感谢。我原使用的sft格式,结果不尽人意,十分感谢! @cdhx
No branches or pull requests
Reminder
System Info
无
Reproduction
无
Others
作者您好,想请教一些PPO微调对话任务的问题
数据格式上,需要把一个session拆成多条数据吗(类似DPO每次只优化最后一个turn,需要把五轮的对话拆成有1,2,3,4,5轮数据的五个data),还是一个session放一个完整历史就可以(类似SFT,全都优化,就只构造一个五轮的数据)
The text was updated successfully, but these errors were encountered: