This repository has been archived by the owner on Oct 12, 2023. It is now read-only.

Datasets and LLM Background Knowledge: A Summary


codingma edited this page Aug 3, 2023 · 3 revisions

How to build your own RLHF dataset

See https://github.com/hiyouga/ChatGLM-Efficient-Tuning/issues/372
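As a rough illustration: the reward-model part of RLHF is usually trained on pairwise preference data. Each prompt comes with one preferred response and one rejected response. The sketch below assumes an Alpaca-style JSON layout in which `output` holds `[chosen, rejected]`. That layout and the helper names `make_preference_record` and `dump_dataset` are hypothetical, so check the repository's README for the format it actually expects.

```python
import json

# Hypothetical helper: builds one pairwise preference record.
# ASSUMPTION: Alpaca-style fields, with "output" holding
# [preferred_response, rejected_response]; verify against the repo docs.
def make_preference_record(instruction, chosen, rejected, input_text=""):
    if chosen == rejected:
        raise ValueError("chosen and rejected responses must differ")
    return {
        "instruction": instruction,
        "input": input_text,
        "output": [chosen, rejected],
    }

# Serialize records as a JSON array; ensure_ascii=False keeps
# Chinese (or other non-ASCII) text human-readable in the file.
def dump_dataset(records):
    return json.dumps(records, ensure_ascii=False, indent=2)

if __name__ == "__main__":
    records = [
        make_preference_record(
            "Explain what RLHF stands for.",
            "RLHF stands for Reinforcement Learning from Human Feedback.",
            "It is a kind of database.",
        )
    ]
    print(dump_dataset(records))
```

Write the printed JSON to a file and register it the way the linked issue describes.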

How to build your own SFT dataset from large documents

See https://github.com/hiyouga/ChatGLM-Efficient-Tuning/issues/376
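One common first step, which may differ from what the linked issue recommends, is to split the long document into overlapping chunks that fit the model's context. Each chunk is then turned into instruction/response pairs by annotators or by another model. The sketch below shows only the chunking and an empty record skeleton. The function names, the character-based sizes, and the Alpaca-style fields are illustrative assumptions.

```python
# Hypothetical sketch: split a long text into overlapping character chunks.
# Overlap keeps sentences that straddle a boundary visible in both chunks.
def chunk_text(text, max_chars=1000, overlap=100):
    if overlap >= max_chars:
        raise ValueError("overlap must be smaller than max_chars")
    chunks = []
    start = 0
    step = max_chars - overlap
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break  # this chunk already reaches the end of the text
        start += step
    return chunks

# Hypothetical record skeleton (ASSUMED Alpaca-style fields): the chunk
# becomes the context, and "instruction"/"output" are filled in later.
def make_sft_skeletons(chunks):
    return [{"instruction": "", "input": c, "output": ""} for c in chunks]

if __name__ == "__main__":
    doc = "First paragraph of a long manual. " * 100
    records = make_sft_skeletons(chunk_text(doc, max_chars=500, overlap=50))
    print(len(records), "records to annotate")
```

Character counts are only a proxy for length. Splitting by the model's tokenizer, or on paragraph boundaries, usually gives cleaner chunks.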

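How do the SFT, RM, and PPO stages relate to each other?

For further study, see https://github.com/microsoft/DeepSpeed/tree/master/blogs/deepspeed-chat. In short, SFT (supervised fine-tuning) is instruction tuning, while RM (reward modeling) and PPO (Proximal Policy Optimization) together form the RLHF alignment stage. The order is SFT first, then RM, and finally PPO.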
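To make the hand-off between stages concrete, here is a minimal numeric sketch of the common InstructGPT-style formulation that DeepSpeed-Chat also follows. The RM is trained with a pairwise ranking loss on preference pairs. During PPO, the RM's score is used as the reward, minus a KL penalty against the SFT model. The function names and the `beta` value are illustrative assumptions, not this repository's code.

```python
import math

# Stage 2 (RM): pairwise ranking loss for one preference pair,
# -log(sigmoid(r_chosen - r_rejected)), computed in a numerically stable way.
# The loss shrinks as the RM scores the chosen response above the rejected one.
def rm_pairwise_loss(r_chosen, r_rejected):
    d = r_chosen - r_rejected
    if d >= 0:
        return math.log1p(math.exp(-d))
    return -d + math.log1p(math.exp(d))

# Stage 3 (PPO): per-sample reward = RM score minus a KL penalty.
# logprob_policy / logprob_sft are summed token log-probs of the same response
# under the current policy and the frozen SFT model from stage 1.
def ppo_reward(rm_score, logprob_policy, logprob_sft, beta=0.1):
    kl_estimate = logprob_policy - logprob_sft
    return rm_score - beta * kl_estimate

if __name__ == "__main__":
    print(rm_pairwise_loss(2.0, 0.5))      # RM already prefers the chosen answer
    print(ppo_reward(1.0, -12.0, -13.0))   # policy drifted slightly from SFT
```

This also shows why the order is fixed. PPO needs both a trained RM, which supplies `rm_score`, and the SFT model, which serves as the policy's starting point and as the KL reference.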