Merge branch 'main' into release/2.5

modelscope · Oct 19, 2024 · 0e7d652
2 parents 86c9e6c + 39494b7
commit 0e7d652

Showing 35 changed files with 1,064 additions and 73 deletions.
diff --git a/README.md b/README.md
@@ -55,6 +55,7 @@ You can contact us and communicate with us by adding our group:
 <img src="asset/discord_qr.jpg" width="200" height="200">  |  <img src="asset/wechat.png" width="200" height="200">
 
 ## 🎉 News
+- 2024.10.09: Support for reward modeling for LLM and MLLM, as well as PPO training for LLM. Refer to the [documentation](docs/source_en/LLM/Human-Preference-Alignment-Training-Documentation.md).
 - 2024.10.09: Support for training and deploying ovis1.6-gemma2 series models. Experience it using `swift infer --model_type ovis1_6-gemma2-9b`.
 - 2024.09.26: Support for training and deploying llama3.2-vision series models. Experience it using `swift infer --model_type llama3_2-11b-vision-instruct`.
 - 2024.09.26: Support for training and deploying llama3.2 series models. Experience it using `swift infer --model_type llama3_2-1b-instruct`.
@@ -634,7 +635,7 @@ The complete list of supported models and datasets can be found at [Supported Mo
 | Llava-HF                                                   | [Llava-HF series models](https://huggingface.co/llava-hf)                          | English       | 0.5B-110B           | chat model           |
 | Llava1.5<br>Llava1.6                                       | [Llava series models](https://github.com/haotian-liu/LLaVA)                            | English            | 7B-34B                                | chat model               |
 | Llava-Next<br>Llava-Next-Video                             | [Llava-Next series models](https://github.com/LLaVA-VL/LLaVA-NeXT)                     | Chinese<br>English | 7B-110B                               | chat model               |
-| mPLUG-Owl2<br>mPLUG-Owl2.1<br>mPLUG-Owl3                | [mPLUG-Owl series models](https://github.com/X-PLUG/mPLUG-Owl)                         | English            | 11B                                   | chat model               |
+| mPLUG-Owl2<br>mPLUG-Owl2.1<br>mPLUG-Owl3                | [mPLUG-Owl series models](https://github.com/X-PLUG/mPLUG-Owl)                         | English            | 1B-11B                                   | chat model               |
 | InternVL<br>Mini-InternVL<br>InternVL2                     | [InternVL](https://github.com/OpenGVLab/InternVL)                                      | Chinese<br>English | 1B-40B<br>including quantized version | chat model               |
 | Llava-llama3                                               | [xtuner](https://huggingface.co/xtuner/llava-llama-3-8b-v1_1-transformers)             | English            | 8B                                    | chat model               |
 | Phi3-Vision                                                | Microsoft                                                                              | English            | 4B                                    | chat model               |
@@ -644,7 +645,7 @@ The complete list of supported models and datasets can be found at [Supported Mo
 | Pixtral                                | [mistralai](https://huggingface.co/mistralai)                     | English       | 12B      | chat model       |
 | Llama3.1-Omni              | [LLaMA-Omni](https://github.com/ictnlp/LLaMA-Omni)                | English       | 8B      | chat model       |
 | Ovis              | [Ovis](https://github.com/AIDC-AI/Ovis)                | English       | 9B      | chat model       |
-
+| Molmo              | [Molmo series models](https://huggingface.co/collections/allenai/molmo-66f379e6fe3b8ef090a8ca19)                | English       | 1B-72B      | chat model       |
 
 #### Diffusion Models
 
@@ -655,7 +656,7 @@ The complete list of supported models and datasets can be found at [Supported Mo
 
 ### Supported Open Source Datasets
 
-| Dataset Type        | Training Task   | Documentation                                                                                                                                                                                                                                |
+| Dataset Type        | Training Task   | Dataset                                                                                                                                                                                                                                |
 |---------------------|:----------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
 | General             | Fine-tuning     | 🔥ruozhiba, 🔥ms-bench, 🔥alpaca-en(gpt4), 🔥alpaca-zh(gpt4), multi-alpaca, instinwild, cot-en, cot-zh, firefly-zh, instruct-en, gpt4all-en, sharegpt, tulu-v2-sft-mixture, wikipedia-zh, open-orca, sharegpt-gpt4, deepctrl-sft, coig-cqia. |
 | Agent               | Fine-tuning     | 🔥ms-agent, 🔥ms-agent-for-agentfabric, ms-agent-multirole, 🔥toolbench-for-alpha-umi, damo-agent-zh, damo-agent-zh-mini, agent-instruct-all-en.                                                                                             |

diff --git a/README_CN.md b/README_CN.md
@@ -56,6 +56,7 @@ SWIFT has rich and comprehensive documentation; please see our documentation site:
 
 
 ## 🎉 News
+- 2024.10.09: Support for reward modeling training for LLM and MLLM, and PPO training for LLM. See the [documentation](docs/source/LLM/人类偏好对齐训练文档.md).
 - 2024.10.09: Support for training through deployment of ovis1.6-gemma2. Try it with `swift infer --model_type ovis1_6-gemma2-9b`.
 - 2024.09.26: Support for training through deployment of the llama3.2-vision series models. Try it with `swift infer --model_type llama3_2-11b-vision-instruct`.
 - 2024.09.26: Support for training through deployment of the llama3.2 series models. Try it with `swift infer --model_type llama3_2-1b-instruct`.
@@ -627,7 +628,7 @@ CUDA_VISIBLE_DEVICES=0 swift deploy \
 | Llava-HF               | [Llava-HF series models](https://huggingface.co/llava-hf)                          | English       | 0.5B-110B           | chat model           |
 | Llava1.5<br>Llava1.6                                    | [Llava series models](https://github.com/haotian-liu/LLaVA)                          | English       | 7B-34B           | chat model           |
 | Llava-Next<br>Llava-Next-Video                          | [Llava-Next series models](https://github.com/LLaVA-VL/LLaVA-NeXT)                   | Chinese<br>English | 7B-110B          | chat model           |
-| mPLUG-Owl2<br>mPLUG-Owl2.1<br>mPLUG-Owl3           | [mPLUG-Owl series models](https://github.com/X-PLUG/mPLUG-Owl)                       | English       | 11B              | chat model           |
+| mPLUG-Owl2<br>mPLUG-Owl2.1<br>mPLUG-Owl3           | [mPLUG-Owl series models](https://github.com/X-PLUG/mPLUG-Owl)                       | English       | 1B-11B              | chat model           |
 | InternVL<br>Mini-InternVL<br>InternVL2                  | [InternVL](https://github.com/OpenGVLab/InternVL)                          | Chinese<br>English | 1B-40B<br>including quantized version | chat model           |
 | Llava-llama3                                            | [xtuner](https://huggingface.co/xtuner/llava-llama-3-8b-v1_1-transformers) | English       | 8B               | chat model       |
 | Phi3-Vision                                             | Microsoft                                                                    | English       | 4B               | chat model       |
@@ -637,6 +638,7 @@ CUDA_VISIBLE_DEVICES=0 swift deploy \
 | Pixtral                                | [mistralai](https://huggingface.co/mistralai)                               | English       | 12B      | chat model       |
 | Llama3.1-Omni              | [LLaMA-Omni](https://github.com/ictnlp/LLaMA-Omni)                | English       | 8B      | chat model       |
 | Ovis              | [Ovis](https://github.com/AIDC-AI/Ovis)                | English       | 9B      | chat model       |
+| Molmo              | [Molmo series models](https://huggingface.co/collections/allenai/molmo-66f379e6fe3b8ef090a8ca19)                | English       | 1B-72B      | chat model       |
 
 
 #### Diffusion Models
@@ -648,7 +650,7 @@ CUDA_VISIBLE_DEVICES=0 swift deploy \
 
 ### Supported Open Source Datasets
 
-| Dataset Type | Training Task | Documentation |
+| Dataset Type | Training Task | Dataset |
 |-------|:-----|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
 | General | Fine-tuning | 🔥ruozhiba, 🔥ms-bench, 🔥alpaca-en(gpt4), 🔥alpaca-zh(gpt4), multi-alpaca, instinwild, cot-en, cot-zh, firefly-zh, instruct-en, gpt4all-en, sharegpt, tulu-v2-sft-mixture, wikipedia-zh, open-orca, sharegpt-gpt4, deepctrl-sft, coig-cqia. |
 | Agent | Fine-tuning | 🔥ms-agent, 🔥ms-agent-for-agentfabric, ms-agent-multirole, 🔥toolbench-for-alpha-umi, damo-agent-zh, damo-agent-zh-mini, agent-instruct-all-en.                                                                                             |

diff --git a/docs/source/Instruction/命令行参数.md b/docs/source/Instruction/命令行参数.md
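@@ -281,15 +281,35 @@ RLHF parameters inherit the sft parameters, with the following additional parameters:
 - `--🔥rlhf_type`: Selects the alignment algorithm. Options are 'dpo', 'orpo', 'simpo', 'kto', 'cpo'; the default is `'dpo'`. For training scripts, see the [documentation](../LLM/人类偏好对齐训练文档.md).
 - `--ref_model_type`: Selects the reference model, same as the model_type parameter. The default is `None`, which matches the model being trained. Not needed for the `cpo`, `simpo`, and `orpo` algorithms. Usually does not need to be set.
 - `--ref_model_id_or_path`: Local cache path of the reference model; the default is `None`.
+- `--ref_model_revision`: Reference model revision, same as the model_revision parameter. The default is `None`, which matches the model being trained. Usually does not need to be set.
 - `--beta`: Coefficient of the KL regularization term. The default is `None`, which resolves to `2.` for the `simpo` algorithm and `0.1` for the other algorithms. See the [documentation](../LLM/人类偏好对齐训练文档.md) for details.
 - `--label_smoothing`: Whether to use DPO smoothing. The default is `0`; it is typically set between 0 and 0.5.
 - `--loss_type`: Loss type. The default is `None`, which resolves to `sigmoid` for dpo and cpo, and to `simpo` for simpo.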
+
+### DPO Parameters
 - `--🔥rpo_alpha`: Controls the weight of the sft_loss added in DPO; the default is `1`. The final loss is `KL_loss + rpo_alpha * sft_loss`.
+
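+### CPO/SimPO Parameters
 - `--cpo_alpha`: Coefficient of the nll loss in the CPO/SimPO loss; the default is `1.`.
 - `--simpo_gamma`: The reward margin term in the SimPO algorithm. The paper suggests a value of 0.5-1.5; the default is `1.`.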
+
+### KTO Parameters
 - `--desirable_weight`: Loss weight $\lambda_D$ for desirable responses in the KTO algorithm; the default is `1.`.
 - `--undesirable_weight`: Loss weight $\lambda_U$ for undesirable responses in the KTO paper; the default is `1.`. Let $n_D$ and $n_U$ denote the number of desirable and undesirable examples in the dataset; the paper recommends keeping $\frac{\lambda_D n_D}{\lambda_U n_U} \in [1,\frac{4}{3}]$.
 
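+### PPO Parameters
+- `--reward_model_id_or_path`: Local cache path of the reward model. It must contain the value_head weights (`value_head.safetensors` or `value_head.bin`).
+- `--reward_model_type`: Reward model type, same as the model_type parameter.
+- `--reward_model_revision`: Reward model revision, same as the model_revision parameter.
+- `--local_rollout_forward_batch_size`: Batch size for each data sampling pass; the default is 64.
+- `--whiten_rewards`: Normalizes the rewards; the default is False.
+- `--kl_coef`: Coefficient of the KL divergence term; the default is 0.05.
+- `--cliprange`: Clip range in the PPO policy loss function; the default is 0.2.
+- `--vf_coef`: Coefficient of the value loss function; the default is 0.1.
+- `--cliprange_value`: Clip range in the PPO value loss function; the default is 0.2.
+- `--gamma`: Discount factor for cumulative rewards; the default is 1.0.
+- `--lam`: Lambda coefficient in [GAE](https://arxiv.org/abs/1506.02438); the default is 0.95.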
+
 ## infer merge-lora Parameters
 
 - `--🔥model_type`: The default is `None`. See `sft command-line parameters` for a detailed description.
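
To make the roles of `--gamma` and `--lam` in the PPO parameters above concrete, here is a minimal sketch of Generalized Advantage Estimation for one finished episode. It is an illustration only, not the code swift uses for PPO: the function name `gae_advantages` and the convention that the value after the last step is 0 are assumptions made for this example. Only the two default values (1.0 and 0.95) come from the parameter list.

```python
from typing import List


def gae_advantages(rewards: List[float], values: List[float],
                   gamma: float = 1.0, lam: float = 0.95) -> List[float]:
    """Generalized Advantage Estimation for a single finished episode.

    values[t] is the value estimate at step t. The value after the last
    step is assumed to be 0 (hypothetical convention for this sketch).
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        next_value = values[t + 1] if t + 1 < len(values) else 0.0
        # One-step TD error: r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * next_value - values[t]
        # Exponentially weighted sum of future TD errors
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


adv = gae_advantages(rewards=[0.0, 0.0, 1.0], values=[0.5, 0.5, 0.5])
print(adv)
```

With `lam=0` each advantage reduces to the one-step TD error; values closer to 1 propagate the final reward further back along the episode at the cost of higher variance.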

diff --git a/docs/source/Instruction/常见问题整理.md b/docs/source/Instruction/常见问题整理.md
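@@ -148,6 +148,33 @@ V100 machines need to train qwen2 in fp32.
 ### Q38: Can a gptq-quantized model be fine-tuned with full parameters?
 No. The int-type parameters of a gptq model cannot take part in gradient computation; they can only be updated through attached extra structures such as lora.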
 
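+### Q39: How should I set the parameters to fine-tune with qlora? (glm4-chat)
+Set the parameter `--quantization_bit 4`; see the qlora [example](https://github.com/modelscope/ms-swift/tree/main/examples/pytorch/llm/scripts/qwen_7b_chat).
+
+### Q40: When training on my own dataset with qwen2-vl-7b, I keep getting the error "'AdamW' object has no attribute 'train'".
+Try `accelerate 0.34.0`.
+
+### Q41: How can I expand my vocabulary under the swift framework?
+swift does not currently support vocabulary expansion.
+
+### Q42: Can a model with the same name be used directly from huggingface?
+Set the environment variable `USE_HF=1`.
+
+### Q43: Can Qwen2-VL-2B do incremental pre-training? Is there a guide? I have both image-text and plain-text data.
+It is supported. For continued pre-training, put all of the content in the response.
+
+### Q44: When training with video, how do I control the frame sampling rate through parameters? Setting frame_rate has no effect. (minicpmv)
+Set the environment variable `MAX_NUM_FRAMES`.
+
+### Q45: Can swift save the inference results on the validation set during training?
+After training finishes, run swift infer; the results will be saved.
+
+### Q46: I ran full-parameter dpo. Why is the saved checkpoint larger than the original model files? It is a full 2x the size.
+When fine-tuning on a V100, the weights are saved in fp32.
+
+### Q47: Multi-node training is slow: when training an LLM with the swift framework, using deepspeed zero3 causes a severe slowdown.
+See this [issue](https://github.com/modelscope/ms-swift/issues/1825) for details.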
+
 ## Inference
 
 ### Q1: Is there documentation for swift inference?
@@ -184,6 +211,19 @@ ValueError: Input length of input_ids is 35, but `max_length` is set to 20. This
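 ### Q10: With the latest version of swift, when I load the qwen2-32b-instruct-awq quantized model and its lora using vllm, I am prompted to add merge lora true. Adding it causes an error. If I remove vllm acceleration, inference works normally but is very slow.
 Models trained with qlora do not support merge-lora. The recommended order is lora fine-tuning, then merge-lora, then quantization.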
 
+### Q11: vllm raises an error: assert factor in rope_scaling
+`pip install git+https://github.com/huggingface/transformers@21fac7abba2a37fae86106f87fcf9974fd1e3830`; see qwen2-vl [issue#96](https://github.com/QwenLM/Qwen2-VL/issues/96) for details.
+
+### Q12: With vllm as the inference backend, does the model have to be merged before it can be called?
+Merging is not required; see the document [VLLM Inference Acceleration and Deployment](https://swift.readthedocs.io/zh-cn/latest/LLM/VLLM%E6%8E%A8%E7%90%86%E5%8A%A0%E9%80%9F%E4%B8%8E%E9%83%A8%E7%BD%B2.html#id11).
+
+### Q13: Can inference return prob only through the inference_client function? Can the inference function in the single-sample inference demo return the result?
+Modify `generation_config.output_logits`: set `model.generation_config.output_logits = True`
+and `model.generation_config.return_dict_in_generate = True`.
+
+### Q14: Has anyone run into this? RuntimeError: "triu_tril_cuda_template" not implemented for 'BFloat16'
+Upgrade torch; this version of torch does not implement this operator.
+
 ## Deployment
 
 ### Q1: How do I deploy a trained model?
@@ -210,6 +250,12 @@ base模型可以用client.chat.completions.create的，不过这个是兼容行
 ### Q8: After lora fine-tuning I deployed the model, and using swift's inference method I get the error requests.exceptions.HTTPError: Multimodal model only support `default-lora`
 Set `model_type` to `default-lora` here.
 
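+### Q9: After the swift inference service has started, how do I change settings such as temperature interactively?
+For inference, these can only be set before startup. For deployment, defaults can be set at startup and then overridden from the client afterward.
+
+### Q10: When deploying the qwen2vl model locally with vllm as the inference backend, how do I pass in a local video? Can it be passed as base64? How do I load a video when calling with curl?
+See the [mllm deployment documentation](https://swift.readthedocs.io/zh-cn/latest/Multi-Modal/MLLM%E9%83%A8%E7%BD%B2%E6%96%87%E6%A1%A3.html). A url, base64, or a local path all work; local paths are limited to testing on the same machine.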
+
 ## Evaluation
 
 ### Q1: Which evaluation datasets does swift support?
@@ -252,3 +298,12 @@ base模型可以用client.chat.completions.create的，不过这个是兼容行
 
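 ### Q4: After manually downloading the officially supported evaluation datasets, can swift eval be configured to evaluate from a local path?
 First download the evaluation dataset [eval.zip](https://modelscope.cn/datasets/swift/evalscope_resource/files), unzip it, and put its contents in the `~/.cache/modelscope/media_resources/evalscope/data` folder; then run the swift eval command to use the local data.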
+
+### Q5: Is there a bug in custom evaluation? After changing the standard example to English, it never runs successfully.
+```shell
+swift eval --model_type 'qwen2_5-1_5b-instruct' --eval_dataset no --custom_eval_config '/mnt/workspace/test_data/config_eval.json'
+```
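+This depends on the nltk package, and nltk's tokenizer needs to download a punkt_tab zip file; in some environments in China the download is unstable or fails outright. The code has been changed to add a fallback that works around this; see this [issue](https://github.com/nltk/nltk/issues/3293).
+
+### Q6: When I eval a fine-tuned model, it always stops at a fixed percentage, even though the vllm service appears to be running normally. The larger the model, the earlier it disconnects.
+Set the `TIMEOUT` environment variable to -1.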
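
The doubling reported in training Q46 (a full-parameter checkpoint saved in fp32) can be checked with simple arithmetic on bytes per parameter. The sketch below assumes the original model files were stored in a 16-bit type such as bf16 or fp16, which the FAQ does not state explicitly. The helper name `checkpoint_size_gb` and the 7B parameter count are hypothetical and chosen only for illustration.

```python
# Bytes used to store one parameter in each dtype.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2}


def checkpoint_size_gb(num_params: int, dtype: str) -> float:
    """Approximate size of the saved weights in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


num_params = 7_000_000_000  # hypothetical 7B-parameter model
fp32_size = checkpoint_size_gb(num_params, "fp32")
bf16_size = checkpoint_size_gb(num_params, "bf16")
print(fp32_size, bf16_size, fp32_size / bf16_size)
```

The ratio is 2 regardless of the parameter count, since only the per-parameter storage width changes.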