Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

用给的示示例数据tools的数据微调,后面自动多了一个Tools:None,数据处理报异常 #576

Open
1 of 2 tasks
mudeguo opened this issue Oct 6, 2024 · 2 comments

Comments

@mudeguo
Copy link

mudeguo commented Oct 6, 2024

System Info / 系統信息

python3.12,transformer 43;gpu 2080Ti22g*2

Who can help? / 谁可以帮助到您?

No response

Information / 问题信息

  • The official example scripts / 官方的示例脚本
  • My own modified scripts / 我自己修改的脚本和任务

Reproduction / 复现过程

batchjed_conv: 50
conv: [{'role': 'system', 'content': '', 'tools': [{'type': 'function', 'function': {'name': 'get_recommended_books', 'description': "Get recommended books based on user's interests1", 'parameters': {'type': 'object', 'properties': {'interests': {'type': 'array', 'items': {'type': 'string'}, 'description': 'The interests to recommend books for'}}, 'required': ['interests']}}}]}, {'role': 'user', 'content': 'Hi, I am looking for some book recommendations. I am interested in history and science fiction.', 'tools': None}, {'role': 'assistant', 'content': '{"name": "get_recommended_books", "arguments": {"interests": ["history", "science fiction"]}}', 'tools': None}, {'role': 'observation', 'content': '{"books": ["Sapiens: A Brief History of Humankind by Yuval Noah Harari", "A Brief History of Time by Stephen Hawking", "Dune by Frank Herbert", "The Martian by Andy Weir"]}', 'tools': None}, {'role': 'assistant', 'content': 'Based on your interests in history and science fiction, I would recommend the following books: "Sapiens: A Brief History of Humankind" by Yuval Noah Harari, "A Brief History of Time" by Stephen Hawking, "Dune" by Frank Herbert, and "The Martian" by Andy Weir.', 'tools': None}]
Map: 0%| | 0/50 [00:00<?, ? examples/s]
[rank1]: ╭────────────────────────────────────────────────────────────────────────────────────────────────── Traceback (most recent call last) ──────────────────────────────────────────────────────────────────────────────────────────────────╮
[rank1]: │ /data/projects/GLM-4/finetune_demo/finetune.py:419 in main │
[rank1]: │ │
[rank1]: │ 416 │ tokenizer, model = load_tokenizer_and_model(model_dir, peft_config=ft_config.peft_co │
[rank1]: │ 417 │ data_manager = DataManager(data_dir, ft_config.data_config) │
[rank1]: │ 418 │ │
[rank1]: │ ❱ 419 │ train_dataset = data_manager.get_dataset( │
[rank1]: │ 420 │ │ Split.TRAIN, │
[rank1]: │ 421 │ │ functools.partial( │
[rank1]: │ 422 │ │ │ process_batch, │
[rank1]: │

Expected behavior / 期待表现

希望能提供tools微调时的jsonl文件,能够跑通不报错。
trouble

@sixsixcoder
Copy link
Collaborator

这里有微调模板,https://zhipu-ai.feishu.cn/wiki/L1jpwBEqCiHocmkT3VzcQv5Znrg

@mudeguo
Copy link
Author

mudeguo commented Oct 12, 2024

这里有微调模板,https://zhipu-ai.feishu.cn/wiki/L1jpwBEqCiHocmkT3VzcQv5Znrg

就是用这个模版的数据做成train.jsonl, 一条复制了几百条,不知是否是因为数据一样的问题。如果官方给个jsonl文件就完美了。

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants