This issue was moved to a discussion.

A question about the data format for tool-call fine-tuning #6521

Closed
1 task done
Jimmy-L99 opened this issue Jan 3, 2025 · 0 comments
Labels
wontfix This will not be worked on

Comments

@Jimmy-L99

Reminder

  • I have read the README and searched the existing issues.

System Info

Reproduction

Below is a sample tool-call fine-tuning record, taken from ChatGLM4:

{
  "messages": [
    {
      "role": "system",
      "content": "",
      "tools": [
        {
          "type": "function",
          "function": {
            "name": "get_recommended_books",
            "description": "Get recommended books based on user's interests",
            "parameters": {
              "type": "object",
              "properties": {
                "interests": {
                  "type": "array",
                  "items": {
                    "type": "string"
                  },
                  "description": "The interests to recommend books for"
                }
              },
              "required": [
                "interests"
              ]
            }
          }
        }
      ]
    },
    {
      "role": "user",
      "content": "Hi, I am looking for some book recommendations. I am interested in history and science fiction."
    },
    {
      "role": "assistant",
      "content": "{\"name\": \"get_recommended_books\", \"arguments\": {\"interests\": [\"history\", \"science fiction\"]}}"
    },
    {
      "role": "observation",
      "content": "{\"books\": [\"Sapiens: A Brief History of Humankind by Yuval Noah Harari\", \"A Brief History of Time by Stephen Hawking\", \"Dune by Frank Herbert\", \"The Martian by Andy Weir\"]}"
    },
    {
      "role": "assistant",
      "content": "Based on your interests in history and science fiction, I would recommend the following books: \"Sapiens: A Brief History of Humankind\" by Yuval Noah Harari, \"A Brief History of Time\" by Stephen Hawking, \"Dune\" by Frank Herbert, and \"The Martian\" by Andy Weir."
    }
  ]
}

If I build my data following the template above, how should I fill in the tags in dataset_info.json?

  "glm4_toolcall_dataset":{
    "file_name": "",
    "formatting": "sharegpt",
    "columns": {
      "messages": "messages"
    },
    "tags":{
      "role_tag": "role",
      "content_tag": "content",
      "user_tag": "user",
      "assistant_tag": "assistant",
      "system_tag": "system"
    }
  }
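
To make the gap concrete, here is a minimal sketch that lists the roles used in the record above that none of the tags map to. The helper name `uncovered_roles` is hypothetical, and the record is trimmed to the role of each turn, since the message text does not matter for this check.

```python
# Trimmed copy of the record above: only the role of each turn is relevant here.
record = {
    "messages": [
        {"role": "system", "content": ""},
        {"role": "user", "content": "user question"},
        {"role": "assistant", "content": "tool call as a JSON string"},
        {"role": "observation", "content": "tool result as a JSON string"},
        {"role": "assistant", "content": "final answer"},
    ]
}

# The tags exactly as written in the dataset_info.json entry above.
tags = {
    "role_tag": "role",
    "content_tag": "content",
    "user_tag": "user",
    "assistant_tag": "assistant",
    "system_tag": "system",
}


def uncovered_roles(record, tags):
    """Return the roles that appear in the record but have no *_tag mapping."""
    role_key = tags["role_tag"]
    # Every tag except role_tag/content_tag names one role value.
    mapped = {v for k, v in tags.items() if k not in ("role_tag", "content_tag")}
    seen = {message[role_key] for message in record["messages"]}
    return sorted(seen - mapped)


print(uncovered_roles(record, tags))  # ['observation']
```

The check only looks at role names, so it does not notice that the first assistant turn is a tool call rather than a normal reply; that distinction has to come from the message content.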

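1. The conversation contains two assistant turns and one observation turn. How do the tags need to be changed to handle this?

2. Or should I rebuild the data in the standard sharegpt format instead?

3. With a framework such as langchain, when the tools list contains several tools, the request passed to the LLM looks like this: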

2025-01-03 14:55:49.543 | INFO     | __main__:generate_stream_glm4:207 - inputs: [gMASK]<sop><|system|>
你是一个名为 GLM-4 的人工智能助手。你是基于智谱AI训练的语言模型 GLM-4 模型开发的,你的任务是针对用户的问题和要求提供适当的答复和支持。

# 可用工具
## add

{
    "name": "add",
    "description": "计算两个数字的和",
    "parameters": {
        "type": "object",
        "properties": {
            "a": {
                "description": "第一个数字",
                "type": "integer"
            },
            "b": {
                "description": "第二个数字",
                "type": "integer"
            }
        },
        "required": [
            "a",
            "b"
        ]
    }
}
在调用上述函数时,请使用 Json 格式表示调用的参数。
## multiply

{
    "name": "multiply",
    "description": "计算两个数字的乘积",
    "parameters": {
        "type": "object",
        "properties": {
            "a": {
                "description": "第一个数字",
                "type": "integer"
            },
            "b": {
                "description": "第二个数字",
                "type": "integer"
            }
        },
        "required": [
            "a",
            "b"
        ]
    }
}
在调用上述函数时,请使用 Json 格式表示调用的参数。
## weather

{
    "name": "weather",
    "description": "该工具只能得到指定城市的天气信息\n不能用于两个天气数据对比或其他用途",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {
                "description": "城市名称",
                "type": "string"
            }
        },
        "required": [
            "city"
        ]
    }
}
在调用上述函数时,请使用 Json 格式表示调用的参数。
## purchase_order

{
    "name": "purchase_order",
    "description": "该工具只能获取文本中的订单的参数,\n分别是订单名称、采购物品、订单数量、订单价格",
    "parameters": {
        "type": "object",
        "properties": {
            "name": {
                "description": "订单名称",
                "type": "string"
            },
            "project": {
                "description": "采购物品",
                "type": "string"
            },
            "num": {
                "description": "订单数量",
                "type": "integer"
            },
            "price": {
                "description": "订单价格",
                "type": "integer"
            }
        },
        "required": [
            "name",
            "project",
            "num",
            "price"
        ]
    }
}
在调用上述函数时,请使用 Json 格式表示调用的参数。
## get_current_time

{
    "name": "get_current_time",
    "description": "该工具可以获取当前日期和时间",
    "parameters": {
        "type": "object",
        "properties": {
            "time": {
                "description": "时间",
                "type": "string"
            }
        },
        "required": [
            "time"
        ]
    }
}
在调用上述函数时,请使用 Json 格式表示调用的参数。<|user|>

When building standard sharegpt data, should the tools field list all of the tools, or only the one tool that is actually used?
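
For reference, a rough sketch of what rebuilding a record in a sharegpt-style layout (option 2) could look like. Everything about the target layout is an assumption and is not confirmed in this thread: a `conversations` list with `from`/`value` keys, the role names `human`, `gpt`, `function_call` and `observation`, and a `tools` field stored as a JSON string. The helpers `is_tool_call` and `glm4_to_sharegpt` are hypothetical names. The sketch copies whatever tools the system message carries, so it does not settle whether all tools or only the used one should be listed.

```python
import json


def is_tool_call(content):
    """Heuristic: treat an assistant turn as a tool call if it is a JSON
    object with both "name" and "arguments" keys."""
    try:
        obj = json.loads(content)
    except (json.JSONDecodeError, TypeError):
        return False
    return isinstance(obj, dict) and {"name", "arguments"}.issubset(obj)


def glm4_to_sharegpt(record):
    """Convert a {"messages": [...]} record to an assumed sharegpt-style dict."""
    out = {"conversations": [], "system": "", "tools": "[]"}
    for message in record["messages"]:
        role = message["role"]
        content = message.get("content", "")
        if role == "system":
            out["system"] = content
            # Assumption: the target format stores tools as a JSON string.
            out["tools"] = json.dumps(message.get("tools", []), ensure_ascii=False)
        elif role == "user":
            out["conversations"].append({"from": "human", "value": content})
        elif role == "assistant":
            # Split assistant turns into tool calls and normal replies.
            source = "function_call" if is_tool_call(content) else "gpt"
            out["conversations"].append({"from": source, "value": content})
        elif role == "observation":
            out["conversations"].append({"from": "observation", "value": content})
        else:
            raise ValueError(f"unexpected role: {role!r}")
    return out


example = {
    "messages": [
        {
            "role": "system",
            "content": "",
            "tools": [{"type": "function", "function": {"name": "get_recommended_books"}}],
        },
        {"role": "user", "content": "Any book recommendations?"},
        {
            "role": "assistant",
            "content": '{"name": "get_recommended_books", "arguments": {"interests": ["history"]}}',
        },
        {"role": "observation", "content": '{"books": ["Dune by Frank Herbert"]}'},
        {"role": "assistant", "content": "I would recommend Dune by Frank Herbert."},
    ]
}

converted = glm4_to_sharegpt(example)
print([turn["from"] for turn in converted["conversations"]])
```

If the target format defines tags for the tool-call and observation roles, the `tags` block in dataset_info.json would need matching entries; the exact key names should be taken from the project's own documentation rather than from this sketch.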

Sorry for the long question with so much content; I would really appreciate any help.

Expected behavior

I would like to understand the data format for tool-call fine-tuning in more depth.

Others

No response

@github-actions github-actions bot added the pending This problem is yet to be addressed label Jan 3, 2025
Repository owner locked and limited conversation to collaborators Jan 3, 2025
@hiyouga hiyouga converted this issue into discussion #6522 Jan 3, 2025
@hiyouga hiyouga added wontfix This will not be worked on and removed pending This problem is yet to be addressed labels Jan 3, 2025
