kg build with unstructured data, an attempt! #746

Open
2 of 3 tasks
wey-gu opened this issue Jan 19, 2024 · 5 comments
Labels
type/feature req Type: feature request

Comments

@wey-gu
Contributor

wey-gu commented Jan 19, 2024

Input Data: paul_graham_essay.txt

Schema:
(RDF Style, will try PropGraph Style later)

CREATE TAG `entity` (`name` string);
CREATE EDGE `relationship` (`name` string);

Log: all.log

Model: Azure OpenAI, gpt-3.5-turbo (API version 2023-07-01-preview)


It went smoothly, thanks! A few things we may consider changing:

...
the knowledge graph has following schema and node name must be a real :
...
NodeType "entity" ("name":string )
EdgeType "relationship" ("name":string )
...
{userPrompt}
...
  • feat: consider putting the trailing nGQL DML at the end of all.log into a separate, downloadable file?
  • feat: support parsing edge prop

It seems that with an RDF-style schema like the one below, which has only one edge type and stores the relationship type in edge.name, our current implementation does not guide the LLM to generate edge props. I guess we should first address typical Property Graph style modeling?

"""graph-schema
NodeType "entity" ("name":string )
EdgeType "relationship" ("name":string ) #<---------- name

"""
{userPrompt}
Return the results directly, without explain and comment. The results should be in the following JSON format:
{
  "nodes":[{ "name":string,"type":string,"props":object }],
  "edges":[{ "src":string,"dst":string,"edgeType":string,"props":object }] #<----------- there is no name here.
}

Thus, the extracted edges come without the edge type (the name prop):

INSERT EDGE `relationship` () VALUES "software as a service"->"Aspra":();
INSERT EDGE `relationship` () VALUES "still life"->"visual cues":();
INSERT EDGE `relationship` () VALUES "color changes"->"visual cues":();
INSERT EDGE `relationship` () VALUES "software"->"documents":();
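
To illustrate why the statements above end up with empty property lists, here is a minimal Python sketch of turning one extracted edge into an nGQL `INSERT EDGE` statement. The helper `edge_to_ngql` is hypothetical and not taken from the actual implementation; it only assumes the JSON edge shape shown in the prompt, and it uses `json.dumps` as a stand-in for proper nGQL string escaping. The relationship name in the second call is made up for illustration.

```python
import json


def quote(value):
    """Render a value as a double-quoted string literal (json.dumps handles escaping)."""
    return json.dumps(str(value), ensure_ascii=False)


def edge_to_ngql(edge):
    """Build an INSERT EDGE statement from one extracted edge dict.

    Hypothetical helper: assumes keys "src", "dst", "edgeType" and an
    optional "props" object, as in the JSON format of the prompt.
    """
    props = edge.get("props") or {}
    names = ", ".join(f"`{key}`" for key in props)
    values = ", ".join(quote(value) for value in props.values())
    return (
        f"INSERT EDGE `{edge['edgeType']}` ({names}) "
        f"VALUES {quote(edge['src'])}->{quote(edge['dst'])}:({values});"
    )


# An edge without props reproduces the empty statement seen in the log.
print(edge_to_ngql({"src": "software", "dst": "documents",
                    "edgeType": "relationship", "props": {}}))

# An edge whose props carry "name" yields a populated statement
# (the value "stored in" is invented for this example).
print(edge_to_ngql({"src": "software", "dst": "documents",
                    "edgeType": "relationship", "props": {"name": "stored in"}}))
```

If the LLM never fills `props.name`, any generator of this shape can only emit the empty `()` form, which is what the log shows.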
@wey-gu wey-gu changed the title from "kg build failed with unstructured data" to "kg build with unstructured data, an attempt!" Jan 19, 2024
@vesoft-inc vesoft-inc deleted a comment from yyh0808 Jan 19, 2024
@mizy
Contributor

mizy commented Jan 19, 2024

feat: support parsing edge prop

{
  "nodes":[{ "name":string,"type":string,"props":object }],
  "edges":[{ "src":string,"dst":string,"edgeType":string,"props":object }] #<----------- there is no name here.
}

This part of the prompt is just an example output to enforce the result format. The graph schema is already injected into the prompt, like NodeType "entity" ("name":string ).
Also, some spaces may have many edge types and edge props, so listing them all would lead to a very long context.
In my case the edge props were generated successfully:

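{
  "nodes": [
    { "name": "I", "type": "Person", "props": {"role": "test engineer"} },
    { "name": "Explorer project", "type": "Item", "props": {"version": ["v3.6.0", "v3.7.0"], "work content": ["testing work", "submitting issues", "designing and implementing test cases", "API automation testing", "making test plans", "writing test reports"] } },
    { "name": "Analytics project", "type": "Item", "props": {"version": ["v3.6.0"], "work content": ["iteration testing", "functional testing", "performance testing", "submitting issues", "writing test cases and test reports"] } },
    { "name": "Bank project", "type": "Item", "props": {"work content": ["deploying an internal graph computing test cluster", "data development", "constructing risk-control business scenarios", "writing and executing test cases", "tracking issues"] } },
    { "name": "confluence", "type": "Item", "props": {"purpose": "recording test cases"} },
    { "name": "cloud code repository", "type": "Item", "props": {"purpose": "storing API automation test code"} }
  ],
  "edges": [
    { "src": "I", "dst": "Explorer project", "edgeType": "relationship", "props": {"relationship type": "responsible for testing"} },
    { "src": "I", "dst": "Analytics project", "edgeType": "relationship", "props": {"relationship type": "responsible for testing"} },
    { "src": "I", "dst": "Bank project", "edgeType": "relationship", "props": {"relationship type": "responsible for support and testing"} },
    { "src": "I", "dst": "confluence", "edgeType": "relationship", "props": {"relationship type": "records test cases in it"} },
    { "src": "I", "dst": "cloud code repository", "edgeType": "relationship", "props": {"relationship type": "commits API automation test code to it"} }
  ]
}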
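
Whether a given response actually carries the props the schema expects can be checked before any nGQL is generated. Below is a minimal Python sketch of such a check. The function `missing_edge_props` and the `schema` mapping from edge type to expected prop names are hypothetical and not part of the project; the sample edges reuse node names from the first comment, and the relationship name is invented.

```python
def missing_edge_props(result, schema):
    """Return (src, dst, missing_prop_names) for edges lacking schema props.

    `result` is the parsed LLM response; `schema` maps an edge type to the
    list of prop names it defines (a hypothetical structure for this sketch).
    """
    problems = []
    for edge in result.get("edges", []):
        expected = schema.get(edge.get("edgeType"), [])
        props = edge.get("props") or {}
        # A prop counts as missing when it is absent, None, or an empty string.
        missing = [name for name in expected if props.get(name) in (None, "")]
        if missing:
            problems.append((edge.get("src"), edge.get("dst"), missing))
    return problems


schema = {"relationship": ["name"]}
result = {
    "nodes": [],
    "edges": [
        {"src": "software", "dst": "documents",
         "edgeType": "relationship", "props": {}},
        {"src": "still life", "dst": "visual cues",
         "edgeType": "relationship", "props": {"name": "provides"}},
    ],
}
print(missing_edge_props(result, schema))
```

A check like this would flag the RDF-style case from the first comment, where every edge arrives with empty props, while letting a response such as the one above pass as long as its prop names match the schema.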

@wey-gu
Contributor Author

wey-gu commented Jan 19, 2024

This part of the prompt is just an example output to enforce the result format. The graph schema is already injected into the prompt, like NodeType "entity" ("name":string ).

I see, we could tune the prompt to better address this. In my case (where I think the schema causes no obvious confusion), it failed for most edges:

> match ()-[e]->() RETURN e
+------------------------------------------------------------------------------------------------------+
| e                                                                                                    |
+------------------------------------------------------------------------------------------------------+
| [:relationship "teacher"->"grade" @0 {name: __NULL__}]                                               |
| [:relationship "essay question"->"Cezanne" @0 {name: __NULL__}]                                      |
| [:relationship "roommate"->"Robert" @0 {name: __NULL__}]                                             |
| [:relationship "Florence"->"Duomo" @0 {name: __NULL__}]                                              |
| [:relationship "Florence"->"Orsanmichele" @0 {name: __NULL__}]                                       |
| [:relationship "Florence"->"Pitti" @0 {name: __NULL__}]                                              |
| [:relationship "Florence"->"Via Ricasoli" @0 {name: __NULL__}]                                       |
| [:relationship "Florence"->"budget" @0 {name: __NULL__}]                                             |
| [:relationship "bedroom"->"night" @0 {name: __NULL__}]                                               |
| [:relationship "Lisp"->"AI" @0 {name: __NULL__}]                                                     |
| [:relationship "Lisp"->"C++" @0 {name: __NULL__}]                                                    |
| [:relationship "Lisp"->"John McCarthy" @0 {name: __NULL__}]                                          |
| [:relationship "Lisp"->"Lisp hacker" @0 {name: __NULL__}]                                            |
| [:relationship "Lisp"->"McCarthy" @0 {name: __NULL__}]                                               |
| [:relationship "Lisp"->"On Lisp" @0 {name: __NULL__}]                                                |
| [:relationship "Lisp"->"Platonic form of Lisp" @0 {name: __NULL__}]                                  |
| [:relationship "Lisp"->"Turing machine" @0 {name: __NULL__}]                                         |
| [:relationship "Lisp"->"book" @0 {name: __NULL__}]                                                   |
| [:relationship "Lisp"->"expression" @0 {name: __NULL__}]                                             |
| [:relationship "Lisp"->"shot" @0 {name: __NULL__}]                                                   |
| [:relationship "comment"->"angry people" @0 {name: __NULL__}]                                        |
| [:relationship "RISD"->"HTML" @0 {name: __NULL__}]                                                   |
| [:relationship "RISD"->"Providence" @0 {name: __NULL__}]                                             |
| [:relationship "RISD"->"art" @0 {name: __NULL__}]                                                    |
| [:relationship "RISD"->"job" @0 {name: __NULL__}]                                                    |
| [:relationship "RISD"->"money" @0 {name: __NULL__}]                                                  |
| [:relationship "experts"->"talks" @0 {name: __NULL__}]                                               |
| [:relationship "run"->"air conditioners" @0 {name: __NULL__}]                                        |
| [:relationship "Arc"->"interpreter" @0 {name: __NULL__}]                                             |
| [:relationship "brain"->"feature" @0 {name: __NULL__}]                                               |
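
One possible direction for tuning the prompt is to derive the JSON format example from the schema, so that each edge type's props are named explicitly instead of appearing as a generic `object`. The Python sketch below only illustrates that idea. `format_example` and its input shape are hypothetical, and nothing here reflects the project's actual prompt code or the eventual fix.

```python
import json


def format_example(node_types, edge_types):
    """Build a JSON format example that spells out every prop per type.

    Both arguments map a type name to {prop_name: prop_type}; this input
    shape is an assumption made for the sketch.
    """
    example = {
        "nodes": [
            {"name": "string", "type": type_name, "props": dict(props)}
            for type_name, props in node_types.items()
        ],
        "edges": [
            {"src": "string", "dst": "string",
             "edgeType": type_name, "props": dict(props)}
            for type_name, props in edge_types.items()
        ],
    }
    return json.dumps(example, indent=2, ensure_ascii=False)


# For the RDF-style schema in this issue, the edge example now names "name".
print(format_example({"entity": {"name": "string"}},
                     {"relationship": {"name": "string"}}))
```

The trade-off is the one raised earlier in the thread: for a space with many edge types and props, an expanded example makes the prompt longer.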

@wey-gu
Contributor Author

wey-gu commented Jan 19, 2024

I will look into the prompt and tune it to improve this in a PR then.

@mizy
Contributor

mizy commented Jan 19, 2024

I will look into the prompt and tune it to improve this in a PR then.

OK, maybe this prompt does not work well with some spaces.

@wey-gu
Contributor Author

wey-gu commented Jan 19, 2024

I proposed a revised default prompt in #751; kindly help review it :D.
