Define input/output format #6

cmungall · 2018-04-23T02:02:46Z

I will write proposals as individual comments in this ticket

cmungall · 2018-04-23T02:04:41Z

RDF

Pros:

already a standard

Cons:

multiple ways of handling reification
even if we bless one, still awkward to handle

recommendation: The format MUST have a defined mapping to the RDF model, but there MAY be another serialization format

cmungall · 2018-04-23T02:05:32Z

GraphML

Pros:

standard
allows for property graphs

Cons:

no standard way of defining which node or edge properties are accepted
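
To illustrate the con above, here is a minimal sketch that writes a tiny graph as GraphML using only the Python standard library. The function name `build_graphml` and the input dict layout are assumptions made for this example. GraphML requires every property to be declared as a `<key>`, but nothing in the format restricts which names may be declared.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def build_graphml(nodes, edges):
    """Serialize nodes and edges (dicts with free-form properties) as GraphML.

    Hypothetical input layout:
      nodes: [{"id": ..., "properties": {...}}]
      edges: [{"source": ..., "target": ..., "properties": {...}}]
    """
    root = ET.Element("graphml", xmlns=GRAPHML_NS)

    # Declare one <key> per (domain, property name) pair seen in the data.
    # Any name is accepted here; there is no shared vocabulary to check against.
    key_ids = {}
    for domain, items in (("node", nodes), ("edge", edges)):
        for item in items:
            for name in item["properties"]:
                if (domain, name) not in key_ids:
                    key_id = f"d{len(key_ids)}"
                    key_ids[(domain, name)] = key_id
                    ET.SubElement(root, "key", {
                        "id": key_id,
                        "for": domain,
                        "attr.name": name,
                        "attr.type": "string",
                    })

    graph = ET.SubElement(root, "graph", id="G", edgedefault="directed")
    for node in nodes:
        element = ET.SubElement(graph, "node", id=node["id"])
        for name, value in node["properties"].items():
            data = ET.SubElement(element, "data", key=key_ids[("node", name)])
            data.text = str(value)
    for edge in edges:
        element = ET.SubElement(
            graph, "edge", source=edge["source"], target=edge["target"]
        )
        for name, value in edge["properties"].items():
            data = ET.SubElement(element, "data", key=key_ids[("edge", name)])
            data.text = str(value)

    return ET.tostring(root, encoding="unicode")
```

Two producers could therefore emit `category` and `type` for the same thing and both files would be valid GraphML, so any agreed property list would have to live outside the format.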

cmungall · 2018-04-23T02:08:04Z

Translator TSV

Spec (loosely defined):

TSV or CSV
Nodes and Edges as separate files
Multivalued columns separated by |
Must be readily loadable as a pandas DataFrame in Python, i.e. the first line should be a header
Headers must come from Translator spec http://bit.ly/tr-kg-standard
E.g.
- node
- edge
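
A minimal sketch of reading such a file, assuming the loosely defined spec above. The column names (`id`, `name`, `category`), the `EX:` identifiers, and the helper `read_table` are hypothetical; real headers would come from the Translator spec. It uses the standard `csv` module to stay self-contained.

```python
import csv
import io

# Hypothetical nodes file: header line first, "|" separating multiple values.
NODES_TSV = (
    "id\tname\tcategory\n"
    "EX:1\texample gene\tgene\n"
    "EX:2\texample disease\tdisease|named_thing\n"
)


def read_table(handle, multivalued=()):
    """Read a tab-separated table into a list of dicts.

    Columns named in `multivalued` are split on "|" into lists;
    an empty cell in such a column becomes an empty list.
    """
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        for column in multivalued:
            cell = row.get(column)
            if cell is not None:
                row[column] = cell.split("|") if cell else []
        rows.append(row)
    return rows


nodes = read_table(io.StringIO(NODES_TSV), multivalued=("category",))
```

Because the first line is a header, the same file should also load directly into a DataFrame with `pandas.read_csv(path, sep="\t")`, with the multivalued columns split afterwards.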

cmungall · 2018-04-23T02:09:56Z

Translator JSON

Follow KB standard

Structurally almost identical to TSV above, the doc would have a section for nodes and a section for edges

Cons:

larger I/O or disk footprint than TSV for little gain
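
A rough sketch of that trade-off, assuming a document with a `nodes` section and an `edges` section. The key names, field names, and helper functions here are hypothetical and are not taken from the KB standard.

```python
import json


def to_kg_json(nodes, edges):
    """Serialize the graph as one JSON document with two sections."""
    return json.dumps({"nodes": nodes, "edges": edges})


def to_tsv(rows, columns):
    """Serialize rows as TSV with a header line; lists are joined with "|"."""
    lines = ["\t".join(columns)]
    for row in rows:
        cells = []
        for column in columns:
            value = row.get(column, "")
            cells.append("|".join(value) if isinstance(value, list) else str(value))
        lines.append("\t".join(cells))
    return "\n".join(lines) + "\n"


# Hypothetical records, for illustration only.
nodes = [
    {"id": "EX:1", "name": "example gene", "category": ["gene"]},
    {"id": "EX:2", "name": "example disease", "category": ["disease"]},
]
edges = [{"subject": "EX:1", "predicate": "related_to", "object": "EX:2"}]

json_size = len(to_kg_json(nodes, edges))
tsv_size = len(to_tsv(nodes, ["id", "name", "category"])) + len(
    to_tsv(edges, ["subject", "predicate", "object"])
)
```

JSON repeats every key in every record, whereas TSV states the column names once in the header, which is where the larger footprint comes from.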

RichardBruskiewich · 2018-04-23T02:52:34Z

I've expressed this reservation before (to you, Chris), but I'm wondering whether simply defining the nodes and edges alone suffices for knowledge graph representation. In effect, we need to annotate whole statements, not just the subject node of a statement, and simply annotating the predicate doesn't help either. It seems necessary to treat each statement as a reified node, then hang everything off of it: subject, predicate, object, evidence, provenance, etc.
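
As a sketch of what "statement as a reified node" could look like, the function below expands one annotated statement into plain triples using the standard RDF reification vocabulary (`rdf:Statement`, `rdf:subject`, `rdf:predicate`, `rdf:object`). The function name `reify`, the tuple representation, and the `EX:` identifiers and annotation properties are assumptions for illustration, not a proposal for the actual format.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def reify(statement_id, subject, predicate, obj, annotations):
    """Expand one annotated statement into (subject, predicate, object) triples.

    The statement itself becomes a node (`statement_id`); evidence,
    provenance, and any other annotations hang off that node.
    `annotations` maps a property to a list of values.
    """
    triples = [
        (statement_id, RDF + "type", RDF + "Statement"),
        (statement_id, RDF + "subject", subject),
        (statement_id, RDF + "predicate", predicate),
        (statement_id, RDF + "object", obj),
    ]
    for prop, values in annotations.items():
        for value in values:
            triples.append((statement_id, prop, value))
    return triples


# Hypothetical statement with two kinds of annotation.
triples = reify(
    "EX:statement1",
    "EX:gene1",
    "EX:associated_with",
    "EX:disease1",
    {
        "EX:evidence": ["EX:evidence_code1"],
        "EX:provenance": ["EX:source_db", "EX:publication1"],
    },
)
```

Note the cost: a single edge with three annotations becomes seven triples, which is part of why reification is awkward to handle.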

yy20716 · 2018-04-23T07:57:15Z

Chris, I wonder if we could also briefly mention GraphQL and TinkerPop. The data models used in these languages/systems are also based on, or similar to, the property graph, so their pros and cons are very similar to those of GraphML. I have listed additional pros and cons below.

One advantage of models based on property graphs and their variations is that they may not suffer from the reification problem in RDF. The data can be presented and formatted more compactly than in RDF.
However, property graphs may not be a good choice when we need to integrate or link multiple datasets, which can happen in this project. For example, most property graph models do not use IRIs or typed literals to describe their entities, so it can be challenging to handle entities that are labeled with the same values in different datasets.

cmungall · 2018-04-24T01:21:27Z

@yy20716 we will need to be careful about how we map CURIEs to IRIs. If the exchange format is not an RDF serialization (which already has precisely defined mechanisms) we will need to embed a prefix mapping in the exchange file or have a standard one.
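
A minimal sketch of the prefix-mapping idea, assuming the mapping is available as a plain dict (whether embedded in the exchange file or taken from a standard one). The function names are hypothetical, and the two OBO namespaces are only illustrative entries.

```python
# Illustrative prefix map; a real one would be embedded in the exchange
# file or come from an agreed standard mapping.
PREFIX_MAP = {
    "HP": "http://purl.obolibrary.org/obo/HP_",
    "MONDO": "http://purl.obolibrary.org/obo/MONDO_",
}


def expand_curie(curie, prefix_map):
    """Expand a CURIE such as "HP:0000118" to a full IRI."""
    prefix, separator, local_id = curie.partition(":")
    if not separator:
        raise ValueError(f"not a CURIE (no ':' found): {curie!r}")
    if prefix not in prefix_map:
        raise KeyError(f"no IRI mapping for prefix {prefix!r}")
    return prefix_map[prefix] + local_id


def contract_iri(iri, prefix_map):
    """Contract an IRI to a CURIE, or return None if no namespace matches.

    When several namespaces match, the longest one wins.
    """
    matches = [
        (namespace, prefix)
        for prefix, namespace in prefix_map.items()
        if iri.startswith(namespace)
    ]
    if not matches:
        return None
    namespace, prefix = max(matches, key=lambda match: len(match[0]))
    return f"{prefix}:{iri[len(namespace):]}"
```

Failing loudly on an unknown prefix, rather than passing the CURIE through unchanged, is a deliberate choice in this sketch: silent pass-through is how two datasets end up with identifiers that look the same but resolve differently.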

cmungall added a commit to biolink/kgx that referenced this issue Apr 24, 2018

init, first pass for NCATS-Tangerine/translator-knowledge-graph#6

9b61670

jmcmurry added the identifiers label Apr 25, 2018

cmungall mentioned this issue Apr 25, 2018

Implement tool for distributing a KG as BdBags biolink/kgx#9

Closed

nlharris added the obsolete-translator label Mar 22, 2021
