Define anchor usage in yaml-ld #13

Open
ioggstream opened this issue May 30, 2022 · 10 comments
Labels
spec Issue on specification UCR Issue on Use Case/Recommendation

@ioggstream
Contributor

ioggstream commented May 30, 2022

As a JSON-LD editor … WHO
I want to use YAML anchors … WHAT
So that I can easily reuse content … WHY

Note

The specification should define:

  • when it is legitimate to use anchors
  • what the expectations on anchor usage are (e.g. do anchors represent a specific JSON-LD node, or can they just be used to reuse content?)
  • whether there are any constraints on anchor usage (e.g. the representation graph MAY / MUST NOT be a cyclic graph...)
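
The cyclic-graph question has a practical side for any JSON-based pipeline. The following is a minimal sketch, not a statement about any particular YAML-LD processor: it assumes that loading a cyclic document such as `root: &a {child: *a}` yields a mapping that contains itself, and it builds that mapping by hand (the name `node` is hypothetical) to show that Python's standard `json` module refuses to serialize it.

```python
import json

# Hypothetical in-memory result of loading a cyclic YAML document:
# the mapping anchored as `a` contains an alias to itself.
node = {}
node["child"] = node

try:
    json.dumps(node)
    serializable = True
except ValueError:
    # json.dumps detects the circular reference and raises ValueError.
    serializable = False
```

If cycles were allowed in the representation graph, a processor would need some other way to break them before handing the data to a JSON-LD algorithm.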

example 1

---
- "@id": &homer http://example.org/#homer  # Anchor the homer url
  http://example.com/vocab#name:
  - "@value": Homer
- "@id": http://example.org/#bart
  http://example.com/vocab#name:
  - "@value": Bart
  http://example.com/vocab#parent:
  - "@id": *homer                               # reuse the anchor instead of re-typing the homer url
- "@id": http://example.org/#lisa
  http://example.com/vocab#name:
  - "@value": Lisa
  http://example.com/vocab#parent:
  - "@id": *homer

example 2

Using anchor and alias nodes https://gist.github.com/ioggstream/31f3226fa9976b3baf0800f44bc19c98

@ioggstream ioggstream added the UCR Issue on Use Case/Recommendation label May 30, 2022
@VladimirAlexiev
Contributor

  • Example 2 is from the d3fend.mitre.org cybersecurity ontology.
  • YAML spec: https://yaml.org/spec/1.2.2/#anchors-and-aliases
  • Anchors and Aliases can represent non-tree graph structures, whereas JSON is a tree
  • The above are somewhat atypical examples of reusing small fragments of YAML. The typical case is reusing a whole RDF node, which in JSON-LD happens by @id. Nevertheless, using YAML anchors and aliases ensures referential integrity within the document (the @id cannot be mistyped).
  • We should describe how Anchors and Aliases could mesh with JSON-LD Frames

@pchampin
Contributor

One point where I believe YAML anchors can help is the description of complex contexts. E.g.

{
  "@context": {
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "@vocab": "http://example.com/ns/Company/",
    "founder": { "@context": {
        "@vocab": "http://example.com/ns/Person/",
        "birthDate": { "@type": "xsd:date" }
    }},
    "employee": { "@context": {
        "@vocab": "http://example.com/ns/Person/",
        "birthDate": { "@type": "xsd:date" }
    }}
  }
}

Notice that the scoped contexts of founder and employee are exactly the same (a "person" context). With YAML anchors, this redundancy could be eliminated.

NB: there are other means to get rid of this redundancy in pure JSON-LD:

  • hosting the "person" context at a different URL and using that URL instead
  • defining a type-scoped context for a type Person, and expecting values of founder and employee to be explicitly typed

but these have drawbacks that are not always acceptable.

@ioggstream
Contributor Author

That's exactly the kind of discussions and examples we need :)

"@context":
  xsd: http://www.w3.org/2001/XMLSchema#
  "@vocab": http://example.com/ns/Company/
  founder:
    "@context": &person-context
      "@vocab": http://example.com/ns/Person/
      birthDate:
        "@type": xsd:date
  employee:
    "@context": *person-context
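
To check that the anchored version really says the same thing as the JSON above, here is a small sketch. It assumes PyYAML's `safe_load` as the YAML processor; the variable names (`yaml_ld`, `expected`) are illustrative only.

```python
import yaml

yaml_ld = """
"@context":
  xsd: http://www.w3.org/2001/XMLSchema#
  "@vocab": http://example.com/ns/Company/
  founder:
    "@context": &person-context
      "@vocab": http://example.com/ns/Person/
      birthDate:
        "@type": xsd:date
  employee:
    "@context": *person-context
"""

# The fully expanded context from the JSON example, written out by hand.
person = {
    "@vocab": "http://example.com/ns/Person/",
    "birthDate": {"@type": "xsd:date"},
}
expected = {
    "@context": {
        "xsd": "http://www.w3.org/2001/XMLSchema#",
        "@vocab": "http://example.com/ns/Company/",
        "founder": {"@context": person},
        "employee": {"@context": dict(person)},
    }
}

doc = yaml.safe_load(yaml_ld)
ctx = doc["@context"]

# Same content as the JSON version once the alias is resolved.
same_content = doc == expected
# With this loader, the alias yields the very same Python object as the anchor.
same_object = ctx["employee"]["@context"] is ctx["founder"]["@context"]
```

Under these assumptions both checks hold: the document compares equal to the expanded JSON, and the two scoped contexts are one shared object in memory.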

@VladimirAlexiev
Contributor

how Anchors and Aliases could mesh with JSON-LD Frames

Frames specify which nodes to expand, and which nodes to merely refer to by URI.
So in some sense they tackle the "graph vs tree" problem.

Anchors and Aliases tackle the same problem; intuitively I feel in a more general way.

So: what can be the connection between them?

@anatoly-scherbakov
Contributor

I am not entirely clear on how anchors would actually affect the LD part of the picture. Having a YAML document with anchors, we're going to convert it to JSON — and in that conversion, the anchors will be resolved. Thus, a JSON-LD processor that we will subsequently use won't know anything about those anchors.

This is similar to C preprocessor directives which are resolved before the source file is consumed by the compiler itself.

Is that right? If yes, can't we safely ignore these particular YAML features relying upon YAML spec to describe them?
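
A rough illustration of that "preprocessor" view, using only Python's standard `json` module. The dictionary below stands in for a hypothetical loaded YAML document in which one node is shared through an anchor and an alias; nothing here is specific to a real YAML-LD implementation.

```python
import json

# Hypothetical result of loading a YAML document where `employee`
# aliases the anchor placed on `founder`'s scoped context.
person_context = {"@vocab": "http://example.com/ns/Person/"}
doc = {
    "founder": {"@context": person_context},
    "employee": {"@context": person_context},
}

# Before serialization, both entries are the very same object.
shared_before = doc["founder"]["@context"] is doc["employee"]["@context"]

# JSON has no alias syntax, so the shared mapping is written out twice.
as_json = json.dumps(doc)
roundtrip = json.loads(as_json)

shared_after = roundtrip["founder"]["@context"] is roundtrip["employee"]["@context"]
equal_after = roundtrip["founder"]["@context"] == roundtrip["employee"]["@context"]
```

After the round trip the two contexts are still equal but no longer shared, so a JSON-LD processor downstream has no way to tell that an anchor was ever involved.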

@gkellogg
Member

gkellogg commented Jun 2, 2022

Of course, JSON-LD does encode a graph in JSON; what used to be called a node reference is of the form {"@id": "..."}. Framing has an @embed keyword that can control how this works with one or all instances of a node referenced either fully or as a reference.

The YAML anchor/alias mechanism is similar to the framing keys, and also similar in concept to the @included keyword.

For now, I think we need to be cautious about depending on any YAML features beyond JSON re-serialization until we understand the requirements for round-tripping. A YAML-LD extended profile could allow us to move beyond what can easily be represented in JSON-LD, and we need to tread carefully.

@VladimirAlexiev
Contributor

Anchors can be used to define fragment IDs inside YAML instance data, like attributes @id and href/@name do in HTML.

@ioggstream where was your proposal for such fragments? In addition to anchors, it used JSON Path to address any element in the JSON/YAML structure.

Eg if at https://example.com/TheSimpsons.yaml we have:

*Bart:
  name: Bart Simpsons
  gender: male

Then the alias would be resolved to https://example.com/TheSimpsons.yaml#Bart

The same in plain YAML-LD would look like this:

- "@id": Bart
  name: Bart Simpsons
  gender: male
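
One caveat on the two forms, assuming ordinary relative-reference resolution against the document URL as the base (which is what Python's `urllib.parse.urljoin` implements): the fragment `#Bart` and the bare relative reference `Bart` resolve to different IRIs. A quick sketch:

```python
from urllib.parse import urljoin

base = "https://example.com/TheSimpsons.yaml"

# A fragment reference stays inside the document.
fragment_iri = urljoin(base, "#Bart")

# A bare relative reference replaces the last path segment instead.
relative_iri = urljoin(base, "Bart")
```

Here `fragment_iri` is `https://example.com/TheSimpsons.yaml#Bart`, while `relative_iri` is `https://example.com/Bart`, so the plain YAML-LD form would probably need `"@id": "#Bart"` to name the same resource.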

--

@anatoly-scherbakov basically says that anchors/aliases must be resolved by the YAML processor and elided, i.e. anchors can only be used locally inside one file.
Furthermore, the shared info must be copied out during the resolution.
I like @pchampin's concrete example of using aliases to express a context more economically.
But being a graph person, I dislike expanding shared graph structures by copying them out.

--

If anchor-based data sharing is necessarily local (limited to one file), then perhaps we can use it at least for blank nodes and avoid copying? Eg

valve1:
  temperature: *temp100C
    value: 100
    unit: degC
valve2:
  temperature: &temp100C

Should result in this turtle

<valve1> :temperature _:temp100C.
<valve2> :temperature _:temp100C.
_:temp100C :value 100; :unit <degC>.

and NOT this one:

<valve1> :temperature [:value 100; :unit <degC>].
<valve2> :temperature [:value 100; :unit <degC>].
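
The difference between the two outputs can be sketched in a few lines of Python. This is only an illustration of the idea, not the JSON-LD to-RDF algorithm: the helper `to_triples` and its `share` flag are hypothetical, nested dicts stand in for blank nodes, and object identity stands in for "same anchor".

```python
from itertools import count

def to_triples(doc, share=True):
    """Emit (subject, predicate, object) triples from a dict of subjects.

    Nested dicts become blank nodes. With share=True, the same Python
    object always maps to the same blank node label; with share=False,
    every occurrence is copied out under a fresh label.
    Note: share=False would not terminate on cyclic input.
    """
    labels = {}          # id(obj) -> blank node label
    counter = count(1)
    triples = []

    def emit(subject, mapping):
        for predicate, value in mapping.items():
            if isinstance(value, dict):
                if share and id(value) in labels:
                    # Already described: just point at the existing node.
                    triples.append((subject, predicate, labels[id(value)]))
                    continue
                label = f"_:b{next(counter)}"
                if share:
                    labels[id(value)] = label
                triples.append((subject, predicate, label))
                emit(label, value)
            else:
                triples.append((subject, predicate, value))

    for subject, mapping in doc.items():
        emit(subject, mapping)
    return triples

# One temperature object referenced from both valves, as an alias would do.
temp = {"value": 100, "unit": "degC"}
doc = {"valve1": {"temperature": temp}, "valve2": {"temperature": temp}}

shared = to_triples(doc, share=True)
copied = to_triples(doc, share=False)
```

With `share=True` the sketch yields four triples around a single blank node; with `share=False` it yields six triples and two separate blank nodes, which is the copied-out form above.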

@ioggstream
Contributor Author

@VladimirAlexiev let me try to clarify your examples:

Syntax tweak. A key cannot start with *; an anchor is attached to a node.

Bart: &BartSimpsons  #  create an anchor to this node.
  name: Bart Simpsons
  gender: male

I don't think that this can implicitly map to an @id: Bart, because anchors are a serialization detail. The above document can legitimately be serialized as

Bart: &anchor001  #  create an anchor to this node.
  name: Bart Simpsons
  gender: male

Homer:
  children:
  - *anchor001  # An Alias references an anchor.
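
A small sketch of why anchor names cannot be relied upon, assuming PyYAML as the processor: when a shared object is dumped, the serializer invents its own anchor name, and the name used in the original source is not kept anywhere in the loaded data.

```python
import yaml

# One Python object referenced from two places, as an alias would produce.
bart = {"name": "Bart Simpsons", "gender": "male"}
doc = {"Bart": bart, "Homer": {"children": [bart]}}

# PyYAML emits an anchor/alias pair for the shared mapping and
# generates the anchor name itself.
text = yaml.safe_dump(doc)

# Loading the dumped text restores both the content and the sharing.
reloaded = yaml.safe_load(text)
still_shared = reloaded["Homer"]["children"][0] is reloaded["Bart"]
```

In this sketch the dumped text uses a generated name (`id001`) rather than anything like `BartSimpsons`, yet the reloaded structure is equal to the original and still shares the node.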

Representation graph

iiuc the yaml below

t100: &t100 100
valve1:
  temperature: &temp100C
    value: *t100
    unit: degC
valve2:
  temperature: *temp100C

maps to the following YAML representation graph

graph LR;
  root --> t100 & valve1 & valve2
  t100 --> 100
  valve1 --> temperature1[temperature] -->temp100C --> value & unit
  value --> t100
  unit --> degC
  valve2 --> temperature2[temperature] -->temp100C

The first question I asked myself is: how does PyYAML process this information?

PyYAML preserves references when parsing mutable structures into a dict():

import yaml

# temperature_yaml is the YAML document shown above, as a string
temperature = yaml.safe_load(temperature_yaml)
assert temperature['valve1']['temperature']['value'] == 100
assert temperature['valve2']['temperature']['value'] == 100
# assign a new temperature through valve1
temperature['valve1']['temperature']['value'] = 200
assert temperature['valve2']['temperature']['value'] == 200  # Changed: both valves share one dict.

but with an immutable value, things change:

assert temperature["t100"] == 100
assert temperature['valve2']['temperature']['value'] == 100
temperature["t100"] = 200
assert temperature['valve2']['temperature']['value'] == 100  # Not changed.

gkellogg added a commit that referenced this issue Jul 2, 2022
@ioggstream ioggstream added this to the -00 milestone Jul 5, 2022
@VladimirAlexiev
Contributor

VladimirAlexiev commented Jul 11, 2022

Sharing and Cycles (Frames)

Frames are quite key because they define which part of an RDF graph to unroll into a JSON tree, and how.

@gkellogg in #44

The JSON-LD Framing algorithm is quite complicated as it is.

Agreed, and I don't even know it properly. Of course, we'll use it whole-cloth without modification.

But I intuitively feel that anchors may have something to do with Frames because both address (to some degree) the problem "given a graph, how to serialize part of it as a tree".
Both allow sharing objects and handling cycles (to avoid infinite embedding), but:

  • JSON-LD can share RDF nodes and nothing else
  • YAML-LD anchors can share finer-grained structures: node URLs, single literals, pieces of objects (similar to @included)

Modularity/Structuring

@pchampin

anchors can help in the description of complex contexts

JSON Schema has special modularity/structuring facilities, see https://json-schema.org/understanding-json-schema/structuring.html

So the question of YAML fragments and pointers, and how they relate to Schema fragments and JSON Pointers, is key.
@ioggstream has been struggling with this problem: please take charge of this, keep up the fight, and we'll help as much as we can!

Syntax tweak

Thanks!

Representation graph

Yes, but the alias "nodes" t100, temp100C are quite different from the others because they carry no info and instead are just redirection pointers (so maybe use a different color).

@gkellogg
Member

gkellogg commented Aug 3, 2022

This issue was discussed at the Aug 03 meeting.


5 participants