452 document odis metadata graph fundamentals #453

Draft
wants to merge 10 commits into
base: master

pbuttigieg
Collaborator

Closes #452

Describing the ODIS graph at a conceptual level to promote homogeneous alignment.

@pbuttigieg pbuttigieg requested a review from fils July 18, 2024 10:06
@pbuttigieg pbuttigieg self-assigned this Jul 18, 2024
@pbuttigieg pbuttigieg linked an issue Jul 18, 2024 that may be closed by this pull request
@smrgeoinfo

@pbuttigieg looks like the metadata about metadata section is not complete. Based on our discussions and the various specs, as well as the GitHub issues (1, 2), here's the draft text I'm proposing for CDIF:


In a harvesting/federated catalog system, some metadata about the metadata is needed to keep track of where a metadata record came from, what format or profile it uses (harvesters need this to process it), and when it was updated (see Metadata Content Requirements). Expressing this information unambiguously requires making statements about the metadata record that are distinct from statements about the thing in the world that the record describes (see GitHub issues 1, 2). In an RDF framework, this requires a distinct identifier for the metadata record, which serves as the subject of these triples.

Schema.org includes several properties that can be used to embed information about the metadata record in the resource metadata: sdDatePublished, sdLicense, and sdPublisher. It lacks a way to provide an identifier for the metadata record distinct from the resource it describes, to specify agents responsible for the metadata other than the publisher, or to assert specification or profile conformance for the metadata record itself.

In the RDF serialization, Schema.org metadata records are JSON-LD node objects, and include an "@id" keyword with a value that identifies the node. This identifier can be interpreted to represent a thing in the world that the metadata record (the 'node') is about, or to represent the metadata record (a JSON object) itself. Here is a short example record (other '@' properties are explained below):

{   "@context": "https://schema.org",
    "@id": "ex:URIforResource",
    "name": "unique title for the resource",
    "description": "Description of the resource",
    "dateModified": "2017-05-23"
}

When this JSON-LD is converted to RDF triples (e.g. using the JSON-LD playground), the result is:

<ex:URIforResource> <http://schema.org/description> "Description of the resource" .
<ex:URIforResource> <http://schema.org/name> "unique title for the resource" .
<ex:URIforResource> <http://schema.org/dateModified> "2017-05-23"^^<http://schema.org/Date> .

The first two triples would be interpreted as statements about the thing in the world that the metadata record is about. The third triple is ambiguous: was the metadata content modified, or the described resource in the world? There does not seem to be any recognized best practice or consensus for dealing with this issue, so CDIF defines the following conventions.

Use the schema.org identifier property to identify the thing in the world that is the subject of the JSON-LD node. The identified thing might be physical, imaginary, abstract, or a digital object. The JSON-LD @id property identifies a node in a graph and can be interpreted in different ways; as a URI, it is expected to dereference to the same JSON-LD object in which it is defined. Given this convention, a processor handling the metadata record should use the schema:identifier value as the subject of triples about the described resource, which avoids the ambiguity. The convention also implies that when a schema:identifier property is present, the @id property should be interpreted as identifying the JSON object that represents the node in the knowledge graph.
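A minimal sketch of how a processor might apply this convention to a single node object, in Python. The function name `interpret_identifiers` is hypothetical, the sketch assumes `identifier` is a plain string (schema.org also allows lists and PropertyValue objects), and the fallback for records without a schema:identifier is an assumption, since the convention above does not cover that case.

```python
def interpret_identifiers(node: dict) -> dict:
    """Decide which identifier names the described resource and which names
    the metadata record, following the convention described above."""
    node_id = node.get("@id")
    identifier = node.get("identifier")
    if identifier is not None:
        # schema:identifier is present: it names the thing in the world,
        # and @id is read as naming the JSON object (the metadata record).
        return {"resource": identifier, "record": node_id}
    # Assumption: without schema:identifier, @id stays ambiguous, so treat it
    # as the resource and report that no distinct record identifier exists.
    return {"resource": node_id, "record": None}


node = {
    "@id": "ex:URIforNode1",
    "identifier": "ex:URIforDescribedResource",
    "name": "unique title for the resource",
}
print(interpret_identifiers(node))
```

Triples such as name and description would then take the `resource` value as their subject, while statements about the record itself would use the `record` value.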

Statements about the metadata record as a distinct entity should be made using a separate identified node object. This node object can be embedded in the metadata record about the resource in the world (Example 1 below), or published as a separate node (Example 2 below).

{   "@context": [
        "https://schema.org",
        {"dcterms": "http://purl.org/dc/terms/",
         "ex": "https://example.com/99152/"
        }
    ],
    "@id": "ex:URIforNode1",
    "@type": "appropriate schema.org type",
    "identifier": "ex:URIforDescribedResource",
    "name": "unique title for the resource",
    "description": "Description of the resource",
    "subjectOf": {
        "@id": "ex:URIforNode2",
        "@type": "DigitalDocument",
        "dateModified": "2017-05-23",
        "identifier": "ex:URIforNode1",
        "description": "metadata about documentation for ex:URIforDescribedResource",
        "dcterms:conformsTo": {"@id": "ex:cdif-metadataSpec"}
    }
}

Example 1. Metadata about the metadata, embedded in the resource record.

{
    "@context": [
        "https://schema.org",
        {"dcterms": "http://purl.org/dc/terms/",
         "ex": "https://example.com/99152/"}
    ],
    "@graph": [
        {
            "@id": "ex:URIforNode1",
            "@type": "Dataset",
            "identifier": "ex:URIforDescribedResource",
            "name": "unique title for the resource",
            "description": "Description of the resource"
        },
        {
            "@id": "ex:URIforNode2",
            "@type": "DigitalDocument",
            "dateModified": "2017-05-23",
            "identifier": "ex:URIforNode1",
            "description": "metadata about documentation for ex:URIforDescribedResource",
            "dcterms:conformsTo": {"@id": "ex:cdif-metadataSpec"}
        }
    ]
}

Example 2. Metadata about metadata as a separate graph node.
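As a rough illustration of how a harvester could handle both layouts the same way, the Python sketch below walks a parsed JSON-LD document and returns the nodes whose `identifier` points at the `@id` of another node that itself carries an `identifier`. The helper names `walk` and `metadata_about_metadata` are hypothetical, the sample documents are trimmed versions of the two examples, and the matching rule is an assumption drawn from the examples rather than a normative CDIF algorithm.

```python
def walk(value):
    """Yield every JSON object in a parsed document, skipping @context."""
    if isinstance(value, dict):
        yield value
        for key, child in value.items():
            if key != "@context":  # term definitions are not node objects
                yield from walk(child)
    elif isinstance(value, list):
        for item in value:
            yield from walk(item)


def metadata_about_metadata(doc):
    """Return nodes that describe a metadata record rather than a resource."""
    nodes = [n for n in walk(doc) if "@id" in n]
    # @ids of nodes that carry schema:identifier, i.e. metadata records.
    record_ids = {n["@id"] for n in nodes if "identifier" in n}
    return [
        n for n in nodes
        if isinstance(n.get("identifier"), str) and n["identifier"] in record_ids
    ]


record_node = {
    "@id": "ex:URIforNode2",
    "@type": "DigitalDocument",
    "dateModified": "2017-05-23",
    "identifier": "ex:URIforNode1",
}
resource_node = {
    "@id": "ex:URIforNode1",
    "@type": "Dataset",
    "identifier": "ex:URIforDescribedResource",
}

embedded = {**resource_node, "subjectOf": record_node}   # layout of Example 1
separate = {"@graph": [resource_node, record_node]}      # layout of Example 2

for doc in (embedded, separate):
    print([n["@id"] for n in metadata_about_metadata(doc)])
```

Both layouts yield the same node, so the choice between embedding and a separate graph node should not matter to a consumer that matches on identifiers this way.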

Including a schema:description with the string "metadata about documentation for ex:URIforDescribedResource" allows consumers to distinguish this usage of the subjectOf property from other usages. The ex namespace in the examples above is included only so that the examples are valid; actual metadata would likely have its own namespace for resource and metadata URIs. The distinct identifier for the metadata record (ex:URIforNode1) allows statements to be made about the metadata separately from statements about the resource it describes.

@pbuttigieg
Collaborator Author

xref #102

@pbuttigieg
Collaborator Author

Thanks @smrgeoinfo - I'll think about the content and iterate on the CDIF repo
