initial NCEI dataset JSON-LD template #365

Closed
wants to merge 50 commits into from


jmckenna
Contributor

@pbuttigieg linked an issue Jan 10, 2024 that may be closed by this pull request
@pbuttigieg
Collaborator

We passed an example dataset landing page with embedded JSON-LD/schema.org through the schema.org validator.

Here's the full address: https://www.ncei.noaa.gov/access/metadata/landing-page/bin/iso?id=gov.noaa.nodc:276264

We spotted a few things to fix or improve:

  • keywords: this should be a JSON array rather than a single comma-separated string. The current form will be read as one keyword, which reduces interoperability.
    • the fix seems to be to correct the script that generates the JSON-LD from the local metadata so that it emits an array
  • The identifiers given for Person and Organization entries are NOAA-generated URLs. These are likely not the IDs that the people and institutions would want to be identified by: ORCIDs, OceanExpert IDs, RORs, or other persistent identifiers are preferable. Recommendation: NOAA should ask those who register to upload datasets to NCEI to enter the authoritative URLs/PIDs for themselves and their institutions, and these should be included in the metadata exposed to the web.
  • GCMD keyword URLs are not resolving - something to check with NASA GCMD. Recommendation: for spatialCoverage values that are keywords or terms from a vocabulary, consider using DefinedTerm.
  • NOAA is listed as a subOrganization of the publishing Organization (NCEI), which looks inverted: NOAA is the parent body, so parentOrganization would be the more sensible property.
  • General recommendation: the rich metadata for organizations and publishers in each record is very good, but it may be handled more efficiently by creating a standalone Organization JSON-LD document with its own PID and linking to it. It would be wise to do this only when there is a commitment to maintain these documents in perpetuity. For the moment, it's probably best to keep the repeated metadata in each Dataset record.
  • The distribution property seems to be misapplied: it should hold URLs that download the dataset itself, along with metadata on file type, size, etc. Right now the links lead to a landing page, which is largely redundant with the url property. Direct download links do exist and look like this: https://www.ncei.noaa.gov/archive/archive-management-system/OAS/bin/prd/jquery/download/276264.1.1.tar.gz
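
To make a few of the points above concrete, here is a rough Python sketch of how a generating script might emit keywords as an array, nest NOAA as the parentOrganization of NCEI, and describe a direct download in distribution. This is not the NCEI generator: the function names `split_keywords` and `build_dataset_fragment` are hypothetical, the `encodingFormat` value is an assumption based on the `.tar.gz` extension, and the naive comma split assumes no individual keyword contains a comma.

```python
import json


def split_keywords(raw: str) -> list[str]:
    """Split a comma-separated keyword string into a list of trimmed keywords.

    Assumption: no single keyword contains a comma.
    """
    return [kw.strip() for kw in raw.split(",") if kw.strip()]


def build_dataset_fragment(raw_keywords: str, download_url: str) -> dict:
    """Build an illustrative (incomplete) schema.org Dataset fragment."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        # A JSON array, so each keyword is understood separately.
        "keywords": split_keywords(raw_keywords),
        "publisher": {
            "@type": "Organization",
            "name": "NCEI",
            # NOAA is the parent body, not a sub-organization of NCEI.
            "parentOrganization": {"@type": "Organization", "name": "NOAA"},
        },
        # distribution points at the file itself, not the landing page.
        "distribution": [
            {
                "@type": "DataDownload",
                "contentUrl": download_url,
                # Assumed media type for a .tar.gz archive.
                "encodingFormat": "application/gzip",
            }
        ],
    }


if __name__ == "__main__":
    fragment = build_dataset_fragment(
        "oceans, salinity, temperature",  # placeholder keywords
        "https://www.ncei.noaa.gov/archive/archive-management-system/OAS/bin/prd/jquery/download/276264.1.1.tar.gz",
    )
    print(json.dumps(fragment, indent=2))
```

The Person and Organization identifiers (ORCID, ROR, etc.) are left out of the sketch because they would have to come from the submitters rather than be generated.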

@pbuttigieg
Collaborator

Xref #361 (comment)

@jmckenna
Contributor Author

Closing, moved this to the new repo for templates instead: iodepo/odis-in#18

@jmckenna closed this Apr 23, 2024
Development

Successfully merging this pull request may close these issues.

connect NCEI catalogue as an ODIS node