Proposed v2 spec #440

Draft
wants to merge 9 commits into
base: develop

@eldeal (Contributor) commented Oct 31, 2023

PSA: The existence of this draft PR does not guarantee imminent work on this, and it has not yet been decided whether the work would take place in this repo, on a fork, or via some other approach. This is simply the easiest place to house the spec for comparison while we plan potential future work.

What

To bring API and Data standards together, we need to work through a practical example of what a new 'data catalogue' or Dataset API might look like. We know some of these changes will be significant, so we're treating this as a v2, meaning we expect breaking changes. This also gives us an opportunity to review things we want to change about the API as it currently stands.

The Data standards being introduced are documented in this Application Profile, where we're using Linked Data ontologies like dcat, prov and csvw to describe datasets and, in particular, the CSVs which hold data. We've broadly kept the same structure as our current API, in terms of using dataset, edition and version as endpoint names, and will continue to use @context fields to indicate a JSON-LD context which includes mappings between simple API field names and their Linked Data vocabulary terms. This strikes a balance between API responses that make sense to non-Linked-Data users and the richness of data and metadata for those who know how to leverage it.

We are however changing the fields each endpoint contains, and several of our core terms to describe them, to make room for Linked Data terms to also enter the picture. ID fields are all moving to Identifier, so as not to clash with the new @id field which will always be the fully qualified URL of the response object.
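
As a rough illustration of the renaming described above, the sketch below builds a dataset document in which the old ID field becomes `identifier` and `@id` carries the fully qualified URL of the response object. The base URL, the context URL, the `@type` value, the `title` field and the `build_dataset_response` helper are all assumptions made for illustration; the real values are whatever the spec and Application Profile define.

```python
# Hypothetical base URL and JSON-LD context location (not taken from the spec).
BASE_URL = "https://api.example.org/v2"
CONTEXT_URL = "https://api.example.org/v2/context.json"


def build_dataset_response(identifier: str, title: str) -> dict:
    """Return a v2-style dataset document with an illustrative field set."""
    return {
        "@context": CONTEXT_URL,  # maps simple field names to Linked Data terms
        "@id": f"{BASE_URL}/datasets/{identifier}",  # fully qualified URL of this object
        "@type": "dcat:Dataset",  # assumed type, check against the Application Profile
        "identifier": identifier,  # replaces the old "id" field
        "title": title,
    }


doc = build_dataset_response("cpih", "Consumer price inflation")
print(doc["@id"])
```

Keeping `identifier` and `@id` separate means a non-Linked-Data client can keep using the short identifier, while a JSON-LD aware client can dereference `@id` directly.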

We made an architectural choice that editions and versions should be the same fundamental object, so the /editions/<edition> response effectively returns the latest Version document always, with some additional navigational information for users who want to find other versions. In Dataset API nomenclature, this means the Instance will form the bulk of the response not just for Version requests, but also Editions - though some copying/caching may be implemented such that it is not the same literal database record.
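
One way to picture the edition/version relationship described above is the sketch below, where an edition request resolves to the highest-numbered version of that edition and adds a link for navigating to the others. The `Version` class, the field names and the URL layout are hypothetical and only stand in for whatever the spec defines.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Version:
    """Minimal stand-in for a version document (illustrative fields only)."""
    edition: str
    version: int
    release_date: str


def latest_version(versions: List[Version], edition: str) -> Version:
    """Pick the highest-numbered version belonging to the given edition."""
    candidates = [v for v in versions if v.edition == edition]
    if not candidates:
        raise LookupError(f"no versions found for edition {edition!r}")
    return max(candidates, key=lambda v: v.version)


def edition_response(versions: List[Version], dataset: str, edition: str) -> dict:
    """Build an edition response from the latest version plus navigation links."""
    latest = latest_version(versions, edition)
    return {
        "edition": edition,
        "version": latest.version,
        "release_date": latest.release_date,
        "_links": {
            # Assumed URL layout for reaching the other versions of this edition.
            "versions": {"href": f"/datasets/{dataset}/editions/{edition}/versions"},
        },
    }
```

In this sketch the edition response is computed on request; the copying or caching mentioned above would replace that lookup with a stored record.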

We are also introducing new API standards by conforming more closely to the HAL (Hypertext Application Language) specification for API responses. This means links are moving to _links and a new _embedded field has been introduced to give a sense of related resources. The only fields we intend to include in embedded documents are those that allow a user to disambiguate between objects in a list (i.e. the release date and version number on a series of editions or versions).
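
A minimal sketch of the HAL shape described above, assuming hypothetical field names (`href`, `version`, `release_date`) on the source records: `_links` holds link objects keyed by relation, and `_embedded` holds trimmed-down related resources that carry only the disambiguating fields.

```python
from typing import List


def embed_summary(version: dict) -> dict:
    """Reduce a full version record to the fields needed to tell versions apart."""
    return {
        "_links": {"self": {"href": version["href"]}},
        "version": version["version"],
        "release_date": version["release_date"],
    }


def hal_edition(edition_href: str, versions: List[dict]) -> dict:
    """Wrap an edition in a HAL-style envelope with embedded version summaries."""
    return {
        "_links": {"self": {"href": edition_href}},
        "_embedded": {"versions": [embed_summary(v) for v in versions]},
    }
```

Anything dropped by `embed_summary` stays available by following the embedded resource's `self` link, which keeps list responses small.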

How to review

I recommend taking the full raw text of the file and pasting it into a Swagger editor like https://editor-next.swagger.io/
Once the review has been conducted and we're happy that the v2 spec is roughly as we want it, I'll probably convert it to OpenAPI 3.0. That conversion introduced some new fields I wasn't expecting, so I didn't want to further muddy this review with that conversation.

The diff cannot be trusted: it claims endpoints have been removed that are still present when you view the whole file.

Types of questions to consider:

  • As an API user, does each endpoint return the information you'd need to ascertain if this is the right resource for you? If you were looking for a specific dataset, would you be able to find it from the /datasets list response for example?
  • Are pagination fields appropriately available on list endpoints conforming with the API standards?
  • Are all the right endpoints present that you would expect?
  • Have any fields been removed that you know other applications or processes rely on? (NB: not renamed, but fully removed)
  • Are the correct LD @types applied to each response? Are @id and @context fields present everywhere they should be?
  • Are the fields required by the Application Profile present and in the correct places/on the right endpoints? In particular, are lists returning the right subsets of fields?
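
Some of the checks above could be partly automated against example responses. The sketch below reports which Linked Data and pagination fields are missing from a list response. The pagination field names (`count`, `offset`, `limit`, `total_count`) are an assumption about what the API standards require, and `missing_fields` is a hypothetical helper rather than part of the spec.

```python
from __future__ import annotations

LD_FIELDS = ("@context", "@id", "@type")
# Assumed pagination field names; adjust to match the API standards.
PAGINATION_FIELDS = ("count", "offset", "limit", "total_count")


def missing_fields(response: dict) -> list[str]:
    """Return the expected top-level fields that are absent from a list response."""
    required = LD_FIELDS + PAGINATION_FIELDS
    return [name for name in required if name not in response]


example = {
    "@context": "https://api.example.org/v2/context.json",  # hypothetical URL
    "@id": "https://api.example.org/v2/datasets",
    "count": 20,
    "offset": 0,
    "limit": 20,
}
print(missing_fields(example))
```

This only checks presence at the top level; whether each field sits on the right endpoint and carries the right `@type` still needs a manual read against the Application Profile.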

Who can review

@janderson2 @rossbowen
