PSA: The existence of this draft PR does not guarantee imminent work on this, and it has not yet been decided whether that work would take place in this repo or on a fork/other approach. This is simply the easiest place to house the spec for comparison while we plan potential future work.
## What
In order to bring API and data standards together, we need to work through a practical example of what a new 'data catalogue' or Dataset API might look like. We know some of these changes will be significant, so we're treating this as a v2, meaning we expect breaking changes. This also gives us an opportunity to review things we want to change about the API as it currently stands.
The data standards being introduced are documented in this Application Profile, where we're using Linked Data ontologies like `dcat`, `prov` and `csvw` to describe datasets and, in particular, the CSVs which hold the data.

We've broadly kept the same structure as our current API, in terms of using `dataset`, `edition` and `version` as endpoint names, and will continue to use `@context` fields to indicate a JSON-LD context which includes mappings between simple API field names and their Linked Data vocabulary terms. This keeps a balance between API responses that make sense to non-Linked-Data users, while ensuring the richness of data and metadata for those who know how to leverage it.

We are, however, changing the fields each endpoint contains, and several of our core terms for describing them, to make room for Linked Data terms to also enter the picture. `ID` fields are all moving to `Identifier`, so as not to clash with the new `@id` field, which will always be the fully qualified URL of the response object.
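To make that concrete, here is a rough sketch of what a dataset response might look like under this approach. Everything here is illustrative rather than final: the context URL, the base API URL, the `@type` value and the exact field names are placeholders, not what the spec defines.

```json
{
  "@context": "https://example.org/ns/dataset-context.json",
  "@id": "https://api.example.org/v2/datasets/cpih",
  "@type": "dcat:Dataset",
  "identifier": "cpih",
  "title": "Consumer Prices Index including owner occupiers' housing costs",
  "description": "An example dataset entry; all values here are placeholders."
}
```

The point being illustrated: the identifier field carries the short ID we use in paths, `@id` is the fully qualified URL of the object, and `@context` tells Linked Data consumers how to map the plain field names onto vocabulary terms.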
We made an architectural choice that editions and versions should be the same fundamental object, so the `/editions/<edition>` response always effectively returns the latest Version document, with some additional navigational information for users who want to find other versions. In Dataset API nomenclature, this means the Instance will form the bulk of the response not just for Version requests but also for Editions, though some copying/caching may be implemented such that it is not the same literal database record.
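As a sketch of that choice (again with purely illustrative URLs and field names), requesting an edition would return what is essentially the latest version document, plus a pointer towards the other versions:

```json
{
  "@id": "https://api.example.org/v2/datasets/cpih/editions/time-series",
  "@type": "dcat:Dataset",
  "identifier": "time-series",
  "version": 3,
  "release_date": "2024-03-20T00:00:00Z",
  "versions": "https://api.example.org/v2/datasets/cpih/editions/time-series/versions"
}
```

How that navigational information is actually expressed ties into the HAL changes described next.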
We are also introducing new API standards by conforming more closely with the HAL (Hypertext Application Language) specification for API responses. This means `links` are moving to `_links`, and a new `_embedded` field has been introduced to give a sense of related resources. The only fields we intend to include in embedded documents are those that allow a user to disambiguate between objects in a list (i.e. the release date and version number on a series of editions or versions).
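For example, a versions list under these HAL conventions might look something like the following, with the `_embedded` entries carrying only the fields needed to tell the versions apart. As above, the URLs and exact field names are illustrative, not final.

```json
{
  "@id": "https://api.example.org/v2/datasets/cpih/editions/time-series/versions",
  "_links": {
    "self": { "href": "https://api.example.org/v2/datasets/cpih/editions/time-series/versions" },
    "edition": { "href": "https://api.example.org/v2/datasets/cpih/editions/time-series" }
  },
  "_embedded": {
    "versions": [
      {
        "@id": "https://api.example.org/v2/datasets/cpih/editions/time-series/versions/3",
        "version": 3,
        "release_date": "2024-03-20T00:00:00Z"
      },
      {
        "@id": "https://api.example.org/v2/datasets/cpih/editions/time-series/versions/2",
        "version": 2,
        "release_date": "2023-03-22T00:00:00Z"
      }
    ]
  }
}
```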
## How to review

I recommend taking the full raw text of the file and pasting it into a Swagger editor like https://editor-next.swagger.io/
Once the review has been conducted and we're happy that the v2 spec is roughly as we want it, I'll probably convert it to OpenAPI 3.0. That conversion introduced some new fields I wasn't expecting, so I didn't want to further muddy this review with that conversation.
The diff cannot be trusted: it claims endpoints have been removed that are still present when you view the whole file.
Types of questions to consider:

- What should be embedded in the `/datasets` list response, for example?
- Are the right `@types` applied to each response?
- Are `@id` and `@context` fields present everywhere they should be?

## Who can review
@janderson2 @rossbowen