Proposed v2 spec #440

eldeal · 2023-10-31T15:17:47Z

PSA: The existence of this draft PR does not guarantee that work on it is imminent, and we have not yet decided whether the work would take place in this repo, on a fork, or via some other approach. This is simply the easiest place to house the spec for comparison while we plan potential future work.

What

To bring the API and Data standards together, we need to work through a practical example of what a new 'data catalogue' or Dataset API might look like. We know some of these changes will be significant, so we're treating this as a v2, meaning we expect breaking changes. This also gives us an opportunity to review things we want to change about the API as it currently stands.

The Data standards being introduced are documented in this Application Profile, where we're using Linked Data ontologies like dcat, prov and csvw to describe datasets and, in particular, the CSVs which hold data. We've broadly kept the same structure as our current API, in terms of using dataset, edition and version as endpoint names, and will continue to use @context fields to indicate a JSON-LD context which includes mappings between simple API field names and their Linked Data vocabulary terms. This keeps a balance between API responses that make sense to non-Linked-Data users and the richness of data and metadata for those who know how to leverage it.

We are, however, changing the fields each endpoint contains, and several of the core terms used to describe them, to make room for Linked Data terms. ID fields are all moving to Identifier so as not to clash with the new @id field, which will always be the fully qualified URL of the response object.
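As a rough illustration of the renaming, here is a minimal sketch. It assumes a v1 document carrying a plain `id` field and assumes the new field is spelled `identifier` in JSON; the function name, base URL and sample values are hypothetical and not taken from the spec.

```python
def to_v2_identifiers(v1_doc: dict, resource_url: str) -> dict:
    """Return a copy of v1_doc with `id` renamed and `@id` added.

    Assumption: the v2 field is called `identifier`; the spec is the
    source of truth for the real name and casing.
    """
    # Copy every field except the old `id`.
    v2_doc = {key: value for key, value in v1_doc.items() if key != "id"}
    if "id" in v1_doc:
        v2_doc["identifier"] = v1_doc["id"]
    # `@id` is the fully qualified URL of the response object.
    v2_doc["@id"] = resource_url
    return v2_doc


# Hypothetical example values.
example = to_v2_identifiers(
    {"id": "cpih01", "title": "Consumer Prices Index"},
    "https://api.example.com/v2/datasets/cpih01",
)
```

Keeping the short identifier alongside `@id` means clients that only need a slug do not have to parse the URL.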

We made an architectural choice that editions and versions should be the same fundamental object, so the /editions/<edition> response effectively always returns the latest Version document, with some additional navigational information for users who want to find other versions. In Dataset API nomenclature, this means the Instance will form the bulk of the response not just for Version requests but also for Edition requests, though some copying/caching may be implemented such that it is not literally the same database record.
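One way to picture this choice is the sketch below. It assumes each version document has an integer `version` field and that the navigational information lives under `_links`; the link names, URL layout and function name are hypothetical.

```python
def edition_response(edition_url: str, versions: list[dict]) -> dict:
    """Build an edition response from the latest version document.

    Assumption: `version` is an integer and the highest number is the
    latest version. Link names here are illustrative only.
    """
    if not versions:
        raise ValueError("an edition needs at least one version")
    latest = max(versions, key=lambda v: v["version"])
    response = dict(latest)  # shallow copy so the stored record is untouched
    response["_links"] = {
        "self": {"href": edition_url},
        "latest_version": {"href": f"{edition_url}/versions/{latest['version']}"},
        "versions": {"href": f"{edition_url}/versions"},
    }
    return response


# Hypothetical example values.
sample_versions = [
    {"version": 1, "release_date": "2023-01-18"},
    {"version": 2, "release_date": "2023-02-15"},
]
edition = edition_response(
    "https://api.example.com/v2/datasets/cpih01/editions/time-series",
    sample_versions,
)
```

The copy step mirrors the note above that the edition response need not be the same literal database record as the version.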

We are also introducing new API standards by conforming more closely with the HAL (Hypertext Application Language) specification for API responses. This means links are moving to _links, and a new _embedded field has been introduced to give a sense of related resources. The only fields we intend to include in embedded documents are those that allow a user to disambiguate between objects in a list (such as the release date and version number on a series of editions or versions).
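A minimal sketch of that embedding rule follows. It assumes `release_date` and `version` are the disambiguating fields and that each version document carries its URL in `@id`; the `distributions` field, the relation name `versions` and all sample values are hypothetical.

```python
# Assumption: these are the only fields needed to tell versions apart.
DISAMBIGUATING_FIELDS = ("release_date", "version")


def embed_versions(edition_doc: dict, versions: list[dict]) -> dict:
    """Return edition_doc with a HAL-style `_embedded.versions` list."""
    embedded = []
    for version in versions:
        # Keep only the fields a user needs to choose between versions.
        summary = {k: version[k] for k in DISAMBIGUATING_FIELDS if k in version}
        # Each embedded resource links back to its full representation.
        summary["_links"] = {"self": {"href": version["@id"]}}
        embedded.append(summary)
    result = dict(edition_doc)
    result["_embedded"] = {"versions": embedded}
    return result


# Hypothetical example values.
base = "https://api.example.com/v2/datasets/cpih01/editions/time-series"
hal_doc = embed_versions(
    {"@id": base, "edition": "time-series"},
    [
        {"@id": f"{base}/versions/1", "version": 1,
         "release_date": "2023-01-18", "distributions": ["v1.csv"]},
        {"@id": f"{base}/versions/2", "version": 2,
         "release_date": "2023-02-15", "distributions": ["v2.csv"]},
    ],
)
```

Heavier fields such as the hypothetical `distributions` are dropped from the embedded summaries; a client follows the `self` link when it needs the full document.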

How to review

I recommend taking the full raw text of the file and pasting it into a Swagger editor like https://editor-next.swagger.io/
Once the review has been conducted and we're happy that the v2 spec is roughly as we want it, I'll probably convert it to OpenAPI 3.0. The conversion introduced some new fields I wasn't expecting, so I didn't want to further muddy this review with that conversation.

The diff cannot be trusted: it claims endpoints have been removed that are still present when you view the whole file.

Types of questions to consider:

As an API user, does each endpoint return the information you'd need to ascertain if this is the right resource for you? If you were looking for a specific dataset, would you be able to find it from the /datasets list response for example?
Are pagination fields appropriately available on list endpoints, in line with the API standards?
Are all the right endpoints present that you would expect?
Have any fields been removed that you know other applications or processes rely on? (NB: not renamed, but fully removed)
Are the correct LD @types applied to each response? Are @id and @context fields present everywhere they should be?
Are the fields required by the Application Profile present and in the correct places/on the right endpoints? In particular, are lists returning the right subsets of fields?
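For the Linked Data questions above, a reviewer could run a small check over example responses. This is only a sketch: the set of required fields is an assumption drawn from the questions in this list, and the function names and sample response are hypothetical.

```python
from urllib.parse import urlparse

# Assumption: every response object should carry these three fields.
REQUIRED_LD_FIELDS = ("@context", "@id", "@type")


def missing_ld_fields(response: dict) -> list[str]:
    """List the required Linked Data fields absent from a response."""
    return [field for field in REQUIRED_LD_FIELDS if field not in response]


def id_is_absolute_url(response: dict) -> bool:
    """Check that `@id` looks like a fully qualified URL."""
    parsed = urlparse(str(response.get("@id", "")))
    return bool(parsed.scheme and parsed.netloc)


# Hypothetical example responses.
good = {
    "@context": "https://api.example.com/v2/context.json",
    "@id": "https://api.example.com/v2/datasets/cpih01",
    "@type": "dcat:Dataset",
}
bad = {"@id": "/datasets/cpih01"}
```

Here `missing_ld_fields(bad)` reports the absent `@context` and `@type`, and `id_is_absolute_url(bad)` flags the relative `@id`.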

Who can review

@janderson2 @rossbowen


eldeal added 9 commits October 17, 2023 13:47

Remove all dimension, observation and metadata endpoints and models

7d396c9

Also remove import_tasks which are a CMD specific feature

Update models to conform more closely with application profile (AP) work

b632e04

Restructure to identify base fields reused across different endpoints

953e477

Move pagination in swagger spec to reusable field

e04c180

Add _embedded to /editions/{id} spec

a5443cb

Update instance spec to reflect dcat changes and add POST /editions

402f351

Reintroduce dimensions endpoints for census datasets in spec

a7efc02

Fix errors and begin to better structure definitions with 'links'

076e2d7

Reintroduce /options PATCH request in spec

a9132f1
