This repository has been archived by the owner on Oct 28, 2024. It is now read-only.

Versioning and extensions (#42)
* Added `resources` heading

* Added $schema

* Updated urls

* Removed `package.profile`

* Updated $schema

* Started extensions

* Finished extensions

* Updated wording

* Updated JSONSchema version

* Added extensions note

* Fixed recursivity

* Updated sections

* Fixed extension example

* Updated JSON Schema version

* Updated extensions features list

* Fixed unfinished sentence

* Replace idempotent -> immutable

* Update content/docs/specifications/extensions.md

Co-authored-by: Peter Desmet <[email protected]>

* Improved grammar

---------

Co-authored-by: Peter Desmet <[email protected]>
roll and peterdesmet authored Apr 11, 2024
1 parent 52e43a1 commit a731ab4
Showing 12 changed files with 154 additions and 54 deletions.
28 changes: 10 additions & 18 deletions content/docs/specifications/data-package.md
Original file line number Diff line number Diff line change
@@ -114,6 +114,16 @@ The `resources` property is `REQUIRED`, with at least one resource.

Packaged data resources are described in the `resources` property of the package descriptor. This property `MUST` be an array of `objects`. Each object `MUST` follow the [Data Resource ](../data-resource/) specification.

### `$schema`

A root level Data Package descriptor `MAY` have a `$schema` property that `MUST` point to a profile as per [Profile](../glossary/#profile) definition that `MUST` include all the metadata constraints required by this specification.

The default value is `https://datapackage.org/profiles/1.0/datapackage.json` and the recommended value is `https://datapackage.org/profiles/2.0/datapackage.json`.

:::note[Backward Compatibility]
If the `$schema` property is not provided but a descriptor has the `profile` property, a data consumer `MUST` validate the descriptor according to the [Profiles](https://specs.frictionlessdata.io/profiles/) specification.
:::
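
The selection rule described in this section can be sketched in Python. This is only an illustration under assumptions: the function name `select_validation_target` and the two mode labels are hypothetical and not part of the specification; the default URL is the one quoted in the text.

```python
from typing import Tuple

# Default profile URL quoted in the specification text.
DEFAULT_PACKAGE_PROFILE = "https://datapackage.org/profiles/1.0/datapackage.json"


def select_validation_target(descriptor: dict) -> Tuple[str, str]:
    """Decide how a root level Data Package descriptor should be validated.

    Returns a (mode, reference) pair. The mode labels are hypothetical:
    "schema" means validate against the JSON Schema at `reference`,
    "legacy-profile" means fall back to the v1 Profiles specification.
    """
    schema = descriptor.get("$schema")
    if schema is not None:
        # An explicit `$schema` always wins.
        return ("schema", schema)

    legacy = descriptor.get("profile")
    if legacy is not None:
        # Backward compatibility: no `$schema`, but a v1 `profile` is present.
        return ("legacy-profile", legacy)

    # Neither property is present, so the default profile applies.
    return ("schema", DEFAULT_PACKAGE_PROFILE)
```

Checking `$schema` first keeps the backward-compatibility branch limited to descriptors that only carry the older `profile` property.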

### `name`

The name is a simple name or identifier to be used for this package in relation to any registry in which this package will be deposited.
@@ -166,24 +176,6 @@ An example of using the `licenses` property:
}]
```

### `profile`

A string identifying the profile of this descriptor as per the [profiles](https://specs.frictionlessdata.io/profiles/) specification.

Examples:

```json
{
"profile": "tabular-data-package"
}
```

```json
{
"profile": "http://example.com/my-profiles-json-schema.json"
}
```

### `title`

A `string` providing a title or one sentence description for this package
20 changes: 6 additions & 14 deletions content/docs/specifications/data-resource.md
@@ -128,23 +128,15 @@ Or inline CSV:
Prior to release 1.0.0-beta.18 (Nov 17 2016) there was a `url` property distinct from `path`. In order to support backwards compatibility, implementors `MAY` want to automatically convert a `url` property to a `path` property and issue a warning.
:::

### `profile`
### `$schema`

A string identifying the profile of this descriptor as per the [profiles](https://specs.frictionlessdata.io/profiles/) specification.
A root level Data Resource descriptor `MAY` have a `$schema` property that `MUST` point to a profile as per [Profile](../glossary/#profile) definition that `MUST` include all the metadata constraints required by this specification.

Examples:
The default value is `https://datapackage.org/profiles/1.0/dataresource.json` and the recommended value is `https://datapackage.org/profiles/2.0/dataresource.json`.

```json
{
"profile": "tabular-data-resource"
}
```

```json
{
"profile": "http://example.com/my-profiles-json-schema.json"
}
```
:::note[Backward Compatibility]
If the `$schema` property is not provided but a descriptor has the `profile` property, a data consumer `MUST` validate the descriptor according to the [Profiles](https://specs.frictionlessdata.io/profiles/) specification.
:::

### `title`

108 changes: 103 additions & 5 deletions content/docs/specifications/extensions.md
@@ -1,17 +1,115 @@
---
title: Extensions
sidebar:
hidden: true
order: 5
---

<table>
<tr>
<th>Authors</th>
<td>Rufus Pollock, Paul Walsh, Evgeny Karev, Peter Desmet</td>
<td>Rufus Pollock, Paul Walsh, Adam Kariv, Evgeny Karev, Peter Desmet</td>
</tr>
</table>

:::caution
This section is under development
:::
The Data Package Standard extensibility features for domain-specific needs.

## Language

The key words `MUST`, `MUST NOT`, `REQUIRED`, `SHALL`, `SHALL NOT`, `SHOULD`, `SHOULD NOT`, `RECOMMENDED`, `MAY`, and `OPTIONAL` in this document are to be interpreted as described in [RFC 2119](https://www.ietf.org/rfc/rfc2119.txt).

## Introduction

The Data Package Standard provides a rich set of metadata and data features for general applications. At the same time, the Data Package Standard is domain-agnostic at its core and does not provide any built-in means to describe metadata in specific knowledge areas such as biology or medicine.

A domain-specific extension is the way to enrich Data Package metadata to meet the specific needs of a knowledge domain. Prominent Data Package extensions include:

- [Camera Trap Data Package](https://camtrap-dp.tdwg.org/)
- [Fiscal Data Package](https://fiscal.datapackage.org)

## Extension

The Data Package Standard has a simple yet powerful extension mechanism based on the [Profile](../glossary/#profile) concept. An extension is, generally speaking, a project that provides one or more domain-specific profiles to the Data Package Standard specifications.

From the user's perspective, a custom profile is provided as the `$schema` property of the corresponding specification [Descriptor](../glossary/#descriptor). The presence of a profile instructs an implementation to validate the descriptor against the profile's JSON Schema rules.

Usually, Data Package is the specification that is extended. As a container format, it is the most natural target for metadata enrichment. At the same time, technically any of the core specifications can be extended. For example, if you build a Table Schema catalog, it is possible to extend the Table Schema specification using the same approach as described below.

Note that the Data Package Standard's extension system relies entirely on the JSON Schema standard without extending its built-in features in any way. This makes the system robust and provides rich tooling support such as [text editor validation](https://code.visualstudio.com/docs/languages/json#_mapping-in-the-json).

Combining modern JSON Schema features with the ability to provide profiles for any of the core Data Package Standard specification descriptors makes it possible to achieve almost any metadata enrichment goal, including but not limited to:

- Adding new domain-specific properties.
- Requiring existing properties to comply with certain requirements.
- Defining what resources are expected.
- Requiring resources to meet certain dialect or schema requirements.
- Combining existing profiles as a part of a high-level extension.
- Creating domain-specific dialect and schema catalogues.

## Example

As an example, we will create a Spatial Data Package that requires a `geopoint` marker to be provided for each resource in a Data Package.

### Profile

First of all, we need to create a Data Package profile. Note that it includes the default Data Package profile as per the [specification requirement](../data-package/#schema):

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Spatial Data Package Profile",
  "type": "object",
  "allOf": [
    { "$ref": "https://datapackage.org/profiles/2.0/datapackage.json" },
    { "$ref": "#/definitions/spatialMixin" }
  ],
  "definitions": {
    "spatialMixin": {
      "type": "object",
      "properties": {
        "resources": {
          "type": "array",
          "items": {
            "type": "object",
            "required": ["geopoint"],
            "properties": {
              "geopoint": {
                "type": "object",
                "properties": {
                  "lon": { "type": "number" },
                  "lat": { "type": "number" }
                },
                "additionalProperties": false
              }
            }
          }
        }
      }
    }
  }
}
```

### Descriptor

Suppose that the profile above is published at `https://spatial.datapackage.org/profiles/1.0/datapackage.json`. In this case, a Data Package descriptor compatible with the example Spatial Data Package (v1) looks as follows:

```json
{
  "$schema": "https://spatial.datapackage.org/profiles/1.0/datapackage.json",
  "title": "Spatial Data Package Descriptor",
  "resources": [
    {
      "name": "expedition-1",
      "path": "expedition-1.csv",
      "geopoint": {
        "lon": 90,
        "lat": 90
      }
    }
  ]
}
```
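
To make the intent of the profile concrete, the following Python sketch hand-checks the spatial constraints on a descriptor. It is not a JSON Schema validator and not part of any Data Package library: the function name `check_geopoints` is hypothetical, and it assumes the profile is meant to require a `geopoint` object per resource, numeric `lon`/`lat` values, and no other `geopoint` properties.

```python
from numbers import Real
from typing import List


def check_geopoints(descriptor: dict) -> List[str]:
    """Return a list of problems found; an empty list means the check passed."""
    problems = []
    for index, resource in enumerate(descriptor.get("resources", [])):
        geopoint = resource.get("geopoint")
        if not isinstance(geopoint, dict):
            # Covers both a missing `geopoint` and a non-object value.
            problems.append(f"resources[{index}]: 'geopoint' object is required")
            continue
        for key, value in geopoint.items():
            if key not in ("lon", "lat"):
                problems.append(f"resources[{index}].geopoint: unexpected property '{key}'")
            elif isinstance(value, bool) or not isinstance(value, Real):
                # JSON booleans are not numbers, although Python's bool subclasses int.
                problems.append(f"resources[{index}].geopoint.{key}: must be a number")
    return problems


descriptor = {
    "$schema": "https://spatial.datapackage.org/profiles/1.0/datapackage.json",
    "title": "Spatial Data Package Descriptor",
    "resources": [
        {
            "name": "expedition-1",
            "path": "expedition-1.csv",
            "geopoint": {"lon": 90, "lat": 90},
        }
    ],
}
problems = check_geopoints(descriptor)
```

In practice an implementation would delegate this to a JSON Schema validator driven by the profile; the manual check only spells out what that validation is expected to enforce for the spatial part.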

### Software

Even without being aware of the extension, any Data Package software implementation will validate a Spatial Data Package out of the box, covering both the domain-specific properties and the general Data Package properties. We nevertheless encourage extension authors to build on top of existing software so that domain-specific properties are also supported at the programming model level.
19 changes: 18 additions & 1 deletion content/docs/specifications/glossary.md
@@ -19,13 +19,30 @@ The key words `MUST`, `MUST NOT`, `REQUIRED`, `SHALL`, `SHALL NOT`, `SHOULD`, `S

## Definitions

### Profile

A profile is a URL that `MUST`:

- resolve to a valid JSON Schema descriptor under the `draft-07` version
- be versioned and immutable, i.e. once published under some version it cannot be changed

A profile is used both as a metadata version identifier and as the location of a JSON Schema against which a descriptor having it as a root level `$schema` property `MUST` be valid and `MUST` be validated.

Similarly to [JSON Schema](https://json-schema.org/understanding-json-schema/reference/schema#schema), the `$schema` property has an effect only on the root level of a descriptor. For example, if a Table Dialect is published as a file, it can include a `$schema` property that affects its validation. If the same dialect is an object inlined into a Data Package descriptor, the dialect's `$schema` property `MUST` be ignored and the descriptor as a whole `MUST` be validated against the root level `$schema` property provided by the package.
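
As an illustration of the root-level rule, the Python sketch below lists the locations of nested `$schema` properties that a consumer would ignore. The helper name `ignored_schema_paths` and the slash-separated path notation are assumptions made for this example, and the helper simply treats every non-root `$schema` key as ignorable.

```python
from typing import Any, Iterator


def ignored_schema_paths(node: Any, path: str = "", is_root: bool = True) -> Iterator[str]:
    """Yield the locations of `$schema` properties found below the root level."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}/{key}"
            if key == "$schema" and not is_root:
                # Only the root level `$schema` drives validation.
                yield child
            yield from ignored_schema_paths(value, child, is_root=False)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from ignored_schema_paths(value, f"{path}/{index}", is_root=False)


package = {
    "$schema": "https://datapackage.org/profiles/2.0/datapackage.json",
    "resources": [
        {
            "name": "table",
            "path": "table.csv",
            # Inlined dialect: its own `$schema` has no effect here.
            "dialect": {
                "$schema": "https://datapackage.org/profiles/2.0/tabledialect.json",
                "header": True,
            },
        }
    ],
}
ignored = list(ignored_schema_paths(package))
```

Here `ignored` contains a single entry pointing at the inlined dialect, while the package's own `$schema` remains the one used for validation.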

The Data Package Standard employs profiles as a mechanism for creating extensions as per the [Extensions](../extensions) specification.

:::note[Implementation Note]
It is recommended to cache profiles using their URL as a unique key.
:::
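
Because profiles are immutable, caching by URL is safe. A minimal Python sketch of such a cache follows; the `ProfileCache` class and the `fetch_profile` helper are hypothetical names for this example, and the fetch function is injectable so the cache can be exercised without network access.

```python
import json
from typing import Callable, Dict
from urllib.request import urlopen


def fetch_profile(url: str) -> dict:
    """Download and parse a profile (requires network access)."""
    with urlopen(url) as response:
        return json.load(response)


class ProfileCache:
    """In-memory cache that uses the profile URL as the unique key."""

    def __init__(self, fetch: Callable[[str], dict] = fetch_profile) -> None:
        self._fetch = fetch
        self._profiles: Dict[str, dict] = {}

    def get(self, url: str) -> dict:
        # Immutable profiles never need revalidation, so one fetch per URL suffices.
        if url not in self._profiles:
            self._profiles[url] = self._fetch(url)
        return self._profiles[url]
```

A consumer would typically create one `ProfileCache` per process and call `get` with the value of a descriptor's root level `$schema` property.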

### Descriptor

The Data Package Standard uses a concept of a `descriptor` to represent metadata defined according to the core specifications such as Data Package or Table Schema.

On logical level, a descriptor is represented by a data structure. The data structure `MUST` be a JSON `object` as defined in [RFC 4627](http://www.ietf.org/rfc/rfc4627.txt).

On physical level, a descriptor is represented by a file. The file `MUST` contains a valid JSON `object` as defined in [RFC 4627](http://www.ietf.org/rfc/rfc4627.txt).
On physical level, a descriptor is represented by a file. The file `MUST` contain a valid JSON `object` as defined in [RFC 4627](http://www.ietf.org/rfc/rfc4627.txt).

This specification does not define any discoverability mechanisms. Any URI can be used to directly reference a file containing a descriptor.

6 changes: 6 additions & 0 deletions content/docs/specifications/table-dialect.md
@@ -70,6 +70,12 @@ id,name

General properties are format-agnostic. Usually, they are useful for defining dialects for delimiter-based and spreadsheet-based formats like CSV or Excel.

#### `$schema`

A root level Table Dialect descriptor `MAY` have a `$schema` property that `MUST` point to a profile as per [Profile](../glossary/#profile) definition that `MUST` include all the metadata constraints required by this specification.

The default value is `https://datapackage.org/profiles/1.0/tabledialect.json` and the recommended value is `https://datapackage.org/profiles/2.0/tabledialect.json`.

#### `header`

A Table Dialect descriptor `MAY` have the `header` property that `MUST` be boolean with default value `true`. This property indicates whether the file includes a header row. If `true` the first row in the file `MUST` be interpreted as a header row, not data.
8 changes: 7 additions & 1 deletion content/docs/specifications/table-schema.md
@@ -105,7 +105,13 @@ A Table Schema descriptor `MAY` contain these standard properties:

A Table Schema descriptor `MUST` contain a property `fields`. `fields` `MUST` be an array where each entry in the array is a [field descriptor](#field) as defined below.

The way Table Schema `fields` are mapped onto the data source fields are defined by the `fieldsMatch` property. By default, the most strict approach is applied, i.e. fields in the data source `MUST` completely match the elements in the `fields` array, both in number and order. Using different options below, a data producer can relax requirements for the data source.
The way Table Schema `fields` are mapped onto the data source fields are defined by the `fieldsMatch` property. By default, the most strict approach is applied, i.e. fields in the data source `MUST` completely match the elements in the `fields` array, both in number and order. Using different options of the `fieldsMatch` property, a data producer can relax requirements for the data source.

#### `$schema`

A root level Table Schema descriptor `MAY` have a `$schema` property that `MUST` point to a profile as per [Profile](../glossary/#profile) definition that `MUST` include all the metadata constraints required by this specification.

The default value is `https://datapackage.org/profiles/1.0/tableschema.json` and the recommended value is `https://datapackage.org/profiles/2.0/tableschema.json`.

#### `fieldsMatch`

2 changes: 1 addition & 1 deletion profiles/data-package.json
@@ -1,4 +1,4 @@
{
"$schema": "http://json-schema.org/draft-04/schema#",
"$schema": "http://json-schema.org/draft-07/schema#",
"$ref": "build/profiles/dictionary.json#/definitions/dataPackage"
}
2 changes: 1 addition & 1 deletion profiles/data-resource.json
@@ -1,4 +1,4 @@
{
"$schema": "http://json-schema.org/draft-04/schema#",
"$schema": "http://json-schema.org/draft-07/schema#",
"$ref": "build/profiles/dictionary.json#/definitions/dataResource"
}
7 changes: 0 additions & 7 deletions profiles/dictionary/profile.yaml

This file was deleted.

4 changes: 0 additions & 4 deletions profiles/metadata-profile.json

This file was deleted.

2 changes: 1 addition & 1 deletion profiles/table-dialect.json
@@ -1,4 +1,4 @@
{
"$schema": "http://json-schema.org/draft-04/schema#",
"$schema": "http://json-schema.org/draft-07/schema#",
"$ref": "build/profiles/dictionary.json#/definitions/tableDialect"
}
2 changes: 1 addition & 1 deletion profiles/table-schema.json
@@ -1,4 +1,4 @@
{
"$schema": "http://json-schema.org/draft-04/schema#",
"$schema": "http://json-schema.org/draft-07/schema#",
"$ref": "build/profiles/dictionary.json#/definitions/tableSchema"
}
