diff --git a/content/docs/specifications/data-package.md b/content/docs/specifications/data-package.md index f6cfd6aa..36acb261 100644 --- a/content/docs/specifications/data-package.md +++ b/content/docs/specifications/data-package.md @@ -75,30 +75,16 @@ Several example data packages can be found in the [datasets organization on gith ## Descriptor +A Data Package descriptor `MUST` be a descriptor as per the [Descriptor](../glossary/#descriptor) definition. A list of standard properties that can be included in a descriptor is defined in the [Properties](#properties) section. + +When available as a file, a descriptor `MUST` be named `datapackage.json` and it `MUST` be placed in the top-level directory (relative to any other resources provided as part of the data package). + The descriptor is the central file in a Data Package. It provides: - General metadata such as the package's title, license, publisher etc - A list of the data "resources" that make up the package including their location on disk or online and other relevant information (including, possibly, schema information about these data resources in a structured form) -A Data Package descriptor `MUST` be a valid JSON `object`. (JSON is defined in [RFC 4627](http://www.ietf.org/rfc/rfc4627.txt)). When available as a file it `MUST` be named `datapackage.json` and it `MUST` be placed in the top-level directory (relative to any other resources provided as part of the data package). - -The descriptor `MUST` contain a `resources` property describing the data resources. - -All other properties are considered `metadata` properties. The descriptor `MAY` contain any number of other `metadata` properties. The following sections provides a description of required and optional metadata properties for a Data Package descriptor.
- -Adherence to the specification does not imply that additional, non-specified properties cannot be used: a descriptor `MAY` include any number of properties in additional to those described as required and optional properties. For example, if you were storing time series data and wanted to list the temporal coverage of the data in the Data Package you could add a property `temporal` (cf [Dublin Core](http://dublincore.org/documents/usageguide/qualifiers.shtml#temporal)): - -```json -"temporal": { - "name": "19th Century", - "start": "1800-01-01", - "end": "1899-12-31" -} -``` - -This flexibility enables specific communities to extend Data Packages as appropriate for the data they manage. As an example, the [Tabular Data Package](https://specs.frictionlessdata.io/tabular-data-package/) specification extends Data Package to the case where all the data is tabular and stored in CSV. - -Here is an illustrative example of a datapackage JSON file: +An example of a Data Package descriptor: ```json { @@ -114,6 +100,10 @@ Here is an illustrative example of a datapackage JSON file: } ``` +:::note[File Names] +A file containing a Data Package descriptor `MAY` have a name other than `datapackage.json` when used internally within a project or system, if supported by the corresponding implementations. A descriptor `SHOULD NOT` be externally published under any name other than `datapackage.json`. +::: + ## Properties A Data Package descriptor `MUST` have `resoures` property and `SHOULD` have `name`, `id`, `licenses`, and `profile` properties.
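The Data Package rules in this diff (descriptor named `datapackage.json`, placed in the top-level directory, `MUST` have `resources`, `SHOULD` have `name`, `id`, `licenses`, and `profile`) can be checked mechanically. Below is a minimal sketch, assuming a package stored on the local filesystem; `load_package_descriptor` and `missing_recommended` are hypothetical helper names, not part of any Frictionless library, and only the rules quoted above are checked.

```python
import json
import tempfile
from pathlib import Path

# Standard file name for a published Data Package descriptor.
DESCRIPTOR_NAME = "datapackage.json"
# Properties the spec says a descriptor SHOULD have.
RECOMMENDED = ("name", "id", "licenses", "profile")


def load_package_descriptor(package_dir):
    """Load and minimally check datapackage.json from a package's top-level dir."""
    path = Path(package_dir) / DESCRIPTOR_NAME
    with path.open(encoding="utf-8") as f:
        descriptor = json.load(f)
    # Glossary: a descriptor MUST be a JSON object.
    if not isinstance(descriptor, dict):
        raise ValueError(f"{path}: descriptor must be a JSON object")
    # Data Package: the descriptor MUST have a `resources` property.
    if "resources" not in descriptor:
        raise ValueError(f"{path}: missing required `resources` property")
    return descriptor


def missing_recommended(descriptor):
    """Return recommended (SHOULD) properties absent from the descriptor."""
    return [prop for prop in RECOMMENDED if prop not in descriptor]


# Usage: build a throwaway package directory and load it back.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, DESCRIPTOR_NAME).write_text(
        json.dumps({"name": "example", "resources": [{"name": "data", "path": "data.csv"}]}),
        encoding="utf-8",
    )
    descriptor = load_package_descriptor(tmp)
    print(missing_recommended(descriptor))  # ['id', 'licenses', 'profile']
```

Missing recommended properties are reported rather than raised, since `SHOULD` is not a hard requirement.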
diff --git a/content/docs/specifications/data-resource.md b/content/docs/specifications/data-resource.md index 1bd2ba5c..614b0e6f 100644 --- a/content/docs/specifications/data-resource.md +++ b/content/docs/specifications/data-resource.md @@ -21,40 +21,11 @@ A simple format to describe and package a single data resource such as a individ The key words `MUST`, `MUST NOT`, `REQUIRED`, `SHALL`, `SHALL NOT`, `SHOULD`, `SHOULD NOT`, `RECOMMENDED`, `MAY`, and `OPTIONAL` in this document are to be interpreted as described in [RFC 2119](https://www.ietf.org/rfc/rfc2119.txt) -## Example - -A minimal Data Resource looks as follows: - -With data accessible via the local filesystem. - -```json -{ - "name": "resource-name", - "path": "resource-path.csv" -} -``` - -With data accessible via http. - -```json -{ - "name": "resource-name", - "path": "http://example.com/resource-path.csv" -} -``` +## Descriptor -A minimal Data Resource pointing to some inline data looks as follows. +A Data Resource descriptor `MUST` be a descriptor as per the [Descriptor](../glossary/#descriptor) definition. A list of standard properties that can be included in a descriptor is defined in the [Properties](#properties) section. -```json -{ - "name": "resource-name", - "data": { - "resource-name-data": [{ "a": 1, "b": 2 }] - } -} -``` - -A comprehensive Data Resource example with all required, recommended and optional properties looks as follows. +An example of a Data Resource descriptor: ```json { @@ -73,10 +44,6 @@ A comprehensive Data Resource example with all required, recommended and optiona } ``` -## Descriptor - -A Data Resource descriptor `MUST` be a valid JSON `object`. (JSON is defined in [RFC 4627](http://www.ietf.org/rfc/rfc4627.txt)). - ## Properties Standard properties of the descriptor are described below. A descriptor `MAY` include any number of properties in additional to those described below as required and optional properties.
diff --git a/content/docs/specifications/glossary.md b/content/docs/specifications/glossary.md index f8479900..7e64d6a0 100644 --- a/content/docs/specifications/glossary.md +++ b/content/docs/specifications/glossary.md @@ -19,6 +19,38 @@ The key words `MUST`, `MUST NOT`, `REQUIRED`, `SHALL`, `SHALL NOT`, `SHOULD`, `S ## Definitions +### Descriptor + +The Data Package Standard uses the concept of a `descriptor` to represent metadata defined according to the core specifications, such as Data Package or Table Schema. + +On the logical level, a descriptor is represented by a data structure. The data structure `MUST` be a JSON `object` as defined in [RFC 4627](http://www.ietf.org/rfc/rfc4627.txt). + +On the physical level, a descriptor is represented by a file. The file `MUST` contain a valid JSON `object` as defined in [RFC 4627](http://www.ietf.org/rfc/rfc4627.txt). + +This specification does not define any discoverability mechanisms. Any URI can be used to directly reference a file containing a descriptor. + +:::note[File Formats] +A descriptor `MAY` be serialized using alternative formats such as YAML or TOML when used internally within a project or system, if supported by the corresponding implementations. A descriptor `SHOULD NOT` be externally published in any format other than JSON. +::: + +#### Custom Properties + +The Data Package specifications define a set of standard properties to be used and allow custom properties to be added. It is `RECOMMENDED` to use the `namespace:property` naming convention for custom properties. + +Adherence to a specification does not imply that additional, non-specified properties cannot be used: a descriptor `MAY` include any number of properties in addition to those described as required and optional properties.
For example, if you were storing time series data and wanted to list the temporal coverage of the data in the Data Package, you could add a property `dc:temporal` (cf. [Dublin Core](http://dublincore.org/documents/usageguide/qualifiers.shtml#temporal)): + +```json +{ + "dc:temporal": { + "name": "19th Century", + "start": "1800-01-01", + "end": "1899-12-31" + } +} +``` + +This flexibility enables specific communities to extend metadata as appropriate for the data they manage. As an example, the [Tabular Data Package](https://specs.frictionlessdata.io/tabular-data-package/) specification extends Data Package to the case where all the data is tabular and stored in CSV. + ### URL or Path A `URL or Path` is a `string` with the following additional constraints: diff --git a/content/docs/specifications/table-dialect.md b/content/docs/specifications/table-dialect.md index e6bce3fb..d84f0e78 100644 --- a/content/docs/specifications/table-dialect.md +++ b/content/docs/specifications/table-dialect.md @@ -33,15 +33,17 @@ Table Dialect supersedes [CSV Dialect](https://specs.frictionlessdata.io/csv-dia ## Descriptor -On logical level, Table Dialect descriptor is represented by a data structure. The data structure `MUST` be a JSON-serializable `object` as defined in [RFC 4627](http://www.ietf.org/rfc/rfc4627.txt). +A Table Dialect descriptor `MUST` be a descriptor as per the [Descriptor](../glossary/#descriptor) definition. A list of standard properties that can be included in a descriptor is defined in the [Properties](#properties) section. -On physical level, Table Dialect descriptor is represented by a file. A data producer `MAY` use any suitable serialization format. A data consumer `MUST` support JSON serialization format and `MAY` support other serialization formats like YAML or TOML.
+An example of a Table Dialect descriptor: -The above states that JSON is the only serialization format that `MUST` be used for publishing a Table Dialect while other serialization formats can be used in projects or systems internally if supported by corresponding implementations. - -This specification does not define any discoverability mechanisms making a serialized Table Dialect be referenced only directly by its URI. - -This specification defines a set of standardized properties to be used and allows custom properties to be added. It is `RECOMMENDED` to use `namespace:property` naming convention for custom properties. +```json +{ + "header": false, + "delimiter": ";", + "quoteChar": "'" +} +``` ## Properties diff --git a/content/docs/specifications/table-schema.md b/content/docs/specifications/table-schema.md index 7f7de2b6..397286a2 100644 --- a/content/docs/specifications/table-schema.md +++ b/content/docs/specifications/table-schema.md @@ -71,11 +71,9 @@ For example, `constraints` `SHOULD` be tested on the logical representation of d ## Descriptor -A Table Schema is represented by a descriptor. The descriptor `MUST` be a JSON `object` (JSON is defined in [RFC 4627](http://www.ietf.org/rfc/rfc4627.txt)). +A Table Schema descriptor `MUST` be a descriptor as per the [Descriptor](../glossary/#descriptor) definition. A list of standard properties that can be included in a descriptor is defined in the [Properties](#properties) section. -The descriptor `MAY` have the additional properties set out below and `MAY` contain any number of other properties not defined in this specification. - -The following is an illustration of this structure: +An example of a Table Schema descriptor: ```json { @@ -92,7 +90,7 @@ The following is an illustration of this structure: ... ], "missingValues": [ ... ], - "primaryKey": [ ... ] + "primaryKey": [ ... ], "foreignKeys": [... ] } ```
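The glossary's shared Descriptor definition (a JSON `object`, with custom properties `RECOMMENDED` to follow `namespace:property`) now applies to every spec touched by this diff. Below is a minimal sketch of a generic check, assuming the caller supplies the standard property names of whichever spec the descriptor follows. `find_unnamespaced_custom_properties` is a hypothetical helper, and the three-name set used below is only the subset of Table Dialect properties that appears in the example above, not the full list.

```python
import json


def is_namespaced(key):
    """True if `key` looks like `namespace:property` with both parts non-empty."""
    namespace, sep, name = key.partition(":")
    return bool(sep and namespace and name)


def find_unnamespaced_custom_properties(text, standard_properties):
    """Parse a JSON descriptor and list custom properties lacking a namespace."""
    descriptor = json.loads(text)
    # Glossary: a descriptor MUST be a JSON object.
    if not isinstance(descriptor, dict):
        raise ValueError("descriptor must be a JSON object")
    custom = [key for key in descriptor if key not in standard_properties]
    # The naming convention is RECOMMENDED, not required: report, don't reject.
    return [key for key in custom if not is_namespaced(key)]


# Usage: the Table Dialect example plus one namespaced and one bare custom property.
dialect_text = json.dumps({
    "header": False,
    "delimiter": ";",
    "quoteChar": "'",
    "dc:temporal": {"start": "1800-01-01", "end": "1899-12-31"},
    "temporal": {"start": "1800-01-01", "end": "1899-12-31"},
})
print(find_unnamespaced_custom_properties(dialect_text, {"header", "delimiter", "quoteChar"}))
# ['temporal']
```

Because the convention is a recommendation, the sketch returns offending keys for a linter-style warning instead of raising, and raises only for the hard `MUST` (top-level JSON object).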