This repository has been archived by the owner on Oct 28, 2024. It is now read-only.

[Draft] Clarify serialization and discoverability #40

Closed
wants to merge 13 commits into from
12 changes: 8 additions & 4 deletions content/docs/specifications/data-package.md
@@ -81,15 +81,19 @@ Several example data packages can be found in the [datasets organization on gith

### Descriptor

On the logical level, a Data Package descriptor is represented by a data structure. The data structure `MUST` be a JSON-serializable `object` as defined in [RFC 4627](http://www.ietf.org/rfc/rfc4627.txt).

On the physical level, a Data Package descriptor is represented by a file. A data producer `MAY` use any suitable serialization format and `SHOULD` name the file `datapackage.json`. A data consumer `MUST` support the JSON serialization format and `MAY` support other serialization formats such as YAML or TOML.

JSON is the serialization format that `MUST` be used for publishing a Data Package, while other serialization formats can be used internally in projects or systems if the corresponding implementations support them.

This specification does not define any discoverability mechanisms. Any URI can be used to directly reference a serialized Data Package. It is good practice and a common convention to name the file `datapackage.json`.
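
As an illustration of the consumer-side requirement above, the following minimal sketch parses a JSON-serialized descriptor and checks that the result is an `object`. The function name `parse_descriptor` and the sample descriptor are assumptions made for this example; they are not defined by the specification.

```python
import json


def parse_descriptor(text: str) -> dict:
    """Parse a JSON-serialized descriptor (hypothetical helper).

    Raises ValueError if the text is not valid JSON or if the
    top-level value is not a JSON object.
    """
    data = json.loads(text)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("descriptor must be a JSON object")
    return data


# Illustrative descriptor content; the values are made up for this sketch.
sample = '{"name": "example-package", "resources": [{"name": "data", "path": "data.csv"}]}'
descriptor = parse_descriptor(sample)
print(descriptor["name"])
```

The sketch only covers the JSON format that every consumer `MUST` support; it does not check any properties of the descriptor.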

The descriptor is the central file in a Data Package. It provides:

- General metadata such as the package's title, license, publisher, etc.
- A list of the data "resources" that make up the package including their location on disk or online and other relevant information (including, possibly, schema information about these data resources in a structured form)

A Data Package descriptor `MUST` be a valid JSON `object`. (JSON is defined in [RFC 4627][]). When available as a file it `MUST` be named `datapackage.json` and it `MUST` be placed in the top-level directory (relative to any other resources provided as part of the data package).

[RFC 4627]: http://www.ietf.org/rfc/rfc4627.txt

The descriptor `MUST` contain a `resources` property describing the data resources.

All other properties are considered `metadata` properties. The descriptor `MAY` contain any number of other `metadata` properties. The following sections provide a description of required and optional metadata properties for a Data Package descriptor.
10 changes: 8 additions & 2 deletions content/docs/specifications/data-resource.md
@@ -81,9 +81,15 @@ A comprehensive Data Resource example with all required, recommended and optiona
}
```

### Descriptor
## Descriptor

A Data Resource descriptor `MUST` be a valid JSON `object`. (JSON is defined in [RFC 4627][]).
On the logical level, a Data Resource descriptor is represented by a data structure. The data structure `MUST` be a JSON-serializable `object` as defined in [RFC 4627](http://www.ietf.org/rfc/rfc4627.txt).

On the physical level, a Data Resource descriptor is represented by a file. A data producer `MAY` use any suitable serialization format. A data consumer `MUST` support the JSON serialization format and `MAY` support other serialization formats such as YAML or TOML.

JSON is the serialization format that `MUST` be used for publishing a Data Resource, while other serialization formats can be used internally in projects or systems if the corresponding implementations support them.

This specification does not define any discoverability mechanisms. Any URI can be used to directly reference a serialized Data Resource.
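
The logical-level requirement can be checked on an in-memory structure before it is written to any file. The sketch below is one possible way to do this in Python, assuming that "JSON-serializable `object`" maps to a `dict` that the standard `json` module can encode; the helper name `is_json_serializable_object` is hypothetical.

```python
import json


def is_json_serializable_object(value) -> bool:
    """Return True if value is a dict that encodes to strict JSON.

    Hypothetical helper: allow_nan=False rejects NaN and Infinity,
    which Python's encoder would otherwise emit as non-standard tokens.
    """
    if not isinstance(value, dict):
        return False
    try:
        json.dumps(value, allow_nan=False)
    except (TypeError, ValueError):
        return False
    return True


# Illustrative resource descriptors; the values are made up for this sketch.
print(is_json_serializable_object({"name": "data", "path": "data.csv"}))
print(is_json_serializable_object({"name": "data", "tags": {"a", "b"}}))
```

The second call passes a Python `set`, which has no JSON representation, so a structure like it could not serve as a descriptor under this reading.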

Key properties of the descriptor are described below. A descriptor `MAY` include any number of properties in addition to those described below as required and optional properties.

10 changes: 9 additions & 1 deletion content/docs/specifications/table-dialect.md
@@ -37,7 +37,15 @@ CSV Dialect is useful for programmes which might have to deal with multiple dial

Some related work can be found in [this comparison of csv dialect support](https://docs.google.com/spreadsheet/ccc?key=0AmU3V2vcPKrIdEhoU1NQSWtoQmJwcUNCelJtdkx2bFE&usp=sharing), this [example of similar JSON format](http://panda.readthedocs.org/en/latest/api.html#data-uploads), and in Python's [PEP 305](http://www.python.org/dev/peps/pep-0305/).

## Specification
## Descriptor

On the logical level, a Table Dialect descriptor is represented by a data structure. The data structure `MUST` be a JSON-serializable `object` as defined in [RFC 4627](http://www.ietf.org/rfc/rfc4627.txt).

On the physical level, a Table Dialect descriptor is represented by a file. A data producer `MAY` use any suitable serialization format. A data consumer `MUST` support the JSON serialization format and `MAY` support other serialization formats such as YAML or TOML.

JSON is the serialization format that `MUST` be used for publishing a Table Dialect, while other serialization formats can be used internally in projects or systems if the corresponding implementations support them.

This specification does not define any discoverability mechanisms. Any URI can be used to directly reference a serialized Table Dialect.
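
A consumer that supports an optional format alongside the mandatory JSON could dispatch on a format name, as in the sketch below. It assumes Python's standard `tomllib` module (available from Python 3.11) for TOML and falls back to JSON-only support when that module is missing. The function `load_dialect` and its `fmt` parameter are hypothetical names chosen for this example.

```python
import json

try:
    import tomllib  # standard library from Python 3.11 onwards
except ImportError:  # older interpreters: JSON-only consumer
    tomllib = None


def load_dialect(text: str, fmt: str = "json") -> dict:
    """Parse a serialized dialect descriptor (hypothetical helper).

    JSON is always supported; TOML is supported only when tomllib
    is available. Any other format name raises ValueError.
    """
    if fmt == "json":
        data = json.loads(text)
    elif fmt == "toml" and tomllib is not None:
        data = tomllib.loads(text)
    else:
        raise ValueError(f"unsupported serialization format: {fmt}")
    if not isinstance(data, dict):
        raise ValueError("descriptor must be an object")
    return data


# Illustrative dialect content; the property value is made up for this sketch.
print(load_dialect('{"delimiter": ";"}'))
```

Both branches return the same logical structure, which is the point of separating the logical level from the physical serialization.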

A CSV Dialect descriptor, `dialect`, `MUST` be a JSON `object` with the following properties:

12 changes: 11 additions & 1 deletion content/docs/specifications/table-schema.md
@@ -67,7 +67,17 @@ For example, `constraints` `SHOULD` be tested on the logical representation of d

## Descriptor

A Table Schema is represented by a descriptor. The descriptor `MUST` be a JSON `object` (JSON is defined in [RFC 4627](http://www.ietf.org/rfc/rfc4627.txt)).
On the logical level, a Table Schema descriptor is represented by a data structure. The data structure `MUST` be a JSON-serializable `object` as defined in [RFC 4627](http://www.ietf.org/rfc/rfc4627.txt).

On the physical level, a Table Schema descriptor is represented by a file. A data producer `MAY` use any suitable serialization format. A data consumer `MUST` support the JSON serialization format and `MAY` support other serialization formats such as YAML or TOML.

JSON is the serialization format that `MUST` be used for publishing a Table Schema, while other serialization formats can be used internally in projects or systems if the corresponding implementations support them.

This specification does not define any discoverability mechanisms. Any URI can be used to directly reference a serialized Table Schema.
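
On the producer side, a schema kept in any internal form can be written out as JSON at publication time. The following sketch shows one way this might look; the helper name `serialize_for_publishing` and the sample field definitions are assumptions for illustration only.

```python
import json


def serialize_for_publishing(schema: dict) -> str:
    """Serialize an in-memory schema to JSON text (hypothetical helper).

    allow_nan=False keeps the output within strict JSON.
    """
    if not isinstance(schema, dict):
        raise TypeError("a Table Schema descriptor must be an object")
    return json.dumps(schema, indent=2, ensure_ascii=False, allow_nan=False)


# Illustrative schema; the field names and types are made up for this sketch.
schema = {
    "fields": [
        {"name": "id", "type": "integer"},
        {"name": "label", "type": "string"},
    ]
}
published = serialize_for_publishing(schema)
print(published)
```

Parsing `published` with `json.loads` yields a structure equal to `schema`, so the published file and the internal representation describe the same logical descriptor.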

## Metadata

### Fields

The descriptor `MAY` have the additional properties set out below and `MAY` contain any number of other properties not defined in this specification.
