This repository has been archived by the owner on Oct 28, 2024. It is now read-only.

Separate concerns between metadata enrichment and data structure in Data Resource using `resource.type`. Remove Tabular Data Resource from v2 website (#51)

* Added `resource.type`

* Removed `resource.profile`

* Added Tabular section

* Removed metadata-profile profile

* Updated profile

* Removed Tabular Data Resource extension

* Fixed build
roll authored Apr 12, 2024
1 parent 4802ed5 commit d6afa4c
Showing 5 changed files with 104 additions and 121 deletions.
14 changes: 0 additions & 14 deletions content/docs/extensions/tabular-data-resource.md

This file was deleted.

112 changes: 92 additions & 20 deletions content/docs/specifications/data-resource.md
@@ -48,26 +48,30 @@ An example of a Data Resource descriptor:

Standard properties of the descriptor are described below. A descriptor `MAY` include any number of properties in addition to those described below as required and optional properties.

### `name` [required]
### General

The properties below are applicable to any Data Resource.

#### `name` [required]

A resource `MUST` contain a `name` property. The name is a simple name or identifier to be used for this resource.

- It `MUST` be unique amongst all resources in this data package.
- It `SHOULD` be human-readable and consist only of lowercase alphanumeric characters plus `.`, `-` and `\_`.
- It would be usual for the name to correspond to the file name (minus the extension) of the data file the resource describes.
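
The `name` rules above can be sketched as a small check. This is a minimal illustration, not part of the specification: the helper `check_resource_names` and the regular expression are assumptions that mirror the `MUST` (present, unique) and `SHOULD` (lowercase alphanumeric plus `.`, `-`, `_`) wording.

```python
import re

# Assumed pattern mirroring the SHOULD-level recommendation:
# lowercase alphanumeric characters plus ".", "-" and "_".
NAME_PATTERN = re.compile(r"[a-z0-9._-]+")


def check_resource_names(resources):
    """Return a list of human-readable problems found in resource names."""
    problems = []
    seen = set()
    for index, resource in enumerate(resources):
        name = resource.get("name")
        if not isinstance(name, str) or not name:
            problems.append(f"resource {index}: missing `name` (MUST)")
            continue
        if name in seen:
            problems.append(f"resource {index}: duplicate name {name!r} (MUST be unique)")
        seen.add(name)
        if not NAME_PATTERN.fullmatch(name):
            problems.append(f"resource {index}: name {name!r} has unexpected characters (SHOULD)")
    return problems
```

A caller would typically treat the `MUST` problems as errors and the `SHOULD` problems as warnings.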

### `path` or `data` [required]
#### `path` or `data` [required]

A resource `MUST` contain a property describing the location of the data associated to the resource. The location of resource data `MUST` be specified by the presence of one (and only one) of these two properties:

- `path`: for data in files located online or locally on disk.
- `data`: for data inline in the descriptor itself.
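
The "one (and only one)" rule above could be enforced as follows. The function name `data_location` is hypothetical and the descriptor is assumed to be an already parsed JSON object (a Python `dict`).

```python
def data_location(resource):
    """Return "path" or "data", whichever the descriptor uses.

    Raises ValueError when neither or both properties are present,
    since exactly one of them MUST describe the data location.
    """
    has_path = "path" in resource
    has_data = "data" in resource
    if has_path == has_data:
        raise ValueError("a resource MUST have exactly one of `path` or `data`")
    return "path" if has_path else "data"
```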

#### Single File
##### Single File

If a resource has only a single file then `path` `MUST` be a string that is a "url-or-path" as defined in the [URL or Path](../glossary/#url-or-path) definition.

#### Multiple Files
##### Multiple Files

Usually, a resource will have only a single file associated to it. However, sometimes it can be convenient to have a single resource whose data is split across multiple files -- perhaps the data is large and having it in one file would be inconvenient.

@@ -85,7 +89,7 @@ It is NOT permitted to mix fully qualified URLs and relative paths in a `path` a
All files in the array `MUST` be similar in terms of structure, format etc. Implementors `MUST` be able to concatenate together the files in the simplest way and treat the result as one large file. For tabular data there is the issue of header rows. See the [Tabular Data Package spec](https://specs.frictionlessdata.io/tabular-data-package/) for more on this.
:::

#### Inline Data
##### Inline Data

Resource data rather than being stored in external files can be shipped `inline` on a Resource using the `data` property.

@@ -128,6 +132,18 @@ Or inline CSV:
Prior to release 1.0.0-beta.18 (Nov 17 2016) there was a `url` property distinct from `path`. In order to support backwards compatibility, implementors `MAY` want to automatically convert a `url` property to a `path` property and issue a warning.
:::

#### `type`

A Data Resource descriptor `MAY` contain a property `type` that `MUST` be a string with the following possible values:

- `table`: indicates that the resource is tabular as per [Tabular Data](../glossary/#tabular-data) definition. Please read more about [Tabular Resource](#tabular) properties.

If property `type` is not provided, the resource is considered to be a non-specific file. An implementation `MAY` provide some additional interfaces, for example, tabular, to non-specific files if `type` can be detected from the data source or format.

:::note[Backward Compatibility]
If a resource has a `profile` property equal to `tabular-data-resource` or `https://specs.frictionlessdata.io/schemas/tabular-data-resource.json`, an implementation `MUST` treat it as if the `type` property were set to `table`.
:::
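
A minimal sketch of how an implementation might resolve `type`, including the backward-compatibility rule for `profile`. The function name `resolve_resource_type` is hypothetical, and letting an explicit `type` take precedence over a legacy `profile` is an assumption; the text above does not say which wins when both are present.

```python
# Legacy `profile` values that MUST be treated as `type: table`.
LEGACY_TABULAR_PROFILES = {
    "tabular-data-resource",
    "https://specs.frictionlessdata.io/schemas/tabular-data-resource.json",
}

# Values currently allowed for `type`.
ALLOWED_TYPES = {"table"}


def resolve_resource_type(resource):
    """Return "table", or None for a non-specific file."""
    resource_type = resource.get("type")
    if resource_type is not None:
        # Assumption: an explicit `type` wins over a legacy `profile`.
        if resource_type not in ALLOWED_TYPES:
            raise ValueError(f"unsupported resource type: {resource_type!r}")
        return resource_type
    profile = resource.get("profile")
    if isinstance(profile, str) and profile in LEGACY_TABULAR_PROFILES:
        return "table"
    return None
```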

### `$schema`

A root level Data Resource descriptor `MAY` have a `$schema` property that `MUST` point to a profile as per [Profile](../glossary/#profile) definition that `MUST` include all the metadata constraints required by this specification.
@@ -136,33 +152,32 @@ The default value is `https://datapackage.org/profiles/1.0/dataresource.json` an

:::note[Backward Compatibility]
If the `$schema` property is not provided but a descriptor has the `profile` property a data consumer `MUST` validate the descriptor according to the [Profiles](https://specs.frictionlessdata.io/profiles/) specification.
:::

### `title`
#### `title`

Title or label for the resource.

### `description`
#### `description`

Description of the resource.

### `format`
#### `format`

Would be expected to be the standard file extension for this type of resource. For example, `csv`, `xls`, `json` etc.

### `mediatype`
#### `mediatype`

The mediatype/mimetype of the resource e.g. "text/csv", or "application/vnd.ms-excel". Mediatypes are maintained by the Internet Assigned Numbers Authority (IANA) in a [media type registry](https://www.iana.org/assignments/media-types/media-types.xhtml).

### `encoding`
#### `encoding`

The character encoding of the resource's data file (only applicable for textual files). The value `SHOULD` be one of the "Preferred MIME Names" for [a character encoding registered with IANA](http://www.iana.org/assignments/character-sets/character-sets.xhtml). If no value for this property is specified then the encoding `SHOULD` be detected on the implementation level. It is `RECOMMENDED` to use UTF-8 (without BOM) as a default encoding for textual files.

### `bytes`
#### `bytes`

Size of the file in bytes.

### `hash`
#### `hash`

The MD5 hash for this resource. Other algorithms can be indicated by prefixing the hash's value with the algorithm name in lower-case. For example:

@@ -172,20 +187,77 @@ The MD5 hash for this resource. Other algorithms can be indicated by prefixing t
}
```
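
As an illustration of the prefix convention, the sketch below splits a `hash` value into algorithm and digest and checks it against some bytes. It assumes the prefix is separated from the digest by a colon (for example `sha1:<digest>`) and that the algorithm name is one Python's `hashlib` recognizes; `parse_hash` and `verify_hash` are hypothetical helper names.

```python
import hashlib


def parse_hash(value):
    """Split a `hash` value into (algorithm, digest); MD5 when unprefixed."""
    algorithm, separator, digest = value.partition(":")
    if not separator:
        # No prefix: the value is a plain MD5 hash.
        return "md5", value
    return algorithm, digest


def verify_hash(data, value):
    """Check raw bytes against a `hash` property value."""
    algorithm, expected = parse_hash(value)
    actual = hashlib.new(algorithm, data).hexdigest()
    return actual == expected.lower()
```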

### `sources`
#### `sources`

List of data sources as for [Data Package](../data-package/#sources).

### `licenses`
#### `licenses`

List of licenses as for [Data Package](../data-package/#licenses). If not specified the resource inherits from the data package.

### `schema`
### Tabular

The properties below are applicable to any Tabular Data Resource.

#### `data`

If the `data` property is used for providing data for a Tabular Data Resource then it `MUST` be an `array` where each item in the array `MUST` be either:

- an array where each entry in the array is the value for that cell in the table OR
- an object where each key corresponds to the header for that row and the value corresponds to the cell value for that row for that header.

Array of arrays example:

```json
[
["A", "B", "C"],
[1, 2, 3],
[4, 5, 6]
]
```

Array of objects example:

```json
[
{ "A": 1, "B": 2, "C": 3 },
{ "A": 4, "B": 5, "C": 6 }
]
```
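
Both inline forms carry the same table, which a consumer could normalize into one shape. The sketch below is an assumption-laden illustration: `rows_as_dicts` is a hypothetical helper, and for the array-of-arrays form it assumes the first inner array is the header row, as in the example above.

```python
def rows_as_dicts(data):
    """Normalize inline tabular `data` into a list of {header: value} rows."""
    if not isinstance(data, list):
        raise TypeError("tabular `data` MUST be an array")
    if not data:
        return []
    if all(isinstance(item, dict) for item in data):
        # Array of objects: keys already name the columns.
        return [dict(item) for item in data]
    if all(isinstance(item, list) for item in data):
        # Array of arrays: assume the first array holds the headers.
        header, *rows = data
        return [dict(zip(header, row)) for row in rows]
    raise TypeError("each item MUST be either an array or an object")
```

With this helper, the two examples above normalize to the same list of rows.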

A Data Resource `MAY` have a `schema` property to describe the schema of the resource data.
#### `dialect`

The value for the `schema` property on a `resource` MUST be an `object` representing the schema OR a `string` that identifies the location of the schema.
A Tabular Data Resource `MAY` have a `dialect` property to describe a tabular dialect of the resource data. If provided, the `dialect` property `MUST` be a [Table Dialect](../table-dialect) descriptor in a form of an object or [URL-or-Path](../glossary/#url-or-path).

If a `string` it must be a [URL or Path](../glossary/#url-or-path), that is a fully qualified http URL or a relative POSIX path. The file at the location specified by this [URL or Path](../glossary/#url-or-path) string `MUST` be a JSON document containing the schema.
An example of a resource with a dialect:

NOTE: the Data Package specification places no restrictions on the form of the schema Object. This flexibility enables specific communities to define schemas appropriate for the data they manage. As an example, the [Tabular Data Package](https://specs.frictionlessdata.io/tabular-data-package/) specification requires the schema to conform to [Table Schema](../table-schema/).
```json
{
"name": "table",
"type": "table",
"path": "table.csv",
"dialect": {
"delimiter": ";"
}
}
```

#### `schema`

A Tabular Data Resource `SHOULD` have a `schema` property to describe a tabular schema of the resource data. If provided, the `schema` property `MUST` be a [Table Schema](../table-schema) descriptor in a form of an object or [URL-or-Path](../glossary/#url-or-path).

An example of a resource with a schema:

```json
{
"name": "table",
"type": "table",
"path": "table.csv",
"schema": {
"fields": [
{ "name": "id", "type": "integer" },
{ "name": "name", "type": "string" }
]
}
}
```
12 changes: 1 addition & 11 deletions content/docs/standard/extensions.mdx
@@ -10,17 +10,7 @@ import { LinkCard, CardGrid } from "@astrojs/starlight/components"
We want to help as many domain-specific Data Package extensions as possible. If you have one in mind or have already started working on it, feel free to share it by opening a new issue or pull request.
:::

One of the key strengths of the Data Package Standard lies in its extensibility. While the standard provides a solid foundation for organizing and describing data, it also recognizes that diverse datasets have unique requirements. Data practitioners can extend the standard by incorporating custom metadata, validation rules, or specific constraints to suit their data's peculiarities. Below is the list of registered Data Package extensions:

## General Use

<CardGrid>
<LinkCard
title="Tabular Data Resource"
description="A resource type within tabular data packages, typically containing structured data tables."
href="../../extensions/tabular-data-resource"
/>
</CardGrid>
One of the key strengths of the Data Package Standard lies in its extensibility. While the standard provides a solid foundation for organizing and describing data, it also recognizes that diverse datasets have unique requirements. Data practitioners can extend the standard by incorporating custom metadata, validation rules, or specific constraints to suit their data's peculiarities. Below is the list of well-known Data Package extensions:

## Domain Specific

2 changes: 1 addition & 1 deletion profiles/dictionary/package.yaml
@@ -77,7 +77,7 @@ tabularDataResources:
type: array
minItems: 1
items:
"$ref": "#/definitions/tabularDataResource"
"$ref": "#/definitions/dataResource"
examples:
- |
{
85 changes: 10 additions & 75 deletions profiles/dictionary/resource.yaml
@@ -23,9 +23,9 @@ dataResource:
data:
"$ref": "#/definitions/data"
propertyOrder: 230
schema:
"$ref": "#/definitions/anySchema"
propertyOrder: 40
type:
"$ref": "#/definitions/resourceType"
propertyOrder: 235
title:
"$ref": "#/definitions/title"
propertyOrder: 50
@@ -66,81 +66,12 @@ dataResource:
propertyOrder: 120
options:
hidden: true
tabularDataResource:
title: Tabular Data Resource
description: A Tabular Data Resource.
type: object
oneOf:
- required:
- name
- data
- schema
- profile
- required:
- name
- path
- schema
- profile
properties:
profile:
"$ref": "#/definitions/profile"
enum: ["tabular-data-resource"]
propertyOrder: 10
name:
"$ref": "#/definitions/name"
propertyOrder: 20
path:
"$ref": "#/definitions/resourcePath"
propertyOrder: 30
data:
"$ref": "#/definitions/tabularData"
propertyOrder: 230
dialect:
"$ref": "#/definitions/tableDialect"
propertyOrder: 130
schema:
"$ref": "#/definitions/tableSchema"
propertyOrder: 40
title:
"$ref": "#/definitions/title"
propertyOrder: 50
description:
"$ref": "#/definitions/description"
propertyOrder: 60
format: textarea
homepage:
"$ref": "#/definitions/homepage"
propertyOrder: 70
sources:
"$ref": "#/definitions/sources"
propertyOrder: 140
options:
hidden: true
licenses:
"$ref": "#/definitions/licenses"
description: The license(s) under which the resource is published.
propertyOrder: 150
options:
hidden: true
dialect:
"$ref": "#/definitions/tableDialect"
propertyOrder: 50
format:
"$ref": "#/definitions/format"
propertyOrder: 80
mediatype:
"$ref": "#/definitions/mediatype"
propertyOrder: 90
encoding:
"$ref": "#/definitions/encoding"
propertyOrder: 100
bytes:
"$ref": "#/definitions/bytes"
propertyOrder: 110
options:
hidden: true
hash:
"$ref": "#/definitions/hash"
propertyOrder: 120
options:
hidden: true
pathArray:
type: array
minItems: 1
@@ -180,6 +111,10 @@ resourcePath:
{
"path": "http://example.com/file.csv"
}
resourceType:
type: string
enum:
- table
format:
title: Format
description: The file format of this resource.