From bd163e8047ef69635b2372fa1fea29760cdfdf74 Mon Sep 17 00:00:00 2001 From: roll Date: Thu, 28 Mar 2024 08:42:20 +0000 Subject: [PATCH] Add `schema.fieldsMatch` property; clarified extra/non-specified fields in Table Schema (#39) * Bootstrapped field order section * Updted the spec * Updated the profile * Fixed styling * Rebased on two properties * Removed partial * Added articles * Update content/docs/specifications/table-schema.md Co-authored-by: Peter Desmet * Fixed `exactFields` * Revert "Fixed `exactFields`" This reverts commit f04b7ede3a8acd253cdb0a84204116118cd701ac. * Revert "Revert "Fixed `exactFields`"" This reverts commit f6706140be1f2dc8ba39f8e78db10850e9bff3b4. * Reverce defaults * Updated the profile * Fixed typo * Rebased to `schema.fieldsMatch` * Updated wording * Update content/docs/specifications/table-schema.md Co-authored-by: Peter Desmet * Update content/docs/specifications/table-schema.md Co-authored-by: Peter Desmet * Update content/docs/specifications/table-schema.md Co-authored-by: Peter Desmet --------- Co-authored-by: Peter Desmet Co-authored-by: Peter Desmet --- content/docs/specifications/table-schema.md | 30 +++++++++++++++++---- profiles/dictionary/schema.yaml | 13 +++++++++ 2 files changed, 38 insertions(+), 5 deletions(-) diff --git a/content/docs/specifications/table-schema.md b/content/docs/specifications/table-schema.md index 7d61243d..272be5cb 100644 --- a/content/docs/specifications/table-schema.md +++ b/content/docs/specifications/table-schema.md @@ -69,9 +69,7 @@ For example, `constraints` `SHOULD` be tested on the logical representation of d A Table Schema is represented by a descriptor. The descriptor `MUST` be a JSON `object` (JSON is defined in [RFC 4627](http://www.ietf.org/rfc/rfc4627.txt)). -It `MUST` contain a property `fields`. `fields` `MUST` be an array where each entry in the array is a field descriptor (as defined below). The order of elements in `fields` array `SHOULD` be the order of fields in the CSV file. 
The number of elements in `fields` array `SHOULD` be the same as the number of fields in the CSV file. - -The descriptor `MAY` have the additional properties set out below and `MAY` contain any number of other properties (not defined in this specification). +The descriptor `MAY` have the additional properties set out below and `MAY` contain any number of other properties not defined in this specification. The following is an illustration of this structure: @@ -101,7 +99,25 @@ The following is an illustration of this structure: } ``` -## Field Descriptors +## Properties + +### `fields` + +A Table Schema descriptor `MUST` contain a property `fields`. `fields` `MUST` be an array where each entry in the array is a field descriptor as defined below. + +The way Table Schema `fields` are mapped onto the data source fields are defined by the `fieldsMatch` property. By default, the most strict approach is applied, i.e. fields in the data source `MUST` completely match the elements in the `fields` array, both in number and order. Using different options below, a data producer can relax requirements for the data source. + +### `fieldsMatch` + +A Table Schema descriptor `MAY` contain a property `fieldsMatch` that `MUST` be a string with the following possible values and the `exact` value by default: + +- **exact** (default): The data source `MUST` have exactly the same fields as defined in the `fields` array. Fields `MUST` be mapped by their order. +- **equal**: The data source `MUST` have exactly the same fields as defined in the `fields` array. Fields `MUST` be mapped by their names. +- **subset**: The data source `MUST` have all the fields defined in the `fields` array, but `MAY` have more. Fields `MUST` be mapped by their names. +- **superset**: The data source `MUST` only have fields defined in the `fields` array, but `MAY` have fewer. Fields `MUST` be mapped by their names. +- **partial**: The data source `MUST` have at least one field defined in the `fields` array. 
Fields `MUST` be mapped by their names. + +## Field Properties A field descriptor `MUST` be a JSON `object` that describes a single field. The descriptor provides additional human-readable documentation for a field, as @@ -128,7 +144,11 @@ The field descriptor `object` `MAY` contain any number of other properties. Some ### `name` -The field descriptor `MUST` contain a `name` property. This property `SHOULD` correspond to the name of field/column in the data file (if it has a name). As such it `SHOULD` be unique (though it is possible, but very bad practice, for the data file to have multiple columns with the same name). `name` `SHOULD NOT` be considered case sensitive in determining uniqueness. However, since it corresponds to the name of the field in the data file it may be important to preserve case. +The field descriptor `MUST` contain a `name` property and it `MUST` be unique amongst other field names in this Table Schema. This property `SHOULD` correspond to the name of a column in the data file if it has a name. + +:::note[Backward Compatibility] +If the `name` properties are not unique amongst a Table Schema a data consumer `MUST NOT` interpret it as an invalid descriptor as duplicate `name` properties were allowed in the `v1.0` of the specification. 
+::: ### `title` diff --git a/profiles/dictionary/schema.yaml b/profiles/dictionary/schema.yaml index 828bab75..ab49784a 100644 --- a/profiles/dictionary/schema.yaml +++ b/profiles/dictionary/schema.yaml @@ -40,6 +40,8 @@ tableSchema: } ] } + fieldsMatch: + "$ref": "#/definitions/tableSchemaFieldsMatch" primaryKey: "$ref": "#/definitions/tableSchemaPrimaryKey" uniqueKeys: @@ -116,6 +118,17 @@ tableSchemaField: - "$ref": "#/definitions/tableSchemaFieldArray" - "$ref": "#/definitions/tableSchemaFieldDuration" - "$ref": "#/definitions/tableSchemaFieldAny" +tableSchemaFieldsMatch: + type: array + item: + type: string + enum: + - exact + - equal + - subset + - superset + - partial + default: exact tableSchemaPrimaryKey: oneOf: - type: array
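The five `fieldsMatch` values introduced by this patch can be illustrated with a minimal sketch. It is not part of the patch or of any reference implementation: the function name `fields_match` and its parameters are hypothetical, field names are assumed to be plain strings, and `exact` is interpreted here as a positional comparison of names, which is one possible reading of "mapped by their order".

```python
def fields_match(schema_fields, source_fields, mode="exact"):
    """Check data source field names against schema field names.

    Hypothetical helper: `schema_fields` are the `name` values from the
    Table Schema `fields` array, `source_fields` are the column names
    found in the data source.
    """
    schema_set = set(schema_fields)
    source_set = set(source_fields)

    if mode == "exact":
        # Assumption: same names, same number, same order.
        return list(source_fields) == list(schema_fields)
    if mode == "equal":
        # Same fields, matched by name; order is ignored.
        return source_set == schema_set and len(source_fields) == len(schema_fields)
    if mode == "subset":
        # Every schema field is present; the source may have more.
        return schema_set <= source_set
    if mode == "superset":
        # The source has only schema fields; it may have fewer.
        return source_set <= schema_set
    if mode == "partial":
        # At least one schema field is present in the source.
        return bool(schema_set & source_set)
    raise ValueError(f"unknown fieldsMatch value: {mode!r}")
```

Under these assumptions, a source with columns `name, id` fails the default `exact` check against a schema declaring `id, name`, but passes with `equal`, since only the order differs.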