Modify dset/attr builders based on sidecar JSON #677
base: dev
Original file line number | Diff line number | Diff line change | ||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
@@ -0,0 +1,154 @@ | ||||||||||||||||||||||
.. _modifying_with_sidecar: | ||||||||||||||||||||||
|
||||||||||||||||||||||
Modifying an HDMF File with a Sidecar JSON File | ||||||||||||||||||||||
=============================================== | ||||||||||||||||||||||
|
||||||||||||||||||||||
Users may want to update part of an HDMF file without rewriting the entire file. | ||||||||||||||||||||||
I think it would be useful to elaborate a little bit on this to clarify the intent and scope of the sidecar file, i.e., this is for small updates and corrections only.
||||||||||||||||||||||
To do so, HDMF supports the use of a "sidecar" JSON file that lives adjacent to the HDMF file on disk and | ||||||||||||||||||||||
specifies modifications to the HDMF file. Only a limited set of modifications is supported; for example, users can | ||||||||||||||||||||||
delete a dataset or attribute but cannot create a new dataset or attribute. | ||||||||||||||||||||||
I think Does
Good point. I'll make the change. For now, I have not allowed hiding of groups because the use case is unclear. But it is technically not very different from hiding of datasets.
I think a main use case for hiding groups would be instances of a data_type, e.g., to hide a TimeSeries that for some reason contains bad data. If it's trivial, then I think hiding groups is something we could allow, but if it adds a lot of complexity then I would hold off until a specific need arises.
||||||||||||||||||||||
When HDMF reads an HDMF file, if the corresponding sidecar JSON file exists, it is | ||||||||||||||||||||||
automatically read and the modifications that it specifies are automatically applied. | ||||||||||||||||||||||
|
||||||||||||||||||||||
.. note:: | ||||||||||||||||||||||
|
||||||||||||||||||||||
To ignore the corresponding sidecar JSON file when reading an HDMF file, pass ``load_sidecar=False`` to | ||||||||||||||||||||||
``HDMFIO.read()`` on the ``HDMFIO`` object used to read the file. | ||||||||||||||||||||||
|
||||||||||||||||||||||
Allowed modifications | ||||||||||||||||||||||
--------------------- | ||||||||||||||||||||||
|
||||||||||||||||||||||
Only the following modifications to an HDMF file are supported in the sidecar JSON file: | ||||||||||||||||||||||
|
||||||||||||||||||||||
- Replace the values of a dataset or attribute with a scalar or 1-D array | ||||||||||||||||||||||
- Delete a dataset or attribute | ||||||||||||||||||||||
|
||||||||||||||||||||||
.. note:: | ||||||||||||||||||||||
|
||||||||||||||||||||||
Replacing the values of a dataset or attribute with a very large 1-D array using the sidecar JSON file may not | ||||||||||||||||||||||
be efficient and is discouraged. Users should instead consider rewriting the HDMF file with the | ||||||||||||||||||||||
updated values. | ||||||||||||||||||||||
|
||||||||||||||||||||||
Specification for the sidecar JSON file | ||||||||||||||||||||||
--------------------------------------- | ||||||||||||||||||||||
|
||||||||||||||||||||||
The sidecar JSON file can be validated using the ``sidecar.schema.json`` JSON schema file | ||||||||||||||||||||||
located at the root of the HDMF repository. | ||||||||||||||||||||||
Are sidecar files automatically validated by the validator as well?
||||||||||||||||||||||
|
||||||||||||||||||||||
The sidecar JSON file must contain the following top-level keys: | ||||||||||||||||||||||
|
||||||||||||||||||||||
- ``"description"``: A free-form string describing the modifications specified in this file. | ||||||||||||||||||||||
- ``"author"``: A list of free-form strings containing the names of the people who created this file. | ||||||||||||||||||||||
- ``"contact"``: A list of email addresses for the people who created this file. Each author listed in the "author" key | ||||||||||||||||||||||
*should* have a corresponding email address. | ||||||||||||||||||||||
- ``"operations"``: A list of operations to perform on the data in the file, as specified below. | ||||||||||||||||||||||
- ``"schema_version"``: The version of the sidecar JSON schema that the file conforms to, e.g., "0.1.0". | ||||||||||||||||||||||
View the current version of this file here: | ||||||||||||||||||||||
`sidecar.schema.json <https://github.com/hdmf-dev/hdmf/blob/dev/sidecar.schema.json>`_ | ||||||||||||||||||||||
|
||||||||||||||||||||||
Here is an example sidecar JSON file: | ||||||||||||||||||||||
|
||||||||||||||||||||||
.. code:: javascript | ||||||||||||||||||||||
|
||||||||||||||||||||||
{ | ||||||||||||||||||||||
"description": "Summary of changes", | ||||||||||||||||||||||
"author": [ | ||||||||||||||||||||||
"The NWB Team" | ||||||||||||||||||||||
], | ||||||||||||||||||||||
"contact": [ | ||||||||||||||||||||||
"[email protected]" | ||||||||||||||||||||||
], | ||||||||||||||||||||||
"operations": [ | ||||||||||||||||||||||
{ | ||||||||||||||||||||||
"type": "replace", | ||||||||||||||||||||||
"description": "change foo1/my_data data from [1, 2, 3] to [4, 5] (int8)", | ||||||||||||||||||||||
"object_id": "e0449bb5-2b53-48c1-b04e-85a9a4631655", | ||||||||||||||||||||||
"relative_path": "my_data", | ||||||||||||||||||||||
"value": [ | ||||||||||||||||||||||
4, | ||||||||||||||||||||||
5 | ||||||||||||||||||||||
], | ||||||||||||||||||||||
"dtype": "int8" | ||||||||||||||||||||||
}, | ||||||||||||||||||||||
{ | ||||||||||||||||||||||
"type": "delete", | ||||||||||||||||||||||
"description": "delete foo1/foo_holder/my_sub_data/attr6", | ||||||||||||||||||||||
"object_id": "993fef27-680c-457a-af4d-b1d2725fcca9", | ||||||||||||||||||||||
"relative_path": "foo_holder/my_sub_data/attr6" | ||||||||||||||||||||||
} | ||||||||||||||||||||||
], | ||||||||||||||||||||||
"schema_version": "0.1.0" | ||||||||||||||||||||||
} | ||||||||||||||||||||||
|
||||||||||||||||||||||
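A sidecar file like the one above could also be assembled programmatically. The following is a minimal sketch using only the Python standard library; `build_sidecar` and `missing_keys` are hypothetical helpers written for illustration and are not part of HDMF, and the contact address is a placeholder.

```python
import json

# Top-level keys that the documentation above lists as required.
REQUIRED_KEYS = ("description", "author", "contact", "operations", "schema_version")


def build_sidecar(description, authors, contacts, operations, schema_version="0.1.0"):
    """Assemble a sidecar document as a plain dict (hypothetical helper, not an HDMF API)."""
    return {
        "description": description,
        "author": list(authors),
        "contact": list(contacts),
        "operations": list(operations),
        "schema_version": schema_version,
    }


def missing_keys(sidecar):
    """Return the required top-level keys that are absent from the document."""
    return [key for key in REQUIRED_KEYS if key not in sidecar]


sidecar = build_sidecar(
    description="Summary of changes",
    authors=["The NWB Team"],
    contacts=["team@example.org"],  # placeholder address, not taken from the PR
    operations=[
        {
            "type": "delete",
            "description": "delete foo1/foo_holder/my_sub_data/attr6",
            "object_id": "993fef27-680c-457a-af4d-b1d2725fcca9",
            "relative_path": "foo_holder/my_sub_data/attr6",
        }
    ],
)

# Serialize with the same 4-space indentation style as the example above.
text = json.dumps(sidecar, indent=4)
```

Writing `text` to disk next to the HDMF file is left out here because the PR text does not state how the sidecar file must be named.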
Specification for operations | ||||||||||||||||||||||
---------------------------- | ||||||||||||||||||||||
|
||||||||||||||||||||||
All operations are required to have the following keys: | ||||||||||||||||||||||
|
||||||||||||||||||||||
- ``"type"``: The type of modification to perform. Only "replace" and "delete" are supported currently. | ||||||||||||||||||||||
- ``"description"``: A description of the specified modification. | ||||||||||||||||||||||
- ``"object_id"``: The object ID (UUID) of the data type that is closest in the file hierarchy to the | ||||||||||||||||||||||
field being modified. | ||||||||||||||||||||||
- ``"relative_path"``: The relative path from the data type with the given object ID to the field being modified. | ||||||||||||||||||||||
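As a rough illustration of the required keys listed above, the sketch below checks a single operation by hand. It uses only the standard library; `check_operation` is a hypothetical helper, not an HDMF function, and the UUID-4 pattern is copied from the `object_id` entry of `sidecar.schema.json` in this PR.

```python
import re

REQUIRED_OPERATION_KEYS = ("type", "description", "object_id", "relative_path")
SUPPORTED_TYPES = ("replace", "delete")

# Hyphenated UUID-4 pattern, as used for "object_id" in sidecar.schema.json.
UUID4_PATTERN = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$"
)


def check_operation(operation):
    """Return a list of problems found in one operation dict (empty if none)."""
    problems = [
        f"missing key: {key}" for key in REQUIRED_OPERATION_KEYS if key not in operation
    ]
    if "type" in operation and operation["type"] not in SUPPORTED_TYPES:
        problems.append(f"unsupported type: {operation['type']}")
    if "object_id" in operation and not UUID4_PATTERN.match(str(operation["object_id"])):
        problems.append("object_id is not a hyphenated UUID-4")
    return problems


good = {
    "type": "delete",
    "description": "delete foo1/foo_holder/my_sub_data/attr6",
    "object_id": "993fef27-680c-457a-af4d-b1d2725fcca9",
    "relative_path": "foo_holder/my_sub_data/attr6",
}
bad = {"type": "create", "object_id": "not-a-uuid"}

good_problems = check_operation(good)
bad_problems = check_operation(bad)
```

A check like this only covers the keys common to all operations; the JSON schema remains the authoritative description of what is allowed.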
|
||||||||||||||||||||||
Operations can result in invalid files, i.e., files that do not conform to the specification. It is strongly | ||||||||||||||||||||||
recommended to validate the file against the schema after loading the sidecar JSON. In some cases, the | ||||||||||||||||||||||
resulting file is invalid and cannot be read at all. | ||||||||||||||||||||||
|
||||||||||||||||||||||
|
||||||||||||||||||||||
Replacing values of a dataset/attribute with a scalar or 1-D array | ||||||||||||||||||||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||||||||||||||||||||||
|
||||||||||||||||||||||
Specify ``"type": "replace"`` to replace the values of a dataset/attribute in the associated HDMF file | ||||||||||||||||||||||
as specified by the ``object_id`` and ``relative_path``. | ||||||||||||||||||||||
|
||||||||||||||||||||||
The operation specification must have the following keys: | ||||||||||||||||||||||
|
||||||||||||||||||||||
- ``"value"``: The new value for the dataset/attribute. Only scalar and 1-dimensional arrays can be | ||||||||||||||||||||||
specified as a replacement value. | ||||||||||||||||||||||
|
||||||||||||||||||||||
The operation specification may also have the following keys: | ||||||||||||||||||||||
|
||||||||||||||||||||||
- ``"dtype"``: String representing the dtype of the new value. If this key is not present, then the dtype of the | ||||||||||||||||||||||
existing value for the dataset/attribute is used. Allowed dtypes are listed in the | ||||||||||||||||||||||
`HDMF schema language docs for dtype <https://hdmf-schema-language.readthedocs.io/en/latest/description.html#dtype>`_. | ||||||||||||||||||||||
|
||||||||||||||||||||||
In the example sidecar JSON file above, the first operation specifies that the value of dataset "my_data" in | ||||||||||||||||||||||
group "foo1", which has the specified object ID, should be replaced with the 1-D array [4, 5] (dtype: int8). | ||||||||||||||||||||||
|
||||||||||||||||||||||
.. note:: | ||||||||||||||||||||||
|
||||||||||||||||||||||
Replacing the values of datasets or attributes with object references or a compound data type is not yet supported. | ||||||||||||||||||||||
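To make the semantics of "replace" concrete, here is a toy sketch that applies the first operation from the example file to a nested dict. The dict layout, the `CASTS` table, and `apply_replace` are all assumptions made for illustration; HDMF itself operates on its own builder objects, and this is not its implementation.

```python
# Toy stand-in for a file: object ID -> nested dict of fields (not HDMF's data model).
objects = {
    "e0449bb5-2b53-48c1-b04e-85a9a4631655": {"my_data": [1, 2, 3]},
}

# Illustrative subset of flat dtypes mapped to Python casts.
# A missing or unlisted dtype leaves the value as given in this toy version.
CASTS = {"int8": int, "int32": int, "int64": int, "float32": float, "float64": float, "text": str}


def apply_replace(objects, operation):
    """Apply one "replace" operation to the toy structure in place."""
    value = operation["value"]
    if isinstance(value, list) and any(isinstance(item, list) for item in value):
        raise ValueError("only scalars and 1-D arrays are allowed as replacement values")

    # Walk from the object with the given ID down to the parent of the target field.
    *parents, leaf = operation["relative_path"].split("/")
    node = objects[operation["object_id"]]
    for name in parents:
        node = node[name]
    if leaf not in node:
        raise KeyError(f"{leaf!r} does not exist; replace cannot create new fields")

    cast = CASTS.get(operation.get("dtype"), lambda item: item)
    node[leaf] = [cast(item) for item in value] if isinstance(value, list) else cast(value)


apply_replace(
    objects,
    {
        "type": "replace",
        "description": "change foo1/my_data data from [1, 2, 3] to [4, 5] (int8)",
        "object_id": "e0449bb5-2b53-48c1-b04e-85a9a4631655",
        "relative_path": "my_data",
        "value": [4, 5],
        "dtype": "int8",
    },
)
```

Note that plain Python integers do not carry a width, so the `int8` cast here only fixes the kind of value, not its storage size.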
|
||||||||||||||||||||||
Deleting a dataset/attribute | ||||||||||||||||||||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||||||||||||||||||||||
|
||||||||||||||||||||||
Specify ``"type": "delete"`` to delete (ignore) a dataset/attribute from the associated HDMF file | ||||||||||||||||||||||
as specified by the ``object_id`` and ``relative_path``. | ||||||||||||||||||||||
|
||||||||||||||||||||||
The operation specification does not use any additional keys. | ||||||||||||||||||||||
|
||||||||||||||||||||||
In the example sidecar JSON file above, the second operation specifies that attribute "attr6" | ||||||||||||||||||||||
at relative path "foo_holder/my_sub_data/attr6" from group "foo1", which has the specified object ID, | ||||||||||||||||||||||
should be deleted. | ||||||||||||||||||||||
If "attr6" is a required attribute, this is likely to result in an invalid file that cannot be read by HDMF. | ||||||||||||||||||||||
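The "delete (ignore)" wording suggests that the field is hidden from what is read rather than removed from the file on disk. The toy sketch below illustrates that reading under the assumption that a file can be modeled as a nested dict; `apply_delete` is a hypothetical helper and does not reflect how HDMF implements the operation.

```python
import copy

# Toy stand-in for the stored file: object ID -> nested dict of fields.
on_disk = {
    "993fef27-680c-457a-af4d-b1d2725fcca9": {
        "foo_holder": {"my_sub_data": {"attr6": 6, "data": [1, 2]}},
    },
}


def apply_delete(objects, operation):
    """Return a copy of the toy structure without the targeted field."""
    view = copy.deepcopy(objects)  # the stored data itself is left untouched
    *parents, leaf = operation["relative_path"].split("/")
    node = view[operation["object_id"]]
    for name in parents:
        node = node[name]
    del node[leaf]  # raises KeyError if the field does not exist
    return view


view = apply_delete(
    on_disk,
    {
        "type": "delete",
        "description": "delete foo1/foo_holder/my_sub_data/attr6",
        "object_id": "993fef27-680c-457a-af4d-b1d2725fcca9",
        "relative_path": "foo_holder/my_sub_data/attr6",
    },
)
```

After the call, `view` no longer exposes `attr6`, while `on_disk` still contains it.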
|
||||||||||||||||||||||
Future changes | ||||||||||||||||||||||
-------------- | ||||||||||||||||||||||
|
||||||||||||||||||||||
The HDMF team is considering supporting additional operations and expanding support for current operations | ||||||||||||||||||||||
specified in the sidecar JSON file, such as: | ||||||||||||||||||||||
|
||||||||||||||||||||||
- Add rows to a ``DynamicTable`` (column-based) | ||||||||||||||||||||||
- Add rows to a ``Table`` (row-based) | ||||||||||||||||||||||
- Add a new group | ||||||||||||||||||||||
- Add a new dataset | ||||||||||||||||||||||
- Add a new attribute | ||||||||||||||||||||||
- Add a new link | ||||||||||||||||||||||
- Replace a dataset or attribute with object references | ||||||||||||||||||||||
- Replace a dataset or attribute with a compound data type | ||||||||||||||||||||||
- Replace selected slices of a dataset or attribute | ||||||||||||||||||||||
- Delete a group | ||||||||||||||||||||||
- Delete a link | ||||||||||||||||||||||
|
||||||||||||||||||||||
Please let us know which of these operations would be useful to you in this | ||||||||||||||||||||||
`issue ticket <https://github.com/hdmf-dev/hdmf/issues/676>`_. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,194 @@ | ||
{ | ||
"$schema": "http://json-schema.org/draft-07/schema#", | ||
"$id": "sidecar.schema.json", | ||
"title": "Schema for the sidecar JSON file", | ||
"description": "A schema for validating HDMF sidecar JSON files", | ||
"version": "0.1.0", | ||
"type": "object", | ||
"additionalProperties": false, | ||
"required": [ | ||
"description", | ||
"author", | ||
"contact", | ||
"operations", | ||
"schema_version" | ||
], | ||
"properties": { | ||
"description": { | ||
"description": "A free-form string describing the modifications specified in this file.", | ||
"type": "string" | ||
}, | ||
"author": { | ||
"description": "A list of free-form strings containing the names of the people who created this file.", | ||
"type": "array", | ||
"items": {"type": "string"} | ||
}, | ||
"contact": { | ||
"description": "A list of email addresses for the people who created this file. Each author listed in the 'author' key *should* have a corresponding email address.", | ||
"type": "array", | ||
"items": { | ||
"type": "string", | ||
"pattern": "^.*@.*$" | ||
} | ||
}, | ||
"operations": { | ||
"description": "A list of operations to perform on the data in the file.", | ||
"type": "array", | ||
"items": { | ||
"type": "object", | ||
"additionalProperties": false, | ||
"required": [ | ||
"type", | ||
"description", | ||
"object_id", | ||
"relative_path" | ||
], | ||
"properties": { | ||
"type": { | ||
"description": "The type of modification to perform.", | ||
"type": "string", | ||
"enum": ["replace", "delete"] | ||
}, | ||
"description": { | ||
"description": "A description of the specified modification.", | ||
"type": "string" | ||
}, | ||
"object_id": { | ||
"description": "The object ID (UUID) of the data type that is closest in the file hierarchy to the field being modified. Must be in the UUID-4 format with hyphens.", | ||
"type": "string", | ||
"pattern": "^[0-9a-f]{8}\\-[0-9a-f]{4}\\-4[0-9a-f]{3}\\-[89ab][0-9a-f]{3}\\-[0-9a-f]{12}$" | ||
}, | ||
"relative_path": { | ||
"description": "The relative path from the data type with the given object ID to the field being modified.", | ||
"type": "string" | ||
}, | ||
"element_type": { | ||
"anyOf": [ | ||
{ | ||
"type": "string", | ||
"enum": [ | ||
"group", | ||
"dataset", | ||
"attribute" | ||
] | ||
} | ||
] | ||
}, | ||
"value": { | ||
"description": "The new value for the dataset/attribute.", | ||
"type": ["array", "string", "number", "boolean", "null"] | ||
}, | ||
"dtype": {"$ref": "#/definitions/dtype"} | ||
}, | ||
"allOf": [ | ||
{ | ||
"description": "if type==replace, then value is required.", | ||
"if": { | ||
"properties": { "type": { "const": "replace" } } | ||
}, | ||
"then": { | ||
"required": [ "value" ] | ||
} | ||
}, | ||
{ | ||
"description": "if type==delete, then value and dtype are not allowed.", | ||
"if": { | ||
"properties": { "type": { "const": "delete" } } | ||
}, | ||
"then": { | ||
"properties": { | ||
"value": false, | ||
"dtype": false | ||
} | ||
} | ||
}, | ||
{ | ||
"description": "if type==create, then element_type is required.", | ||
"if": { | ||
"properties": { "type": { "const": "create" } } | ||
}, | ||
"then": { | ||
"required": [ "element_type" ] | ||
} | ||
} | ||
] | ||
} | ||
}, | ||
"schema_version": { | ||
"description": "The version of the sidecar JSON schema that the file conforms to. Must conform to Semantic Versioning v2.0.", | ||
"type": "string", | ||
"pattern": "^(0|[1-9]\\d*)\\.(0|[1-9]\\d*)\\.(0|[1-9]\\d*)(?:-((?:0|[1-9]\\d*|\\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\\.(?:0|[1-9]\\d*|\\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?(?:\\+([0-9a-zA-Z-]+(?:\\.[0-9a-zA-Z-]+)*))?$" | ||
} | ||
}, | ||
"definitions": { | ||
"dtype": { | ||
"anyOf": [ | ||
{"$ref": "#/definitions/flat_dtype"}, | ||
{"$ref": "#/definitions/compound_dtype"} | ||
] | ||
}, | ||
"flat_dtype": { | ||
"description": "String describing the data type of the dataset or attribute.", | ||
"anyOf": [ | ||
{ | ||
"type": "string", | ||
"enum": [ | ||
"float", | ||
"float32", | ||
"double", | ||
"float64", | ||
"long", | ||
"int64", | ||
"int", | ||
"int32", | ||
"int16", | ||
"int8", | ||
"uint", | ||
"uint32", | ||
"uint16", | ||
"uint8", | ||
"uint64", | ||
"text", | ||
"utf", | ||
"utf8", | ||
"utf-8", | ||
"ascii", | ||
"bool", | ||
"isodatetime" | ||
] | ||
}, | ||
{"$ref": "#/definitions/ref_dtype"} | ||
] | ||
}, | ||
"ref_dtype": { | ||
"type": "object", | ||
"required": ["target_type", "reftype"], | ||
"properties": { | ||
"target_type": { | ||
"description": "Describes the data_type of the target that the reference points to", | ||
"type": "string" | ||
}, | ||
"reftype": { | ||
"description": "Describes the kind of reference", | ||
"type": "string", | ||
"enum": ["ref", "reference", "object", "region"] | ||
} | ||
} | ||
}, | ||
"compound_dtype": { | ||
"type": "array", | ||
"items": { | ||
"type": "object", | ||
"required": ["name", "doc", "dtype"], | ||
"properties": { | ||
"name": {"$ref": "#/definitions/protectedString"}, | ||
"doc": {"type": "string"}, | ||
"dtype": {"$ref": "#/definitions/flat_dtype"} | ||
} | ||
} | ||
} | ||
} | ||
} |
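The string patterns in the schema above can be exercised on their own. The sketch below copies the `schema_version` and `contact` patterns into Python's `re` module and tries a few inputs; the helper names are hypothetical, and this checks only these two patterns, not the full schema.

```python
import re

# Semantic Versioning pattern copied from the "schema_version" property above.
SEMVER_PATTERN = re.compile(
    r"^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
    r"(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)"
    r"(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?"
    r"(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$"
)

# Pattern copied from the items of the "contact" property: anything containing "@".
CONTACT_PATTERN = re.compile(r"^.*@.*$")


def is_valid_schema_version(version):
    """True if the string matches the schema's Semantic Versioning pattern."""
    return SEMVER_PATTERN.match(version) is not None


def is_valid_contact(contact):
    """True if the string matches the schema's (very permissive) contact pattern."""
    return CONTACT_PATTERN.match(contact) is not None


version_results = {
    version: is_valid_schema_version(version)
    for version in ("0.1.0", "1.2.3-alpha.1+build.5", "1.0", "01.2.3")
}
contact_results = {contact: is_valid_contact(contact) for contact in ("a@b", "nobody")}
```

The contact pattern only requires an `@` somewhere in the string, so it accepts many strings that are not usable email addresses.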
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1,2 @@ | ||
from . import hdf5 | ||
from .builderupdater import SidecarValidationError |