Documentation
franzpoeschel committed Nov 24, 2023
1 parent 0234c46 commit 3444b23
Showing 3 changed files with 60 additions and 9 deletions.
36 changes: 31 additions & 5 deletions docs/source/backends/json.rst
@@ -38,20 +38,46 @@ when working with the JSON backend.
Datasets and groups have the same namespace, meaning that there may not be a subgroup
and a dataset with the same name contained in one group.

Any **openPMD dataset** is a JSON object with three keys:
Datasets
........

* ``attributes``: Attributes associated with the dataset. May be ``null`` or not present if no attributes are associated with the dataset.
* ``datatype``: A string describing the type of the stored data.
* ``data``: A nested array storing the actual data in row-major manner.
Datasets can be stored in two modes, either as actual datasets or as dataset templates.
The mode is selected by the :ref:`JSON/TOML parameter<backendconfig>` ``json.dataset.mode`` (resp. ``toml.dataset.mode``) with possible values ``["dataset", "template"]`` (default: ``"dataset"``).

Stored as an actual dataset, an **openPMD dataset** is a JSON object with three JSON keys:

* ``datatype`` (required): A string describing the type of the stored data.
* ``data`` (required): A nested array storing the actual data in row-major manner.
  The data needs to be consistent with the fields ``datatype`` and ``extent``.
  Checking whether this key points to an array can be (and is internally) used to distinguish groups from datasets.
* ``attributes``: Attributes associated with the dataset. May be ``null`` or not present if no attributes are associated with the dataset.

Stored as a **dataset template**, an openPMD dataset is represented by three JSON keys:

* ``datatype`` (required): As above.
* ``extent`` (required): A list of integers, describing the extent of the dataset.
* ``attributes``: As above.

**Attributes** are stored as a JSON object with a key for each attribute.
This mode stores only the dataset metadata.
Chunk load/store operations are ignored.
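
The two dataset representations described above can be sketched with Python's standard ``json`` module. This is only an illustration of the on-disk layout, not openPMD-api code: the datatype string ``"DOUBLE"``, the dataset name ``E`` and the helpers ``nested_extent`` and ``looks_like_dataset`` are assumptions chosen for the example.

```python
import json

# A 2x3 dataset stored in "dataset" mode: the data itself is a nested,
# row-major array. "DOUBLE" is an assumed datatype string for illustration.
as_dataset = {
    "datatype": "DOUBLE",
    "data": [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]],
}

# The same dataset stored in "template" mode: only metadata is kept.
as_template = {
    "datatype": "DOUBLE",
    "extent": [2, 3],
}


def nested_extent(data):
    """Derive the extent from a rectangular nested list (hypothetical helper)."""
    extent = []
    while isinstance(data, list):
        extent.append(len(data))
        data = data[0] if data else None
    return extent


def looks_like_dataset(node):
    """Hypothetical check: a node whose "data" key holds an array is a dataset."""
    return isinstance(node, dict) and isinstance(node.get("data"), list)


# A group containing the dataset above under the assumed name "E".
group = {"attributes": {}, "E": as_dataset}

print(json.dumps(as_template))
print(nested_extent(as_dataset["data"]))
print(looks_like_dataset(as_dataset), looks_like_dataset(group))
```

A template carries no ``data`` key, so this particular check would not recognize it; how the backend identifies templates is not covered by this sketch.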

Attributes
..........

In order to avoid name clashes, attributes are generally stored within a separate subgroup ``attributes``.

Attributes can be stored in two formats.
The format is selected by the :ref:`JSON/TOML parameter<backendconfig>` ``json.attribute.mode`` (resp. ``toml.attribute.mode``) with possible values ``["long", "short"]`` (default: ``"long"`` in openPMD 1.*, ``"short"`` in openPMD >= 2.0).

Attributes in **long format** store the datatype explicitly, by representing attributes as JSON objects.
Every such attribute is itself a JSON object with two keys:

* ``datatype``: A string describing the type of the value.
* ``value``: The actual value of type ``datatype``.

Attributes in **short format** are stored as just the plain value of the attribute.
Since JSON/TOML values are pretty-printed into a human-readable format, byte-level type details can be lost when reading those values again later on (e.g. the distinction between different integer types).
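
The difference between the two attribute formats, and the type information that the short format gives up, can be illustrated with a small Python sketch. The attribute names, the datatype strings ``"UINT"`` and ``"FLOAT"``, and the helper ``to_short`` are assumptions made for this example, not names taken from the backend.

```python
import json

# Long format: each attribute is an object holding datatype and value.
# Attribute names and datatype strings are assumed for illustration.
long_attrs = {
    "iterationCount": {"datatype": "UINT", "value": 5},
    "timeStep": {"datatype": "FLOAT", "value": 0.5},
}


def to_short(attrs):
    """Hypothetical helper: drop the datatype and keep only the value."""
    return {name: entry["value"] for name, entry in attrs.items()}


short_attrs = to_short(long_attrs)

# After a round trip through JSON text, only generic JSON types remain:
# the reader sees "an integer", not "an unsigned 32-bit integer".
restored = json.loads(json.dumps(short_attrs))

print(json.dumps(long_attrs, indent=2))
print(json.dumps(short_attrs, indent=2))
print(type(restored["iterationCount"]).__name__)
```

A reader of the short format therefore has to pick a concrete type itself, which is the trade-off for the more compact, human-readable output.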

TOML File Format
----------------

23 changes: 19 additions & 4 deletions docs/source/details/backendconfig.rst
@@ -100,6 +100,8 @@ For JSON and ADIOS2, all datasets are resizable, independent of this option.
Configuration Structure per Backend
-----------------------------------

Please refer to the respective backends' documentations for further information on their configuration.

.. _backendconfig-adios2:

ADIOS2
@@ -189,8 +191,21 @@ Explanation of the single keys:

.. _backendconfig-other:

Other backends
^^^^^^^^^^^^^^
JSON/TOML
^^^^^^^^^

Other backends currently do not read the configuration string.
Please refer to the respective backends' documentations for further information on their configuration.
A full configuration of the JSON backend:

.. literalinclude:: json.json
   :language: json

The TOML backend is configured analogously, replacing the ``"json"`` key with ``"toml"``.

All keys found under ``json.dataset`` (resp. ``toml.dataset``) are applicable globally as well as per dataset.
Explanation of the single keys:

* ``json.dataset.mode`` / ``toml.dataset.mode``: One of ``"dataset"`` (default) or ``"template"``.
  In ``"dataset"`` mode, the dataset is written as an n-dimensional (recursive) array, padded with nulls (JSON) or zeroes (TOML) for missing values.
  In ``"template"`` mode, only the dataset metadata (type, extent and attributes) is stored, and no chunks can be written or read.
* ``json.attribute.mode`` / ``toml.attribute.mode``: One of ``"long"`` (default in openPMD 1.*) or ``"short"`` (default in openPMD 2.*).
  The long format explicitly encodes the attribute type on disk; the short format writes only the attribute value as a JSON/TOML value, requiring readers to recover the type.
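
A configuration string with the keys explained above could be assembled programmatically, for example as in the following Python sketch. The helper ``backend_config`` is hypothetical and only builds the JSON text; passing the result on to the library is left out here. The default ``attribute_mode="long"`` mirrors the openPMD 1.* default stated above.

```python
import json

DATASET_MODES = ("dataset", "template")
ATTRIBUTE_MODES = ("long", "short")


def backend_config(backend, dataset_mode="dataset", attribute_mode="long"):
    """Hypothetical helper: build a JSON/TOML backend configuration string.

    `backend` is the top-level key, either "json" or "toml".
    """
    if backend not in ("json", "toml"):
        raise ValueError(f"unsupported backend key: {backend!r}")
    if dataset_mode not in DATASET_MODES:
        raise ValueError(f"unsupported dataset mode: {dataset_mode!r}")
    if attribute_mode not in ATTRIBUTE_MODES:
        raise ValueError(f"unsupported attribute mode: {attribute_mode!r}")
    config = {
        backend: {
            "dataset": {"mode": dataset_mode},
            "attribute": {"mode": attribute_mode},
        }
    }
    return json.dumps(config, indent=2)


# Same settings for the TOML backend: only the top-level key changes.
print(backend_config("toml", dataset_mode="template", attribute_mode="short"))
```

Validating the mode strings up front keeps typos such as ``"templates"`` from silently producing a configuration that the backend would not understand.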
10 changes: 10 additions & 0 deletions docs/source/details/json.json
@@ -0,0 +1,10 @@
{
  "json": {
    "dataset": {
      "mode": "template"
    },
    "attribute": {
      "mode": "short"
    }
  }
}
