Documentation

franzpoeschel · Nov 24, 2023 · 3444b23
1 parent 0234c46

Showing 3 changed files with 60 additions and 9 deletions.
diff --git a/docs/source/backends/json.rst b/docs/source/backends/json.rst
@@ -38,20 +38,46 @@ when working with the JSON backend.
 Datasets and groups have the same namespace, meaning that there may not be a subgroup
 and a dataset with the same name contained in one group.
 
-Any **openPMD dataset** is a JSON object with three keys:
+Datasets
+........
 
- * ``attributes``: Attributes associated with the dataset. May be ``null`` or not present if no attributes are associated with the dataset.
- * ``datatype``: A string describing the type of the stored data.
- * ``data`` A nested array storing the actual data in row-major manner.
+Datasets can be stored in two modes, either as actual datasets or as dataset templates.
+The mode is selected by the :ref:`JSON/TOML parameter<backendconfig>` ``json.dataset.mode`` (resp. ``toml.dataset.mode``) with possible values ``["dataset", "template"]`` (default: ``"dataset"``).
+
+Stored as an actual dataset, an **openPMD dataset** is a JSON object with three JSON keys:
+
+ * ``datatype`` (required): A string describing the type of the stored data.
+ * ``data`` (required): A nested array storing the actual data in row-major manner.
    The data needs to be consistent with the fields ``datatype`` and ``extent``.
    Checking whether this key points to an array can be (and is internally) used to distinguish groups from datasets.
+ * ``attributes``: Attributes associated with the dataset. May be ``null`` or not present if no attributes are associated with the dataset.
+
+Stored as a **dataset template**, an openPMD dataset is represented by three JSON keys:
+
+ * ``datatype`` (required): As above.
+ * ``extent`` (required): A list of integers, describing the extent of the dataset.
+ * ``attributes``: As above.
 
-**Attributes** are stored as a JSON object with a key for each attribute.
+This mode stores only the dataset metadata.
+Chunk load/store operations are ignored.
+
+Attributes
+..........
+
+In order to avoid name clashes, attributes are generally stored within a separate subgroup ``attributes``.
+
+Attributes can be stored in two formats.
+The format is selected by the :ref:`JSON/TOML parameter<backendconfig>` ``json.attribute.mode`` (resp. ``toml.attribute.mode``) with possible values ``["long", "short"]`` (default: ``"long"`` in openPMD 1.*, ``"short"`` in openPMD >= 2.0).
+
+Attributes in **long format** store the datatype explicitly, by representing attributes as JSON objects.
 Every such attribute is itself a JSON object with two keys:
 
  * ``datatype``: A string describing the type of the value.
  * ``value``: The actual value of type ``datatype``.
 
+Attributes in **short format** are stored as just the plain value of the attribute.
+Since JSON/TOML values are written in a human-readable format without explicit type information, type details (e.g. the distinction between different integer types) can be lost when reading those values back.
+
 TOML File Format
 ----------------
 

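The two dataset modes documented in the ``json.rst`` hunk above can be sketched with plain Python dictionaries. This is only an illustration of the described key layout, not the backend's implementation: the helper names ``extent_of`` and ``is_dataset``, the datatype string ``"DOUBLE"`` and the attribute ``unitSI`` are assumptions made for the example.

```python
import json


def extent_of(nested):
    """Length along each dimension of a rectangular nested list."""
    extent = []
    while isinstance(nested, list):
        extent.append(len(nested))
        nested = nested[0] if nested else None
    return extent


def is_dataset(obj):
    """Mimic the documented check: a `data` key pointing to an array."""
    return isinstance(obj, dict) and isinstance(obj.get("data"), list)


# Hypothetical 2x3 dataset, stored in row-major order.
data = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
# Long-format attribute; "DOUBLE" is an assumed datatype string.
attributes = {"unitSI": {"datatype": "DOUBLE", "value": 1.0}}

# "dataset" mode: datatype + data (+ optional attributes).
as_dataset = {"datatype": "DOUBLE", "data": data, "attributes": attributes}

# "template" mode: datatype + extent (+ optional attributes), no data.
as_template = {
    "datatype": "DOUBLE",
    "extent": extent_of(data),
    "attributes": attributes,
}

print(json.dumps(as_template, indent=2))
```

Note that the ``data``-based check only recognizes objects written in ``"dataset"`` mode; how templates are told apart from groups is not stated in this diff, so the sketch does not attempt it.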
diff --git a/docs/source/details/backendconfig.rst b/docs/source/details/backendconfig.rst
@@ -100,6 +100,8 @@ For JSON and ADIOS2, all datasets are resizable, independent of this option.
 Configuration Structure per Backend
 -----------------------------------
 
+Please refer to the documentation of the respective backend for further information on its configuration.
+
 .. _backendconfig-adios2:
 
 ADIOS2
@@ -189,8 +191,21 @@ Explanation of the single keys:
 
 .. _backendconfig-other:
 
-Other backends
-^^^^^^^^^^^^^^
+JSON/TOML
+^^^^^^^^^
 
-Do currently not read the configuration string.
-Please refer to the respective backends' documentations for further information on their configuration.
+A full configuration of the JSON backend:
+
+.. literalinclude:: json.json
+   :language: json
+
+The TOML backend is configured analogously, replacing the ``"json"`` key with ``"toml"``.
+
+All keys found under ``json.dataset`` (resp. ``toml.dataset``) are applicable globally as well as per dataset.
+Explanation of the single keys:
+
+* ``json.dataset.mode`` / ``toml.dataset.mode``: One of ``"dataset"`` (default) or ``"template"``.
+  In ``"dataset"`` mode, the dataset is written as an n-dimensional (nested) array, padded with nulls (JSON) or zeroes (TOML) for missing values.
+  In ``"template"`` mode, only the dataset metadata (type, extent and attributes) is stored, and no chunks can be written or read.
+* ``json.attribute.mode`` / ``toml.attribute.mode``: One of ``"long"`` (default in openPMD 1.*) or ``"short"`` (default in openPMD >= 2.0).
+  The long format explicitly encodes the attribute type on disk; the short format writes only the attribute value as a JSON/TOML value, so readers must recover the type themselves.
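What "recovering the type" means for a reader can be sketched as follows. This is a minimal illustration using only the standard ``json`` module; the function ``read_attribute`` and the datatype string ``"UCHAR"`` are hypothetical and do not reflect how the backend actually parses attributes.

```python
import json


def read_attribute(raw):
    """Return (datatype, value) for a parsed attribute.

    Assumption for this sketch: an object with exactly the keys
    `datatype` and `value` is a long-format attribute; anything
    else is treated as a short-format value with unknown type.
    """
    if isinstance(raw, dict) and set(raw) == {"datatype", "value"}:
        return raw["datatype"], raw["value"]
    return None, raw


# Long format: the stored datatype survives the round trip.
long_form = json.loads('{"datatype": "UCHAR", "value": 7}')

# Short format: only the value is stored.
short_form = json.loads("7")

print(read_attribute(long_form))
print(read_attribute(short_form))
```

In the short case the parser only yields a generic integer, so whether the writer used an 8-bit or a 64-bit type can no longer be determined from the file.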
diff --git a/docs/source/details/json.json b/docs/source/details/json.json
@@ -0,0 +1,10 @@
+{
+  "json": {
+    "dataset": {
+      "mode": "template"
+    },
+    "attribute": {
+      "mode": "short"
+    }
+  }
+}
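A configuration like ``json.json`` above can also be assembled programmatically. The sketch below uses only the standard library; the helper ``backend_config`` is hypothetical, and its defaults follow the openPMD 1.* defaults described in the diff. How the resulting string is handed to openPMD is outside the scope of this example.

```python
import json

DATASET_MODES = ("dataset", "template")
ATTRIBUTE_MODES = ("long", "short")


def backend_config(backend, dataset_mode="dataset", attribute_mode="long"):
    """Build the nested configuration for the JSON or TOML backend."""
    if backend not in ("json", "toml"):
        raise ValueError(f"unsupported backend: {backend!r}")
    if dataset_mode not in DATASET_MODES:
        raise ValueError(f"unknown dataset mode: {dataset_mode!r}")
    if attribute_mode not in ATTRIBUTE_MODES:
        raise ValueError(f"unknown attribute mode: {attribute_mode!r}")
    return {
        backend: {
            "dataset": {"mode": dataset_mode},
            "attribute": {"mode": attribute_mode},
        }
    }


# Same content as json.json in this commit.
json_cfg = backend_config("json", "template", "short")
# The TOML backend uses the same structure under the "toml" key.
toml_cfg = backend_config("toml", "template", "short")

print(json.dumps(json_cfg, indent=2))
```

Validating the mode strings up front keeps typos such as ``"templte"`` from silently producing a configuration with an unintended meaning.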