Skip to content

Commit

Permalink
Documentation
Browse files Browse the repository at this point in the history
  • Loading branch information
franzpoeschel committed Nov 20, 2023
1 parent 3b05181 commit 9d1bb29
Showing 1 changed file with 42 additions and 0 deletions.
42 changes: 42 additions & 0 deletions docs/source/usage/workflow.rst
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,48 @@
Workflow
========

Storing and reading chunks
--------------------------

1. **Chunks within an n-dimensional dataset**

Most commonly, chunks within an n-dimensional dataset are identified by their offset and extent.
The extent is the size of the chunk in each dimension, NOT the absolute coordinate within the entire dataset.

In the Python API, this is modeled to conform to the conventional ``__setitem__``/``__getitem__`` protocol.

2. **Joined arrays (write only)**

(Currently) only supported in ADIOS2 no older than v2.9.0 under the conditions listed in the `ADIOS2 documentation on joined arrays <https://adios2.readthedocs.io/en/latest/components/components.html#shapes>`_.

In some cases, the concrete chunk within a dataset does not matter and the computation of indexes is a needless computational and mental overhead.
This commonly occurs for particle data which the openPMD-standard models as a list of particles.
The order of particles does not matter greatly, and making different parallel processes agree on indexing is error-prone boilerplate.

In such a case, at most one *joined dimension* can be specified in the Dataset, e.g. ``{Dataset::JOINED_DIMENSION, 128, 128}`` (3D for the sake of explanation, particle data would normally be 1D).
The chunk is then stored by specifying an empty offset vector ``{}``.
The chunk extent vector must be equivalent to the global extent in all non-joined dimensions (i.e. joined arrays allow no further sub-chunking other than concatenation along the joined dimension).
The joined dimension of the extent vector specifies the extent that this piece should have along the joined dimension.
The global extent of the dataset along the joined dimension will then be the sum of all local chunk extents along the joined dimension.

Since openPMD follows a struct-of-array layout of data, it is important not to lose correlation of data between components. E.g., joining an array must take care that ``particles/e/position/x`` and ``particles/e/position/y`` are joined in uniform way.

The openPMD-api makes the **following guarantee**:

Consider a Series written from ``N`` parallel processes between two (collective) flush points. For each parallel process ``n`` and dataset ``D``, let:

* ``chunk(D, n, i)`` be the ``i``'th chunk written to dataset ``D`` on process ``n``
* ``num_chunks(D, n)`` be the count of chunks written by ``n`` to ``D``
* ``joined_index(D, c)`` be the index of chunk ``c`` in the joining order of ``D``

Then for any two datasets ``x`` and ``y``:

* If for any parallel process ``n`` the condition holds that ``num_chunks(x, n) = num_chunks(y, n)`` (between the two flush points!)...
* ...then for any parallel process ``n`` and chunk index ``i`` less than ``num_chunks(x, n)``: ``joined_index(x, chunk(x, n, i)) = joined_index(y, chunk(y, n, i))``.

**TLDR:** Writing chunks to two joined arrays in synchronous way (**1.** same order of store operations and **2.** between the same flush operations) will result in the same joining order in both arrays.


Access modes
------------

Expand Down

0 comments on commit 9d1bb29

Please sign in to comment.