diff --git a/docs/modules/statistics.rst b/docs/modules/statistics.rst
index 669b0713..a09e55ff 100644
--- a/docs/modules/statistics.rst
+++ b/docs/modules/statistics.rst
@@ -12,7 +12,7 @@ Various statistic and metric calculations for analysing inference data.
 .. autosummary::
    :nosignatures:
    :toctree: generated/statistics/
-   :template: class.rst
+   :template: statistics.rst
 
    statistics.mean
    statistics.variance
diff --git a/docs/modules/utils.rst b/docs/modules/utils.rst
index 9cae5802..6bb66a44 100644
--- a/docs/modules/utils.rst
+++ b/docs/modules/utils.rst
@@ -23,6 +23,7 @@ A collection of utilities to manipulate and check coordinate systems dictionarie
    utils.coords.handshake_coords
    utils.coords.handshake_size
    utils.coords.map_coords
+   utils.coords.extract_coords
 
 :mod:`earth2studio.utils`: Time
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
diff --git a/docs/templates/io.rst b/docs/templates/io.rst
index 9d9b8227..dbb59216 100644
--- a/docs/templates/io.rst
+++ b/docs/templates/io.rst
@@ -6,7 +6,7 @@
 .. autoclass:: {{ objname }}
 
    {% block methods %}
-   .. automethod:: initialize
+   .. automethod:: add_array
    .. automethod:: write
    {% endblock %}
 
diff --git a/docs/templates/statistics.rst b/docs/templates/statistics.rst
new file mode 100644
index 00000000..1467337f
--- /dev/null
+++ b/docs/templates/statistics.rst
@@ -0,0 +1,20 @@
+:mod:`{{module}}`.{{objname}}
+{{ underline }}==============
+
+.. currentmodule:: {{module}}
+
+.. autoclass:: {{ objname }}
+
+   {% block methods %}
+   .. automethod:: __call__
+   {% endblock %}
+
+.. _sphx_glr_backref_{{module}}.{{objname}}:
+
+.. minigallery:: {{module}}.{{objname}}
+    :add-heading:
+    :heading-level: ^
+
+.. raw:: html
+
+
\ No newline at end of file
diff --git a/docs/userguide/features/io.md b/docs/userguide/features/io.md
new file mode 100644
index 00000000..ab8f41e3
--- /dev/null
+++ b/docs/userguide/features/io.md
@@ -0,0 +1,126 @@
+(output_handling_userguide)=
+
+# Output Handling
+
+While input data handling is primarily managed by the data sources in
+{mod}`earth2studio.data`, output handling is managed by the IO backends available
+in {mod}`earth2studio.io`. These backends are designed to balance the ability for
+users to customize the arrays and metadata within the exposed backend while also
+making it easy to design reusable workflows.
+
+The key addition to the typical `(x, coords)` data structure used throughout the rest
+of the `earth2studio` code, needed for output store compatibility, is the notion of
+an `array_name`. Names distinguish between different arrays within the backend and
+are currently a requirement for storing `Datasets` in `xarray`, `zarr`, and `netcdf`.
+This means that the user must supply a name when adding an array to a store or when
+writing an array. A frequent pattern is to extract one dimension of an array,
+such as `"variable"`, to act as individual arrays in the backend; see the examples below.
+
+## IO Backend Interface
+
+The full requirements for a standard IO backend are defined explicitly in
+`earth2studio/io/base.py`.
+
+```{literalinclude} ../../../earth2studio/io/base.py
+    :lines: 24-
+    :language: python
+```
+
+:::{note}
+IO Backends do not need to inherit this protocol; it is simply used to define
+the required APIs. Some built-in IO backends may also offer additional functionality
+that is not universally supported (and hence not required).
+:::
+
+There are two important methods that must be supported: `add_array`, which
+adds an array to the underlying store and any attached coordinates, and `write`,
+which explicitly stores the passed data in the backend. The `write` command may
+induce synchronization if the input tensor resides on the GPU and the store does
+not. Most stores make a conversion from PyTorch to numpy in this process. The
+{mod}`earth2studio.io.kv` backend has the option for storing data on the GPU, which can be
+done asynchronously.
+
+Most data stores offer a number of additional utilities such as `__contains__`,
+`__getitem__`, `__len__`, and `__iter__`. For an example, see the implementation in
+{mod}`earth2studio.io.ZarrBackend`:
+
+```{literalinclude} ../../../earth2studio/io/zarr.py
+    :lines: 53-81
+    :language: python
+```
+
+Because of `datetime` compatibility, we recommend using the `ZarrBackend` as a default.
+
+## Initializing a Store
+
+A common data pattern seen throughout our example workflows is to initialize the
+variables and dimensions of a backend using a complete `CoordSystem`. For example:
+
+```python
+# Build a complete CoordSystem
+total_coords = OrderedDict(
+    {
+        'ensemble': ...,
+        'time': ...,
+        'lead_time': ...,
+        'variable': ...,
+        'lat': ...,
+        'lon': ...
+    }
+)
+
+# Give an informative array name
+array_name = 'fields'
+
+# Initialize all dimensions in total_coords and the array 'fields'
+io.add_array(total_coords, array_name)
+```
+
+It can be tedious to define each coordinate and dimension. Luckily, if we have
+a prognostic or diagnostic model, most of this information is already available.
+Here is a robust example of such a use-case:
+
+```python
+# Set up IO backend
+# assume we have `prognostic`, `time`, `nsteps` and `array_name`
+# Copy prognostic model output coordinates, skipping empty ones
+total_coords = OrderedDict(
+    {
+        k: v for k, v in prognostic.output_coords.items() if
+        (k != "batch") and (v.shape != (0,))
+    }
+)
+total_coords["time"] = time
+total_coords["lead_time"] = np.asarray(
+    [prognostic.output_coords["lead_time"] * i for i in range(nsteps + 1)]
+).flatten()
+total_coords.move_to_end("lead_time", last=False)
+total_coords.move_to_end("time", last=False)
+io.add_array(total_coords, array_name)
+```
+
+A common use-case is to extract a particular dimension (usually `variable`) as
+the array names.
+
+```python
+# A modification of the previous example:
+var_names = total_coords.pop("variable")
+io.add_array(total_coords, var_names)
+```
+
+## Writing to the Store
+
+Once the data arrays have been initialized in the backend, writing to those arrays
+is a single line of code.
+
+```python
+x, coords = model(x, coords)
+io.write(x, coords, array_name)
+```
+
+If, as above, the user is extracting a dimension of the tensor to use as array names,
+then they can make use of {mod}`earth2studio.utils.coords.extract_coords`:
+
+```python
+io.write(*extract_coords(x, coords, dim="variable"))
+```
diff --git a/docs/userguide/features/statistics.md b/docs/userguide/features/statistics.md
new file mode 100644
index 00000000..6dedda00
--- /dev/null
+++ b/docs/userguide/features/statistics.md
@@ -0,0 +1,52 @@
+(statistics_model_userguide)=
+
+# Statistics
+
+Statistics are distinct from prognostic and diagnostic models in principle because
+we assume that statistics reduce existing coordinates so that the output tensors
+have a coordinate system that is a subset of the input coordinate system. This
+makes statistics less flexible than diagnostic models, but with fewer API requirements.
+
+## Statistics Interface
+
+The statistics API only specifies a {func}`__call__` method that matches similar methods
+across the package.
+
+```{literalinclude} ../../../earth2studio/statistics/base.py
+    :lines: 24-43
+    :language: python
+```
+
+The base API, together with the examples in {mod}`earth2studio.statistics.moments`,
+hints at the use of a few properties to make statistic handling easier:
+`reduction_dimensions`, which is a list of dimensions that will be reduced over,
+`weights`, which must be broadcastable with `reduction_dimensions`, and `batch_update`,
+which is useful for applying statistics when data comes in streams/batches.
+
+Where applicable, the specified `reduction_dimensions` set a requirement for the
+coordinates passed in the call method.
+
+## Custom Statistics
+
+Integrating your own statistics is easy: just satisfy the interface above. We recommend
+users look at the custom statistic example in the {ref}`extension_examples` examples.
+
+# Metrics
+
+Like statistics, metrics are reductions across existing dimensions. Unlike statistics,
+which are usually defined over a single input, we define metrics to take a pair of
+inputs. Otherwise, the API and requirements are similar to the statistics requirements.
+
+## Metrics Interface
+
+```{literalinclude} ../../../earth2studio/statistics/base.py
+    :lines: 45-
+    :language: python
+```
+
+## Contributing Statistics and Metrics
+
+Want to add your own statistic or metric to the package? Great, we will be happy to
+work with you. At a minimum, we expect it to abide by the interface defined
+above. We may also work with you to ensure that `reduction_dimensions` are
+applicable and, if possible, that weights and batching are supported.
diff --git a/docs/userguide/index.md b/docs/userguide/index.md
index 19b09f11..f0512b0c 100644
--- a/docs/userguide/index.md
+++ b/docs/userguide/index.md
@@ -33,6 +33,8 @@ intro/data
 features/prognostic
 features/diagnostic
 features/datasources
+features/io
+features/statistics
 ```
 
 ## Support