From 38ab95d25d21ae21dd4dc752dbbf413178afea50 Mon Sep 17 00:00:00 2001
From: jmoore
Date: Thu, 21 Apr 2022 11:18:25 +0200
Subject: [PATCH] Propose resolution _groups_ for xarray support (see #48)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

In discussing with the xarray community, the one change to the
NGFF specification that needs to occur to prevent errors being
raised when opening a multiscale is for each resolution _array_
to live in a separate _group_. This has already been tested by
thewtex in https://github.com/spatial-image/spatial-image-multiscale
and the current spec is permissive enough to allow it. The proposal
here would enforce the subdirectories moving forward.

The conflict in xarray stems from the fact that each of our
subresolutions has the same dimension names ("x", "y", etc.)
but different sizes. This is not allowed in the xarray (nor
NetCDF) model.

An added benefit of this change is that other arrays with the same
resolution levels and the same dimensions (e.g. labels!) could be
stored together:

```
├── resolution-N/.zgroup
│   ├── image/.zarray
│   └── labels/.zarray
```
---
 latest/index.bs | 26 +++++++++++++++-----------
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/latest/index.bs b/latest/index.bs
index 76a40ab2..6cac6829 100644
--- a/latest/index.bs
+++ b/latest/index.bs
@@ -141,19 +141,23 @@ For this example we assume an image with 5 dimensions and axes called `t,c,z,y,x
     │                         # "multiscales" and "omero" (see below). In addition, the group level attributes
     │                         # must also contain "_ARRAY_DIMENSIONS" if this group directly contains multi-scale arrays.
     │
-    ├── 0                     # Each multiscale level is stored as a separate Zarr array,
-    │   ...                   # which is a folder containing chunk files which compose the array.
-    ├── n                     # The name of the array is arbitrary with the ordering defined by
+    ├── 0                     # Each multiscale level is stored as a separate Zarr group
+    │                         # which contains arrays of data at that particular resolution level.
+    │   ...
+    ├── n                     # The name of the group is arbitrary with the ordering defined by
     │   │                     # by the "multiscales" metadata, but is often a sequence starting at 0.
     │   │
-    │   ├── .zarray           # All image arrays must be up to 5-dimensional
-    │   │                     # with the axis of type time before type channel, before spatial axes.
-    │   │
-    │   └─ t                  # Chunks are stored with the nested directory layout.
-    │      └─ c               # All but the last chunk element are stored as directories.
-    │         └─ z            # The terminal chunk is a file. Together the directory and file names
-    │            └─ y         # provide the "chunk coordinate" (t, c, z, y, x), where the maximum coordinate
-    │               └─ x      # will be `dimension_size / chunk_size`.
+    │   └── image             # Within the group, there will typically be a single array named "image".
+    │       │                 # Other arrays may be added in future versions.
+    │       │
+    │       ├── .zarray       # All image arrays must be up to 5-dimensional
+    │       │                 # with the axis of type time before type channel, before spatial axes.
+    │       │
+    │       └─ t              # Chunks are stored with the nested directory layout.
+    │          └─ c           # All but the last chunk element are stored as directories.
+    │             └─ z        # The terminal chunk is a file. Together the directory and file names
+    │                └─ y     # provide the "chunk coordinate" (t, c, z, y, x), where the maximum coordinate
+    │                   └─ x  # will be `dimension_size / chunk_size`.
     │
     └── labels
         │