FOV names and array names are hard-coded for the HCS dataset reader #248

ziw-liu · 2023-02-10T19:13:01Z

# The name of the array is arbitrary with the ordering defined by
# by the "multiscales" metadata, but is often a sequence starting at 0.

And ome-zarr-py works with arbitrary array names in the single-FOV images layout, but not in an HCS plate layout. Only integer names starting with "0" is recognized.

It should instead be discovered from metadata (multiscales.datasets[0].path).

Also, FOV names not starting with "0" is not supported. The cause is likely this line hard-coding 'first field':

ome-zarr-py/ome_zarr/reader.py

Line 493 in c1302e0

self.first_field = "0"

It should instead be discovered from metadata (well.images[0].path). Spec:

The well dictionary MUST contain an images key whose value MUST be a list of JSON objects specifying all fields of views for a given well. Each image object MUST contain a path key whose value MUST be a string specifying the path to the field of view. The path MUST contain only alphanumeric characters, MUST be case-sensitive, and MUST NOT be a duplicate of any other path in the images list.

Edit: separate description for array and position names.

The text was updated successfully, but these errors were encountered:

ziw-liu · 2023-02-13T18:28:04Z

According to #200 (comment), one can view all the FOVs by loading individual wells. However this only works for incremental integer FOV path names, not any 'case sensitive, alphanumeric' path names defined in the specification. Possible cause:

ome-zarr-py/ome_zarr/reader.py

Lines 427 to 431 in c1302e0

    
           def get_field(tile_name: str, level: int) -> np.ndarray: 
        
               """tile_name is 'row,col'""" 
        
               row, col = (int(n) for n in tile_name.split(",")) 
        
               field_index = (column_count * row) + col 
        
               path = f"{field_index}/{level}"

will-moore · 2023-02-17T07:51:50Z

Thanks for the feedback.
You're right that there is an assumption made about the path to first image in a Well.
It would be a pretty easy fix to load the Well images for a Well and read that data. It's a bit of a bigger change to read that data for ALL Wells in a plate if you can't assume that all Wells have the same path to first image.

This is a similar type of change to #241 where it becomes slower to load plate data if you don't make assumptions about all the Wells being the same.

Reading Well images may not have such a big impact as the changes on that PR, but it raises similar questions:
For plates where all Wells have the same path to first image, we would prefer NOT to have to load the Images for every Well to check.

@ziw-liu In your data, does each Well have a different path to the first image?
If so, then I guess we first need to test and see how much slower it is to read that for each image.

ziw-liu · 2023-02-17T17:06:57Z

@will-moore Thanks for the explanation.

I agree that hardcoding is the fastest way to get the images. As for the dataset I have in mind, all the wells would have the same path (in fact '0') to the first image. I was trying to load synthetic data (generated by testing a custom writer) with the viewer, and found that writing with different FOV names that should be valid according to the specification is only supported by the ome-zarr-py reader when loading a single well.

The array name finding is of a bit more real-world concern. In our workflow it is common to transform the raw data into other arrays with incompatible data types and shapes, and it would be a cleaner data management practice if these can all be held in the same Zarr group with their names indicating their nature (e.g. 'raw', 'projection'). It is possible to rename the arrays we want to load with ome-zarr-py and napari-ome-zarr to '0' so that they are recognized, but it would be nice if the NGFF specification is fully implemented here, and data can be accessed with Zarr-style name keys when not using ome-zarr-py.

As for the performance concern, I don't have an intuition for cloud object storage, but for a storage server in the local netowrk, increased I/O operations should only be linear to the image count to load the .zattrs files. But of course only testing can give a real answer.

will-moore · 2023-02-24T15:53:07Z

Thanks for the info. This makes good sense.
Do you have a 'multiscales' object for e.g. raw and projection, each with a pyramid of 'datasets'?
Currently, only the first multiscales is object is read by ome-zarr-py.

ziw-liu · 2023-02-24T17:56:05Z

Do you have a 'multiscales' object for e.g. raw and projection, each with a pyramid of 'datasets'?

I do not think we will have multiple pyramids for a single FOV, and I agree that for visualization only loading the first pyramid makes sense. Are you suggesting that incompatible arrays can be stored in separate multiscales objects?

will-moore · 2023-02-27T13:41:03Z

The spec allows multiple multiscales images https://ngff.openmicroscopy.org/latest/#multiscale-md and doesn't say anything about the relationship between them.
However, currently all NGFF tools assume only 1 multiscales image per NGFF file and ignore any others: (see 'multiple multiscales' at https://ome.github.io/ome-ngff-tools/)

aeisenbarth · 2023-08-23T21:18:17Z

I fell into the same problem. Well image path and field index seem to be used a bit interchangeably in the code, but this is not reflected in any document, especially the spec. For my use case, I'd prefer to have human-readable image names, because these are displayed, e.g. in Napari.

If the reason are performance concerns about accessing image paths when reading the plate, then maybe the paths should not (only) live at the well level, or the spec could be modified not to allow non-index image names.

Since ome-zarr-py already has assumptions (not fully implemented spec), it's a matter of weighing up the probability that a dataset has well image path != field index versus the probability that the first well's image paths are not the same for all wells. I think the second is less likely and a fix would increase the amount of supported datasets.

ziw-liu changed the title ~~Array name is hard-coded for the HCS dataset reader~~ FOV names and array names are hard-coded for the HCS dataset reader Feb 10, 2023

ziw-liu mentioned this issue Feb 10, 2023

Refactor API for NGFF datasets czbiohub-sf/iohub#31

Merged

aeisenbarth mentioned this issue Aug 23, 2023

Avoid hard-coded field-of-view path "0" #300

Merged

will-moore closed this as completed in #300 Sep 29, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

FOV names and array names are hard-coded for the HCS dataset reader #248

FOV names and array names are hard-coded for the HCS dataset reader #248

ziw-liu commented Feb 10, 2023 •

edited

Loading

ziw-liu commented Feb 13, 2023

will-moore commented Feb 17, 2023

ziw-liu commented Feb 17, 2023 •

edited

Loading

will-moore commented Feb 24, 2023

ziw-liu commented Feb 24, 2023

will-moore commented Feb 27, 2023

aeisenbarth commented Aug 23, 2023

FOV names and array names are hard-coded for the HCS dataset reader #248

FOV names and array names are hard-coded for the HCS dataset reader #248

Comments

ziw-liu commented Feb 10, 2023 • edited Loading

ziw-liu commented Feb 13, 2023

will-moore commented Feb 17, 2023

ziw-liu commented Feb 17, 2023 • edited Loading

will-moore commented Feb 24, 2023

ziw-liu commented Feb 24, 2023

will-moore commented Feb 27, 2023

aeisenbarth commented Aug 23, 2023

ziw-liu commented Feb 10, 2023 •

edited

Loading

ziw-liu commented Feb 17, 2023 •

edited

Loading