Why is it necessary to produce invalid OME-Zarr images? #803

Open
tischi opened this issue Dec 6, 2024 · 4 comments

@tischi

tischi commented Dec 6, 2024

Hi,

As far as I understand, SpatialData is currently producing invalid OME-Zarr images.

I would like to understand why you chose to do this, even though it has obvious disadvantages such as:

  • Other tools cannot open these images
  • The image data cannot be published

Naively, I would have just added a spatialdata_multiscales into the JSON next to the official multiscales and then put whatever you need that is not compatible with the current OME-Zarr spec into spatialdata_multiscales. Why was this impossible?
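To make the suggestion concrete, here is a minimal sketch of what such attributes could look like. The multiscales entry follows the general layout of NGFF 0.4 (axes, datasets, a scale transformation per level). The contents of spatialdata_multiscales, the helper name build_attrs, and the axis and scale values are purely hypothetical and are not taken from SpatialData or from any specification.

```python
import json


def build_attrs(scales, extra):
    """Build a .zattrs-style dict with a standard and a tool-specific entry.

    `scales` holds one (c, y, x) scale tuple per resolution level.
    `extra` is whatever tool-specific metadata does not fit the standard
    entry (hypothetical content, for illustration only).
    """
    axes = [
        {"name": "c", "type": "channel"},
        {"name": "y", "type": "space", "unit": "micrometer"},
        {"name": "x", "type": "space", "unit": "micrometer"},
    ]
    datasets = [
        {
            "path": str(level),
            "coordinateTransformations": [
                {"type": "scale", "scale": list(scale)}
            ],
        }
        for level, scale in enumerate(scales)
    ]
    return {
        # Entry that generic OME-Zarr readers would look for.
        "multiscales": [
            {"version": "0.4", "name": "image", "axes": axes, "datasets": datasets}
        ],
        # Hypothetical sibling key; generic readers would simply ignore it.
        "spatialdata_multiscales": extra,
    }


attrs = build_attrs(
    scales=[(1.0, 0.5, 0.5), (1.0, 1.0, 1.0)],
    extra={"note": "hypothetical tool-specific metadata"},
)
print(json.dumps(attrs, indent=2))
```

The idea is that a reader unaware of the extra key would still find a conventional multiscales entry, while SpatialData could read its own key for anything the standard entry cannot express.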

Ping @LucaMarconato @joshmoore.

@LucaMarconato
Member

Hi, thanks for starting the discussion. The reason is that, by design, we never tried to be compatible with NGFF 0.4: when we started developing our package, the NGFF 0.5 release seemed imminent, so we expected to be fully aligned with NGFF before publication. This approach is, for instance, shown in this issue: #125, where we state that we plan to be fully compliant with NGFF once the specs are approved.

Unfortunately, there have been delays with the NGFF release, and this, paired with the fact that users gave feedback mostly on the Python APIs part of the library (and not on the file format), motivated us to temporarily put more emphasis on the Python part and postpone the work on the file format.

Anyway, we are aware that aligning with NGFF is crucial for cross-language interoperability, and this, paired with the recent progress with NGFF that has been enabled by the new RFC system, motivates us to prioritize the work on the file format.

In concrete terms, two changes in NGFF are going to be enacted (hopefully very) soon:

  • The labels will be detached from the image, meaning that we can reuse them across elements.
  • The new coordinate transformations will be available.

We need both changes, and without them there is no hope of being NGFF compliant. This makes interoperability more difficult, but we created this resource (in the context of interoperability with R) to mitigate these challenges: https://github.com/scverse/spatialdata-notebooks/tree/4e430ab0c0c316b101c40c5918d3c903a71d51db/notebooks/developers_resources/storage_format.

Until then, we are working on aligning with the latest version of the transformation specification (the one that will get merged). To do this, we are going to contribute to https://github.com/BioImageTools/ome-zarr-models-py, by moving our implementation of the new transformations into an official repository, and in the process ensuring that our format fully aligns with the specification. I will push to try to have this done by March next year.

@LucaMarconato
Member

In the context of interoperability with Java, I think an effective strategy would be to implement the transformation specification from John. When this is completed, and if by then we also manage to be fully compliant with the specification, then the support for SpatialData datasets should come automatically. Finally, I will give updates in this thread when progress is available from our side, so that we can quickly converge.

CC @giovp @kevinyamauchi in case they want to add something extra.

@tischi
Author

tischi commented Dec 9, 2024

Dear @LucaMarconato,

Thank you for the explanations and the plan for the long-term future.

May I also come back to my initial question in this thread, as to whether it would have been possible in the past to have an extra spatialdata_multiscales entry that contains the incompatible information?

Looking into the immediate future, I will open a new issue on how to open the current zarrs produced by spatialdata.

@LucaMarconato
Member

LucaMarconato commented Dec 10, 2024

I think making a spatialdata_multiscales could have been an option in the past, but it would have defeated the alignment system we heavily rely on. I think the right thing to have done, as @joshmoore mentioned, would have been to make it clear that the format is not 0.4 but something like 0.5-dev. It is never too late to do that, and now seems a good time for it. I will follow up in the linked issue.
