Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Zarr v3 support #383

Closed
wants to merge 10 commits into from
Closed

Zarr v3 support #383

wants to merge 10 commits into from

Conversation

will-moore
Copy link
Member

Work in progress (not intended to merge - just playing)...

Trying to get familiar with the current Zarr v3 development code by seeing if I could use it for the simple example we have in the docs.
That sample needed updating - currently it looks like this:

import numpy as np
import zarr
import os

from ome_zarr.io import parse_url
from ome_zarr.writer import write_image
import asyncio

async def run():
    path = "test_ngff_image.zarr"
    # os.mkdir(path)

    size_xy = 128
    size_z = 10
    rng = np.random.default_rng(0)
    data = rng.poisson(lam=10, size=(size_z, size_xy, size_xy)).astype(np.uint8)

    # write the image data
    zarr_location = await parse_url(path, mode="w")
    store = zarr_location.store
    root = zarr.Group.create(store=store)
    write_image(image=data, group=root, axes="zyx", storage_options=dict(chunks=(1, size_xy, size_xy)))

asyncio.run(run())
  • I added a FormatV05 class which creates Zarr v3 Stores (this will cause issues since the init_store() method returns different objects from the superclasses which return FSStore
  • However, the main blocker I ran into is the usage of async methods to load json etc. ZarrLocation.__init_metadata() loads json. This is called during ZarrLocation.__init__ but you can't add async designation to constructors, so we need to move that do a separate method init_meta() that is called immediately after creation.
  • Also, the parse_url() is now async, so we can't call it in a script without wrapping with e.g. asyncio.run(run()) above.
  • Current issue is that zarr.json and chunk dirs are getting created in the current directory, not within a test_ngff_image.zarr` group as intended....

cc @joshmoore @normanrz

@normanrz
Copy link

normanrz commented Jun 5, 2024

If it is not an explicit goal to move the ome-zarr-py interface to async, I would not do it. You can use the sync wrapper https://github.com/zarr-developers/zarr-python/blob/v3/src/zarr/sync.py#L56 to turn async methods into regular blocking methods.

ome_zarr/io.py Outdated
"""
Load the Zarr metadata files for the given location.
"""
self.zarray: JSONDict = self.get_json(".zarray")
self.zgroup: JSONDict = self.get_json(".zgroup")
self.zarray: JSONDict = await self.get_json(".zarray")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of loading the json files directly, can you work with the metadata objects from zarr-python?

@will-moore
Copy link
Member Author

@normanrz Thanks, I'm looking into using Group.open(path, zarr_format=None) to handle both v2 and v3,
but this fails as the Group.open() at https://github.com/zarr-developers/zarr-python/blob/661acb37f08a77cc1a86c0da55288e89f2388801/src/zarr/group.py#L501 doesn't pass on the zarr_format param to AsyncGroup.open() at
https://github.com/zarr-developers/zarr-python/blob/661acb37f08a77cc1a86c0da55288e89f2388801/src/zarr/group.py#L139

Same issue for Array.open().
Is this simply missing? If so, I can open an issue for that? Cheers.

@normanrz
Copy link

normanrz commented Jun 5, 2024

There is a bunch of ongoing work at the moment. I would recommend to check back in once the alpha release is done in the next few days. If the issue persists, I would recommend to open an issue.

@will-moore
Copy link
Member Author

@normanrz Great, will do, thx

@will-moore will-moore force-pushed the zarr_v3_support branch 2 times, most recently from bec0ade to 988dde7 Compare June 12, 2024 07:37
@joshmoore
Copy link
Member

https://pypi.org/project/zarr/3.0.0a0/ went out about 3 hours ago! 🎉

@joshmoore
Copy link
Member

@will-moore: let me know what you think about a5d8475 as a stop-gap.

@will-moore
Copy link
Member Author

Great - yes that's helpful thanks.

@jni
Copy link
Contributor

jni commented Sep 18, 2024

@will-moore @joshmoore what's the status of zarr/zarr-python v3 support in napari-ome-zarr? Or is it just going to go through dask?

@will-moore
Copy link
Member Author

Sorry about closing with no explanation...
I think that I started this effort a little too early and was not using the correct approach for a lot of the zarr async api, which is also still evolving.
When zarr-developers/zarr-python#2186 is merged it'll be time to take another look at this, starting from a clean sheet.
For the https://github.com/ome/ome2024-ngff-challenge/ project, we've been using Tensorstore to read remote zarr v2 data (and write zarr v3), but we're not planning on switching to Tensorstore for ome-zarr-py. At this point we're still expecting to continue using zarr-python and dask.

@jni
Copy link
Contributor

jni commented Sep 19, 2024

great, thanks for the update! The context for my ping is zarr-developers/zarr-python#2102, so feel free to chime in there if you think something might help! 😊

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants