Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Doc/v3 migration guide #2102

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

jhamman
Copy link
Member

@jhamman jhamman commented Aug 19, 2024

Over the past few weeks, we've had a number of conversations/questions about the policy for backward compatibility, deprecations, and breaking changes with the upcoming 3.0 release. This doc is meant to help us iterate toward common language. In its initial form, it is not complete.

Goals for the text here:

  1. Developers of 2.18 and 3.0 should be able to decide if backward compatibility is a required attribute of a contribution
  2. Users of Zarr should be able to understand if their application will be impacted by the upcoming 3.0 release
  3. Users of Zarr should be able to make a plan for how they will adapt their usage of Zarr after the release
  4. [non-goal] This is not meant to provide a comprehensive listing of the changes to the zarr API

cc @zarr-developers/python-core-devs

@TomAugspurger
Copy link
Contributor

One thing that might be helpful: what's the group's tolerance for either compatibility code or deprecations as a way to ease the transition? It sounds like strict backwards compatibility (perhaps with warnings) isn't a goal. Is there tolerance for things like #2098 (e.g. restore some properties to the Group object, loosen the keyword-only requirement for some functions). Likewise for things like "cleaning up internal and user facing APIs", which could be done with a deprecation warning. Even if there isn't tolerance for backwards compatibility shims that really clash with the V3 spec or the current v3 implementation?

@jhamman jhamman added this to the 2.18.3 milestone Aug 27, 2024
@jhamman jhamman added the V3 Affects the v3 branch label Aug 28, 2024
@jhamman jhamman modified the milestones: 2.18.3, 3.0.0 Sep 6, 2024
Copy link
Contributor

@dstansby dstansby left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some comments - I'll mark this as request changes since it still needs major additions.

docs/migration.rst Outdated Show resolved Hide resolved
docs/migration.rst Outdated Show resolved Hide resolved
docs/migration.rst Outdated Show resolved Hide resolved
docs/migration.rst Outdated Show resolved Hide resolved
docs/migration.rst Show resolved Hide resolved
1. Pin the supported Zarr-Python version to ``zarr>=2,<3``. This is a best practice and will protect your users from any incompatibilities that may arise during the release of Zarr-Python 3.0.
2. Limit your imports from the Zarr-Python package. Most of the primary API ``zarr.*`` will be compatible in 3.0. However, the following breaking API changes are planned:

- ``numcodecs.*`` will no longer be available in ``zarr.*``. (Suggested action: transition to importing codecs from ``numcodecs`` directly.)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- ``numcodecs.*`` will no longer be available in ``zarr.*``. (Suggested action: transition to importing codecs from ``numcodecs`` directly.)
- ``zarr.numcodecs.*`` will no longer be available. These imports can be replaced by importing ``numcodecs`` directly.

We should be doing more than "suggesting", we should be providing concrete fixes/code updates!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wish it were that easy! Unfortunately, the top level namespace is littered with * imports

from zarr.codecs import *

from numcodecs import *

If there is a reasonable way for us to deprecate folks using these imports, I'm all ears.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if there isn't a reasonable way to deprecate the imports, as a minimum this migration guide should provide a complete list of all imports that are disappearing and how to replace them.

docs/migration.rst Outdated Show resolved Hide resolved
Comment on lines 34 to 48
- The following internal modules are being removed or significant changed:

- ``zarr.attrs``
- ``zarr.codecs``
- ``zarr.context``
- ``zarr.core``
- ``zarr.hierarchy``
- ``zarr.indexing``
- ``zarr.meta``
- ``zarr.meta_v1``
- ``zarr.storage``
- ``zarr.sync``
- ``zarr.types``
- ``zarr.util``
- ``zarr.n5``
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs a clear list of how to update code to adapt to these removals or changes.

docs/migration.rst Show resolved Hide resolved
docs/migration.rst Outdated Show resolved Hide resolved
@jni
Copy link
Contributor

jni commented Sep 16, 2024

I just came here following the link from #1849. Currently, this guide feels very incomplete, and since #2182 is now in-flight, I think this guide should become very clear.

From the perspective of a maintainer / close user of a bunch of libraries that depend on zarr but have extremely limited maintainer time, the most important thing I want to understand is how hard it will be to support both zarr 2.18 and zarr 3+ within a single library. I think this is the critical question for a smooth transition, because it is hard for libraries to all migrate at the same time, and you want libraries to be installable together in the same environment — you don't want someone depending on both napari and ome-zarr to face napari requiring zarr>=3 and ome-zarr requiring zarr<3. So many libraries would want to support the subset of zarr that is identical in v3 and v2 until everyone can agree to depend on 3+.

@jhamman
Copy link
Member Author

jhamman commented Sep 17, 2024

@jni - thanks for the feedback. I agree this is not ready to ship yet. The main things that we know are changing is the Store API and access to internal APIs (e.g. zarr.core.xxx). Beyond that, the best way for us to fill out the migration guide is to have projects attempt to support v3 and report back. I'm doing that with Dask right now (dask/dask#11388, #2186) and I understand @TomAugspurger has begun the process for Xarray. So an ask for you and the Napari / ome-zarr devs is to try to do this and report back. Beyond that, specific suggestions to this doc are more than welcome (props to @dstansby for his edits already).

@dstansby
Copy link
Contributor

If we want downstream packages to test with version 3.0.0a1, it might be good to do a blog post or add something to the docs explaining how to do that testing, what to look for, and how to provide feedback?

@jni
Copy link
Contributor

jni commented Sep 18, 2024

Well, @Czaki started doing that for us in napari/napari#7215 and @d-v-b has been helping 🙏.

It looks like one of the remaining issues is that zarr.open defaults to v3 zarr (it'd be worth considering switching to calver, which I hate, if only to avoid the confusion between zarr format v3 and zarr-python v3... 😂), and tensorstore does not yet support v3 zarr files. At least that's my interpretation of these lines. Nor do I see any motions to change this in the tensorstore repo... @jbms?

But I think we can resolve this by explicitly writing a v2 zarr in the test?

specific suggestions to this doc are more than welcome

Something along the lines of:

Common functions have switched to keyword-only arguments, so you will need to change any invocation of, for example, zarr.open(path, 'a') to zarr.open(store=path, mode='a').

(An exhaustive list of such changes would be useful.)

@d-v-b
Copy link
Contributor

d-v-b commented Sep 18, 2024

and tensorstore does not yet support v3 zarr files.

Tensorstore has supported zarr v3 for a long time: https://google.github.io/tensorstore/driver/zarr3/index.html

@jni
Copy link
Contributor

jni commented Sep 18, 2024

Tensorstore has supported zarr v3 for a long time

oh, interesting, thanks for pointing that out @d-v-b! I couldn't actually find the relevant PR but did not look exhaustively. 🙏

@TomAugspurger
Copy link
Contributor

A few changes I've found while updating xarray. Are all of these intentional?

  • Array.resize returns a new Array object. 2.x mutated the Array in place
  • zarr_version has been renamed to zarr_format
  • Some exception types have changed (e.g. 2.x raised a zarr.errors.GroupNotFoundError while 3.x raises a ValueError)
  • write_empty_chunks has been removed

@jhamman jhamman changed the base branch from v3 to main October 14, 2024 20:52
@jni jni mentioned this pull request Oct 17, 2024
6 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
V3 Affects the v3 branch
Projects
Status: In review
Development

Successfully merging this pull request may close these issues.

5 participants