Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Move to numpy.Generators #6899

Draft
wants to merge 7 commits into
base: dev
Choose a base branch
from

Conversation

john-zielke-snkeos
Copy link
Contributor

@john-zielke-snkeos john-zielke-snkeos commented Aug 23, 2023

Fixes #6854 .

Description

In order to move from the deprecated numpy.random.RandomState to the faster and supported numpy.random.Generator, in a first step the generation of randomness in transform and related functions is migrated (i.e. everything with self.R/ implementing Randomizable). The idea of this PR is to change the API for randomness to that of Generators. In order to preserve reproducibility, the old Random Generator should still be used by default while providing an option to switch over to the new implementation.

Breaking changes

  • The API for generating randomness inside transforms uses the new numpy.random.Generator API, custom transforms would therefore have to adapt the new functionality as well

Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality).
  • Breaking change (fix or new feature that would cause existing functionality to change).
  • New tests added to cover the changes.
  • Integration tests passed locally by running ./runtests.sh -f -u --net --coverage.
  • Quick tests passed locally by running ./runtests.sh --quick --unittests --disttests.
  • In-line docstrings updated.
  • Documentation updated, tested make html command in the docs/ folder.

@john-zielke-snkeos
Copy link
Contributor Author

john-zielke-snkeos commented Aug 23, 2023

@wyli This draft PR is still very much WIP, but I wanted to place this here early to discuss how to best implement this migration.
During the implementation, I came across a few questions (apart from the comments in the code with FIXME):

  1. Some methods accept a random_state as a parameter. I suggested to migrate this parameter to "generator" given the new API, using a deprecation on the old parameter. If we do this, how should we handle backwards compatibility and positional arguments? IMHO In an ideal world these later parameters would all be keyword-only and therefore not have this issue.
  2. With the new API, all custom transforms implemented by users would break if they use now non-existent random functions (because of the differing API of RandomState and Generator). Is this acceptable or do we need to provide a wrapper around Generator in order to still expose these functions (and if so, should this cover all RandomState functions or only the most common?)
  3. Regarding the Protocol and the wrapper for the old RandomState:
    3.1. Any recommendations for the type annotations on these functions?
    3.2. Should the return type allow to be Tensors as well (This would allow users to also plug in a wrapper around the pytorch random functions)
    3.3. Should we expose all functions that the Generator has?
  4. Should we also deprectate using the state parameter of set_random_state and use a new parameter called generator? While better from the parameter naming, it makes it less consistent with the method name (and I don't think we want to change this as well just to have the naming align).

@wyli
Copy link
Contributor

wyli commented Aug 24, 2023

thanks for the PR, I remember that the integer random seeds are flexible and could be used for coordinating the randomised behaviour across worker processes (it seems that it's not possible to pass a random state object instead), we may want to keep the integer type random seeds settings. please see usage in dataloader:

MONAI/monai/data/utils.py

Lines 702 to 705 in c564ded

def set_rnd(obj, seed: int) -> int:
"""
Set seed or random state for all randomizable properties of obj.
, also #5067

@john-zielke-snkeos
Copy link
Contributor Author

john-zielke-snkeos commented Aug 24, 2023

thanks for the PR, I remember that the integer random seeds are flexible and could be used for coordinating the randomised behaviour across worker processes (it seems that it's not possible to pass a random state object instead), we may want to keep the integer type random seeds settings. please see usage in dataloader:

@wyli Sorry I am not sure what this is referencing. What is changing that is breaking this existing behavior?

@wyli
Copy link
Contributor

wyli commented Aug 24, 2023

right, I thought the proposal was about replacing both seed and state with a generator, I had another look, it's about replacing state with generator, in that case I don't have any concern, thanks!

Copy link
Contributor

@KumoLiu KumoLiu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR, overall looks good to me. Several minor comments inline, feel free to ignore them.

R: np.random.RandomState = np.random.RandomState()

def set_random_state(self, seed: int | None = None, state: np.random.RandomState | None = None) -> Randomizable:
R: SupportsRandomGeneration = LegacyRandomStateAdaptor() # FIXME Why is this initialized here?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems we also need to update the docstring in L181 to tell users we may change to Generator near future.

def handle_legacy_random_state(
rand_state: np.random.RandomState | None = None,
generator: SupportsRandomGeneration | None = None,
return_legacy_default_random: bool = False,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the user directly change to the Generator by setting the flag?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is more of a internal utility function (should probably be prefixed with _) during the migration to be used in the utility functions that accept a random_state (e.g. get_random_patch). The return_legacy_default_random is to return the default value as it is currently done (If there ends up being no use case where the default is not required, the parameter could be removed as well).

To change the default generator for transforms the place would be to change the _default_random_generator_factory variable to numpy.random.default_rng

@john-zielke-snkeos
Copy link
Contributor Author

john-zielke-snkeos commented Aug 24, 2023

Regarding my previous comment:

  1. With the new API, all custom transforms implemented by users would break if they use now non-existent random functions (because of the differing API of RandomState and Generator). Is this acceptable or do we need to provide a wrapper around Generator in order to still expose these functions (and if so, should this cover all RandomState functions or only the most common?)

How about we add a __getattr__ method to the LegacyRandomStateAdaptor. When you try to call one of the old methods, they will get redirected to the underlying RandomState but we print a deprecation warning? This of course would mean that you cannot migrate to the new Generator until all your third-party transforms have also migrated, but that should be acceptable

@@ -171,6 +172,9 @@ def _log_stats(data, prefix: str | None = "Data"):
raise RuntimeError(f"applying transform {transform}") from e


_default_random_generator_factory: Callable[[Any | None], SupportsRandomGeneration] = LegacyRandomStateAdaptor
Copy link
Contributor Author

@john-zielke-snkeos john-zielke-snkeos Aug 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add a function to allow users to change this out? For example _set_default_random_generator_factory(factory: Callable|str) where you can supply your own factory or use "legacy" or "generator" to switch to the new or old behavior.

@johnzielke johnzielke mentioned this pull request Jul 22, 2024
7 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

use numpy random generator instead of randomstate
3 participants