Use numpy random Generator instead of RandomState #6854
Comments
Hi, I was about to open an issue about exactly the same thing. This should provide a healthy speedup, which is especially important when creating large random arrays for volumes (e.g. RandGaussianNoise). The numpy docs state:
In general the new Generators are much faster, as can be seen in this table in the numpy docs, which shows the relative speedup over RandomState (internally a Mersenne Twister, i.e. MT19937):
Challenges
I am not sure what the policy is on reproducibility of randomness across versions, but in that regard this change would definitely break reproducibility of previously generated random streams. Then again, so does any change in the numpy version, for example; their docs state that:
Implementation proposal
I would be happy to submit a PR for this if that approach seems reasonable.
Thanks for the comments. I haven't looked into the details of the backward compatibility of the random seeds; for example, for this integration test, will the new generator produce the same deterministic results as RandomState? Any thoughts?
MONAI/tests/test_integration_determinism.py, lines 86 to 91 at 9ae72e4
I am not sure, but if this value 0.536134 depends on a specific output of np.random.RandomState then most likely it will not be the same. However, multiple executions of the new random generator should produce the same results.
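To illustrate both halves of that claim (a quick check, not from the thread): identically seeded `Generator`s reproduce the same stream, while a `RandomState` with the same seed yields different values.

```python
import numpy as np

# Two independently constructed Generators with the same seed yield
# identical streams (within a single numpy version).
g_a = np.random.default_rng(42)
g_b = np.random.default_rng(42)

a = g_a.standard_normal(5)
b = g_b.standard_normal(5)
assert np.allclose(a, b)

# A RandomState with the same seed uses a different bit generator and
# algorithms, so hard-coded expected values would no longer match.
r = np.random.RandomState(42)
assert not np.allclose(r.standard_normal(5), a)
```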
OK, the proposed plan looks good; RandomState is not actively maintained any more, so Generator is the way to go. It would be great to have your PR to update the code logic. I can trigger tests and verify the relevant integration results if needed: https://github.com/Project-MONAI/MONAI/blob/dev/tests/testing_data/integration_answers.py cc @Nic-Ma @ericspod
Ok great, I'll get to work on that! On a similar note, but more general: Has it ever been considered to support the pytorch generator as well? Especially for large random tensors (e.g. Gaussian noise on large tensors) this might make quite a difference. For reference, I created this benchmark comparing the current RandomState, the "new" numpy generator and the torch random on cpu and gpu (note that GPU times are in microseconds):
This was created using:

```python
import torch.utils.benchmark as benchmark
import torch
import numpy as np
from functools import partial

shapes = [
    (250, 250, 250),
    (100, 100, 100),
]

np_randomstate = np.random.RandomState(0)
np_PCG64DXSM = np.random.Generator(np.random.PCG64DXSM(seed=0))
torch_gen = torch.Generator(device='cpu').manual_seed(0)
torch_gpu_gen = torch.Generator(device='cuda').manual_seed(0)

gens = {
    'np_randomstate': np_randomstate.standard_normal,
    'np_PCG64DXSM': np_PCG64DXSM.standard_normal,
    'torch_cpu': partial(torch.randn, generator=torch_gen),
    'torch_gpu': partial(torch.randn, generator=torch_gpu_gen, device='cuda'),
}

results = []
for shape in shapes:
    for name, gen in gens.items():
        t = benchmark.Timer(
            stmt='gen(shape)',
            globals={'gen': gen, 'shape': shape},
            label=name,
            sub_label=str(shape),
            description='standard_normal',
            num_threads=1,
        )
        results.append(t.blocked_autorange(min_run_time=20))

compare = benchmark.Compare(results)
compare.print()
```

My point being that if we wanted to support these as well (of course not by default, especially since I'm not sure how this would behave with multiprocessing etc.), it might make sense to create a wrapper class around these that exposes a common API. This could then be used to implement the switch from RandomState to the new Generator as well. The "device" parameter of the torch implementation could then be set as an instance variable (which of course requires adjusting it manually when needed).
And on another note: I noticed that there are a few usages of np.random.* in the code still. These raise some questions:
So it seems like there is no need to migrate those immediately, but I just wanted to mention it here. Since there are roughly 390 usages in tests, I am not listing them here. I found the usages using the following bash command:
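The command itself was not preserved in the rendering above; a hypothetical reconstruction might look like the following (the `demo_pkg` files are created only to make the snippet self-contained and are not part of MONAI):

```shell
# Hypothetical reconstruction -- the exact command was not preserved above.
# Create a tiny demo package, then search it for direct np.random.* usage.
mkdir -p demo_pkg
printf 'import numpy as np\nx = np.random.rand(3)\n' > demo_pkg/mod.py

grep -rn --include='*.py' 'np\.random\.' demo_pkg/

rm -rf demo_pkg
```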
If this is something that should be changed then we should go ahead and do so, but from the documentation RandomState isn't maintained, though neither is it deprecated; it should stick around for a good long while. The speedup would be an advantage for sure, but otherwise is there any other pressing need to make this change? From a casual look at the repo
One other thing in the documentation is that there's no guarantee of portable behaviour between versions of Numpy; for example, this means the choice of
Hi @john-zielke-snkeos, for this in particular I'd say these should use
@ericspod Thanks for your reply!
Fixing the usage of those would introduce a breaking change regarding randomness/reproducibility, but I guess this is acceptable? For the other usages, I think both keeping and replacing them are fine. If we end up replacing them, the question is whether we should then have a MONAI-only random generator somewhere. Since I think the speedup would be negligible in cases where only a small amount of randomness is produced, I would actually be in favor of keeping those for now and, once we have established a new system in the performance-critical areas, worrying about whether to replace the other usages.
In regard to my previous comment and the implementation plan, I'd still like to know whether you are also in favor of introducing a wrapper class that can use either numpy.Generator or RandomState, allowing for backward compatibility as well as providing extensibility later on for pytorch generators (implementation of those should then be a separate PR). The downside of the wrapper class is of course that it adds some maintenance overhead, since we need to expose all functions of the underlying generators, and that another level of indirection probably adds a small runtime cost (although I would assume this to be minimal).
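The wrapper idea could look something like the following minimal sketch; `RandomAdaptor` and its method set are hypothetical names for illustration, not MONAI API:

```python
import numpy as np

class RandomAdaptor:
    """Hypothetical sketch of the wrapper discussed above: a thin common
    API over RandomState and Generator. Names are illustrative only."""

    def __init__(self, seed=None, legacy=False):
        # legacy=True keeps bit-for-bit backward compatibility with RandomState
        self._rng = np.random.RandomState(seed) if legacy else np.random.default_rng(seed)
        self._legacy = legacy

    def standard_normal(self, size=None):
        # Same name and semantics on both backends.
        return self._rng.standard_normal(size)

    def uniform(self, low=0.0, high=1.0, size=None):
        return self._rng.uniform(low, high, size)

    def integers(self, low, high=None, size=None):
        # RandomState and Generator spell this method differently.
        if self._legacy:
            return self._rng.randint(low, high, size)
        return self._rng.integers(low, high, size)

# Same seed and backend -> reproducible stream.
assert np.allclose(RandomAdaptor(0).standard_normal(3),
                   RandomAdaptor(0).standard_normal(3))
# legacy mode matches a raw RandomState exactly.
assert np.allclose(RandomAdaptor(0, legacy=True).standard_normal(3),
                   np.random.RandomState(0).standard_normal(3))
```

A torch backend could later slot in behind the same method names, with the device held as an instance variable as suggested above.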
Hi @john-zielke-snkeos, thanks for showing an interest in this!

```python
import numpy as np

seed = 12345678
r = np.random.RandomState(seed)
g0 = np.random.default_rng(seed)
g1 = np.random.Generator(np.random.PCG64(seed))
g2 = np.random.Generator(np.random.MT19937(seed))

print(r.uniform(0, 1000000))
print(g0.uniform(0, 1000000))
print(g1.uniform(0, 1000000))
print(g2.uniform(0, 1000000))
```
Second note: since we can't just remove RandomState without deprecating it over multiple versions, we might need to have a wrapper as discussed. As Generator is the supported numpy RNG class going forward, I would suggest that Randomizable wrap RandomState in a Generator adaptor rather than the other way around.
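One reading of "wrap RandomState in a Generator adaptor" is that numpy already lets both front ends share a single bit generator. This sketch shows the mechanics only; it is an assumption about a possible design, not MONAI code:

```python
import numpy as np

# A single MT19937 bit generator shared between both front ends: legacy
# code keeps its RandomState API while new code draws via Generator from
# the same underlying state.
bitgen = np.random.MT19937(0)
legacy = np.random.RandomState(bitgen)   # RandomState accepts a BitGenerator
modern = np.random.Generator(bitgen)     # Generator over the same state

x = legacy.random_sample()   # advances the shared stream
y = modern.random()          # continues from where legacy left off

# Note: the distribution algorithms still differ between the two front
# ends, so identical calls do not return identical values.
```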
Hi @vikashg, we had discussed this change in the past within core meetings, and I think we need to return to it and consider what the approach should be. I've added it to the backlog project for now, but let's bring this up again at the next meeting and work towards some agreement on what the changes will be. It's going to be a large refactor, so we had wanted to get the 1.4 items done first and finalise the lazy transforms.
ok sounds good. Thanks @ericspod |
Is your feature request related to a problem? Please describe.
https://numpy.org/doc/stable/reference/random/legacy.html#numpy.random.RandomState
https://numpy.org/doc/stable/reference/random/generator.html
API differences
https://numpy.org/doc/stable/reference/random/new-or-different.html#new-or-different
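For a flavor of the API differences listed there, a few legacy calls next to their Generator equivalents (illustrative, not exhaustive):

```python
import numpy as np

rs = np.random.RandomState(0)    # legacy API
rng = np.random.default_rng(0)   # new Generator API

# Legacy name                       # Generator equivalent
a = rs.randint(0, 10, size=3);      b = rng.integers(0, 10, size=3)
c = rs.random_sample(3);            d = rng.random(3)
e = rs.randn(2, 2);                 f = rng.standard_normal((2, 2))

# Same shapes and dtypes, different method names (and different streams).
assert a.shape == b.shape == (3,)
assert e.shape == f.shape == (2, 2)
```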