
use numpy random generator instead of randomstate #6854

Open · wyli opened this issue Aug 10, 2023 · 15 comments · May be fixed by #6899

@wyli (Contributor) commented Aug 10, 2023

Is your feature request related to a problem? Please describe.
https://numpy.org/doc/stable/reference/random/legacy.html#numpy.random.RandomState

The RandomState provides access to legacy generators. This generator is considered frozen and will have no further improvements.

https://numpy.org/doc/stable/reference/random/generator.html

The Generator provides access to a wide range of distributions, and served as a replacement for RandomState. The main difference between the two is that Generator relies on an additional BitGenerator to manage state and generate the random bits, which are then transformed into random values from useful distributions. The default BitGenerator used by Generator is PCG64. The BitGenerator can be changed by passing an instantized BitGenerator to Generator.

API differences
https://numpy.org/doc/stable/reference/random/new-or-different.html#new-or-different
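For illustration, a minimal side-by-side of the two APIs (a sketch only; note the method renames, and that the two produce different streams even for the same seed):

import numpy as np

# Legacy API: frozen, always MT19937-backed.
rs = np.random.RandomState(0)
rs.randint(0, 10, size=3)    # integers
rs.random_sample(3)          # uniforms in [0, 1)

# New API: a Generator, PCG64-backed by default, with renamed methods.
rng = np.random.default_rng(0)
rng.integers(0, 10, size=3)  # replaces randint
rng.random(3)                # replaces random_sample / rand

# A different BitGenerator can be supplied explicitly:
rng_philox = np.random.Generator(np.random.Philox(0))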

@john-zielke-snkeos (Contributor)

Hi, I was about to open an issue about the exact same thing. This should provide a healthy speedup, which is especially important when creating large random arrays for volumes (e.g. RandGaussianNoise). The numpy docs state:

The normal, exponential and gamma generators use 256-step Ziggurat methods which are 2-10 times faster than NumPy’s default implementation in standard_normal, standard_exponential or standard_gamma.

import numpy as np

rng = np.random.default_rng()

%timeit -n 1 rng.standard_normal(100000)
%timeit -n 1 np.random.standard_normal(100000)
1.14 ms +- 24.1 us per loop (mean +- std. dev. of 7 runs, 1 loop each)
2.17 ms +- 37.3 us per loop (mean +- std. dev. of 7 runs, 1 loop each)

%timeit -n 1 rng.standard_exponential(100000)
%timeit -n 1 np.random.standard_exponential(100000)
894 us +- 23 us per loop (mean +- std. dev. of 7 runs, 1 loop each)
1.55 ms +- 31.9 us per loop (mean +- std. dev. of 7 runs, 1 loop each)

%timeit -n 1 rng.standard_gamma(3.0, 100000)
%timeit -n 1 np.random.standard_gamma(3.0, 100000)
2.39 ms +- 94 us per loop (mean +- std. dev. of 7 runs, 1 loop each)
4.55 ms +- 185 us per loop (mean +- std. dev. of 7 runs, 1 loop each)

In general the new Generators are much faster, as can be seen in this table from the numpy docs, showing performance relative to RandomState(), which internally uses a Mersenne Twister (i.e. MT19937):

                       MT19937   PCG64   PCG64DXSM   Philox
32-bit Unsigned Ints        96     162         160       96
64-bit Unsigned Ints        97     171         188      113
Uniforms                   102     192         206      121
Normals                    409     526         541      471
Exponentials               701    1071        1101      784
Gammas                     207     250         266      227
Binomials                  100     123         122      111
Laplaces                   113     114         108      113
Poissons                   103     111         115      105
Overall                    159     219         225      174

Note the large increase for the normal distribution.

Challenges

I am not sure what the policy is on reproducibility of randomness across versions, but this change would definitely break reproducibility of previously seeded randomness. But so does, for example, any change in the numpy version; their docs state:

If you create a Generator with the same BitGenerator, with the same seed, perform the same sequence of method calls with the same arguments, on the same build of numpy, in the same environment, on the same machine, you should get the same stream of numbers

Implementation proposal

  1. Check whether MONAI uses any API that is not available on the new Generator; if so, create a helper function that unifies them (see the sketch below).
  2. Deprecate RandomState() but keep supporting it in all transforms; enable the new random generator by default via a flag.
  3. Eventually remove support for RandomState and remove the flag as well, switching to the new generators by default.
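As an illustration of step 1, a minimal sketch of such a unifying helper (the name unified_integers and the dispatch-on-type approach are purely illustrative, not a proposed API):

import numpy as np

def unified_integers(rng, low, high, size=None):
    # Illustrative only: dispatch on the rng type so call sites need not
    # care whether they hold a legacy RandomState or a new Generator.
    if isinstance(rng, np.random.RandomState):
        return rng.randint(low, high, size=size)  # legacy spelling
    return rng.integers(low, high, size=size)     # Generator spelling

unified_integers(np.random.RandomState(0), 0, 10, size=3)
unified_integers(np.random.default_rng(0), 0, 10, size=3)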

I would be happy to submit a PR for this if that approach seems reasonable.

@wyli (Contributor, Author) commented Aug 17, 2023

Thanks for the comments. I haven't looked into the details of the backward compatibility of the random seeds. For example, for this integration test, will the new generator produce the same deterministic results as RandomState? Any thoughts?

def test_training(self):
    set_determinism(seed=0)
    loss, step = run_test(device=self.device)
    print(f"Deterministic loss {loss} at training step {step}")
    np.testing.assert_allclose(step, 4)
    np.testing.assert_allclose(loss, 0.536134, rtol=1e-4)

@john-zielke-snkeos (Contributor) commented Aug 17, 2023

I am not sure, but if this value 0.536134 depends on a specific output of np.random.RandomState, then it will most likely not be reproduced. Multiple executions with the new random generator, however, should produce the same results as each other.
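For example, a quick check of that determinism, independent of the MONAI test above:

import numpy as np

# Two Generators built from the same seed yield identical streams.
a = np.random.default_rng(0).standard_normal(3)
b = np.random.default_rng(0).standard_normal(3)
assert np.array_equal(a, b)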

@wyli (Contributor, Author) commented Aug 17, 2023

Ok, the proposed plan looks good. RandomState is not actively maintained any more; the Generator is the way to go. It would be great to have your PR to update the code logic. I can trigger tests and verify the relevant integration results if needed: https://github.com/Project-MONAI/MONAI/blob/dev/tests/testing_data/integration_answers.py cc @Nic-Ma @ericspod

@john-zielke-snkeos (Contributor) commented Aug 17, 2023

Ok great, I'll get to work on that! On a similar but more general note: has it ever been considered to support the pytorch generator as well? Especially for large random tensors (e.g. Gaussian noise on large volumes) this might make quite a difference. For reference, I created this benchmark comparing the current RandomState, the "new" numpy Generator, and torch random on CPU and GPU (note that the GPU times are in microseconds):

[------------ np_randomstate -----------]
| standard_normal
1 threads: ------------------------------
(250, 250, 250) | 266.2
(100, 100, 100) | 16.3

Times are in milliseconds (ms).

[------------- np_PCG64DXSM ------------]
| standard_normal
1 threads: ------------------------------
(250, 250, 250) | 167.7
(100, 100, 100) | 10.3

Times are in milliseconds (ms).

[-------------- torch_cpu --------------]
| standard_normal
1 threads: ------------------------------
(250, 250, 250) | 73.4
(100, 100, 100) | 4.4

Times are in milliseconds (ms).

[-------------- torch_gpu --------------]
| standard_normal
1 threads: ------------------------------
(250, 250, 250) | 276.6
(100, 100, 100) | 13.8

Times are in microseconds (us).

This was created using:

import torch.utils.benchmark as benchmark
import torch
import numpy as np
from functools import partial

shapes = [
    (250, 250, 250),
    (100, 100, 100),
]
np_randomstate = np.random.RandomState(0)
np_PCG64DXSM = np.random.Generator(np.random.PCG64DXSM(seed=0))
torch_gen = torch.Generator(device='cpu').manual_seed(0)
torch_gpu_gen = torch.Generator(device='cuda').manual_seed(0)
gens = {
    'np_randomstate': np_randomstate.standard_normal,
    'np_PCG64DXSM': np_PCG64DXSM.standard_normal,
    'torch_cpu': partial(torch.randn, generator=torch_gen),
    'torch_gpu': partial(torch.randn, generator=torch_gpu_gen, device='cuda'),
}
results = []
for shape in shapes:
    for name, gen in gens.items():
        t = benchmark.Timer(
            stmt='gen(shape)',
            globals={'gen': gen, 'shape': shape},
            label=name,
            sub_label=str(shape),
            description='standard_normal',
            num_threads=1,
        )
        results.append(t.blocked_autorange(min_run_time=20))

compare = benchmark.Compare(results)
compare.print()

My point being: if we wanted to support these as well (not by default, of course, especially since I'm not sure how this would behave with multiprocessing etc.), it might make sense to create a wrapper class around these that exposes a common API. This could then also be used to implement the switch from RandomState to the new Generator. The "device" parameter of the torch implementation could then be set as an instance variable (which of course requires adjusting it manually where needed).
This is of course just an idea; I wanted to put it out there so that, if something like this is desired, we don't do the same work twice when changing the self.R implementations.
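To make the idea concrete, a rough sketch of such a wrapper (the class name and surface are hypothetical, not a proposed final API):

import numpy as np
import torch

class RandomSource:
    # Hypothetical wrapper exposing one call over a numpy Generator and a
    # torch Generator; only standard_normal is sketched here.
    def __init__(self, backend="numpy", seed=0, device="cpu"):
        self.backend = backend
        self.device = device
        if backend == "numpy":
            self._rng = np.random.default_rng(seed)
        else:  # "torch"
            self._rng = torch.Generator(device=device).manual_seed(seed)

    def standard_normal(self, shape):
        if self.backend == "numpy":
            return self._rng.standard_normal(shape)
        return torch.randn(shape, generator=self._rng, device=self.device)

src = RandomSource(backend="torch")           # torch backend on CPU
noise = src.standard_normal((100, 100, 100))  # same call for every backend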

@john-zielke-snkeos (Contributor) commented Aug 17, 2023

And on another note: I noticed that there are still a few usages of np.random.* in the code. These raise some questions:

  1. There are some usages of np.random.* in the transforms (see below). Could some of them be converted to self.R.*, and for those that can't, how should we handle them? Most of them only create single values, so I don't expect them to be performance-critical, but we should agree on a general strategy there.

./monai/transforms/signal/array.py:276:np.random.choice
./monai/transforms/signal/array.py:357:np.random.choice
./monai/transforms/spatial/array.py:1214:np.random.randint
./monai/transforms/spatial/dictionary.py:2347:np.random.randint
./monai/transforms/spatial/dictionary.py:698:np.random.randint
./monai/transforms/utils.py:1220:np.random.random
./monai/transforms/utils.py:246:np.random.randint
./monai/transforms/utils.py:510:np.random.randint
./monai/transforms/utils.py:552:np.random.random
./monai/transforms/utils.py:607:np.random.random

  2. There are some usages in other parts of the main code; the same question applies to those:

./monai/apps/deepedit/interaction.py:65:np.random.choice
./monai/apps/deepedit/transforms.py:159:np.random.choice
./monai/apps/deepedit/transforms.py:60:np.random.choice
./monai/apps/detection/transforms/dictionary.py:1305:np.random.randint
./monai/apps/nuclick/transforms.py:370:np.random.choice
./monai/apps/nuclick/transforms.py:377:np.random.choice
./monai/auto3dseg/analyzer.py:190:np.random.rand
./monai/auto3dseg/analyzer.py:191:np.random.rand
./monai/auto3dseg/analyzer.py:292:np.random.rand
./monai/auto3dseg/analyzer.py:374:np.random.rand
./monai/auto3dseg/analyzer.py:862:np.random.rand
./monai/auto3dseg/operations.py:119:np.random.rand
./monai/auto3dseg/operations.py:120:np.random.rand
./monai/auto3dseg/operations.py:121:np.random.rand
./monai/auto3dseg/operations.py:122:np.random.rand
./monai/auto3dseg/operations.py:62:np.random.rand
./monai/data/synthetic.py:142:np.random.random
./monai/data/synthetic.py:65:np.random.random
./monai/data/utils.py:125:np.random.randint
./monai/utils/misc.py:353:np.random.seed
./monai/visualize/utils.py:92:np.random.rand
./monai/visualize/utils.py:97:np.random.rand

  3. What should we do with usages of np.random.* in tests/**/*.py files? Numpy states that:

New code should use the normal method of a Generator instance instead; please see the Quick Start.

So it seems there is no need to migrate those immediately, but I wanted to mention it here. Since there are roughly 390 usages in tests, I am not listing them here.

I found usages using the following bash command: find . -name "*.py" -exec grep -o -H -n -E 'np.random.\w+' {} \; | grep -v "RandomState" | sort | uniq

@ericspod (Member)

If this is something that should be changed then we should go ahead and do so, but per the documentation RandomState isn't maintained, though neither is it deprecated; it should stick around for a good long while. The speedup would certainly be an advantage, but is there any other pressing need to make this change? From a casual look at the repo, np.random functions are used in a lot of places, and transforms have their own RandomState object. There is a lot to change, and the inconsistent use of randomness in some places should be cleaned up too.

One other thing from the documentation is that there's no guarantee of portable behaviour between versions of Numpy. For example, this means the choices made by Rand* transforms in a sequence will differ between numpy versions, which will break tests expecting a certain order. Would we have to start testing per numpy version as well? I don't know what the implications would be for other transforms or for laziness either, or for other areas of reproduction. Generative models and metrics may also be impacted. CC @atbenmurray @csudre @marksgraham

@ericspod (Member) commented Aug 17, 2023

  1. There are some usages of np.random.* in the transforms (see below). Could some of them be converted to self.R.*, and for those that can't, how should we handle them? Most of them only create single values, so I don't expect them to be performance-critical, but we should agree on a general strategy there.

Hi @john-zielke-snkeos, for this in particular I'd say these should use self.R.*, whatever R ends up being. Elsewhere, calls to np.random.* rely on the assumption of a global random state whose seed has possibly been set, so we could move those to using a Generator that is passed in as an argument.
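For example, a small sketch of that pass-by-argument pattern (the function and parameter names are hypothetical):

import numpy as np

def sample_offsets(n, rng=None):
    # Hypothetical helper: accept an optional Generator instead of
    # reading the global np.random state.
    rng = np.random.default_rng() if rng is None else rng
    return rng.integers(0, 100, size=n)

rng = np.random.default_rng(42)
sample_offsets(5, rng=rng)  # the caller controls seeding, so results reproduce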

@john-zielke-snkeos (Contributor) commented Aug 18, 2023

@ericspod Thanks for your reply!

for this in particular I'd say these should use self.R.* whatever R ends up being

Fixing the usage of those would introduce a breaking change regarding randomness/reproducibility, but I guess that is acceptable?

For the other usages, I think both keeping them and replacing them are fine. If we end up replacing them, the question is whether we should then have a MONAI-wide random generator somewhere. Since I think the speedup would be negligible in cases that only produce a small amount of randomness, I would actually be in favor of keeping those for now and, once we have established a new system in the performance-critical areas, worrying about whether to replace the other usages.

@john-zielke-snkeos (Contributor)

Regarding my previous comment and the implementation plan, I'd still be interested to know whether you are also in favor of introducing a wrapper class that can use either numpy.random.Generator or RandomState, allowing backward compatibility as well as extensibility for pytorch generators later on (implementing those should then be a separate PR).

The downside of the wrapper class is of course that it adds some maintenance overhead, since we need to expose all functions of the underlying generators, and that another level of indirection carries a small runtime cost (although I would expect it to be minimal).

@atbenmurray (Contributor) commented Aug 18, 2023

Hi @john-zielke-snkeos, thanks for showing an interest in this!
Just a note on consistency: I don't think we can expect backward compatibility, but if RandomState didn't guarantee it between numpy releases either, then I think that could go in the release notes and we'd be fine. @ericspod, @Nic-Ma, @wyli?

import numpy as np

seed = 12345678
r = np.random.RandomState(seed)
g0 = np.random.default_rng(seed)
g1 = np.random.Generator(np.random.PCG64(seed))
g2 = np.random.Generator(np.random.MT19937(seed))

print(r.uniform(0, 1000000))
print(g0.uniform(0, 1000000))
print(g1.uniform(0, 1000000))
print(g2.uniform(0, 1000000))
which prints:

245804.23382490256
958338.0019665658
958338.0019665658
509569.7044655211

@atbenmurray (Contributor)

Second note: since we can't just remove RandomState without deprecating it over multiple versions, we might need a wrapper as discussed. As Generator is the supported numpy rng class going forward, I would suggest having Randomizable wrap RandomState in a Generator adaptor rather than the other way around.
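A rough sketch of what such an adaptor could look like (the name and the subset of methods are illustrative only):

import numpy as np

class RandomStateAdaptor:
    # Illustrative adaptor: expose a legacy RandomState through
    # Generator-style method names, so Randomizable code can target the
    # Generator API while RandomState is still supported.
    def __init__(self, state):
        self._state = state

    def integers(self, low, high=None, size=None):
        return self._state.randint(low, high, size=size)

    def random(self, size=None):
        return self._state.random_sample(size)

    def standard_normal(self, size=None):
        return self._state.standard_normal(size)

R = RandomStateAdaptor(np.random.RandomState(0))
R.integers(0, 10, size=3)  # Generator-style call, legacy MT19937 stream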

@vikashg commented Jan 4, 2024

@ericspod @KumoLiu can we look at this PR?

@ericspod (Member) commented Jan 6, 2024

Hi @vikashg, we had discussed this change in the past within core meetings, and I think we need to return to it and consider what the approach is. I've added it to the backlog project for now, but let's bring this up again at the next meeting and work towards some agreement on what the changes will be. It's going to be a large refactor, so we wanted to get the 1.4 items done first and finalise the lazy transforms.

@vikashg commented Jan 10, 2024

Ok, sounds good. Thanks @ericspod
