Centralization of the seeded random number generator #2021
Performance benchmarks:
@@ -25,6 +24,8 @@
from mesa.model import Model
from mesa.space import Position

from mesa.rng import RandomDescriptor
mesa.random is less surprising than mesa.rng. People are used to CPython's random module and numpy.random.
I'm fine with the change since it doesn't break compatibility, and I can always swap with NumPy RNG when needed. NumPy RNG is much more performant at pre-emptively producing lots of random numbers at once. I see it to be CPython's
The exact naming is the least of my concerns at the moment. So if others agree, I am fine with renaming. I added a third point above. I would appreciate input from anyone on all three.
I suppose it is the intended behavior anyway when you don't seed each model instance.
Why is the time and steps accessed via the model ( Edit: clarify last paragraph.
I don't focus on this in this PR. I started this because of the need for random in CellCollection, AgentSet, and DiscreteSpace and its subclasses. We can generalize this solution if we discover that many classes need access to time and step.
-    def __init__(self, agents: Iterable[Agent], model: Model):
+    def __init__(self, agents: Iterable[Agent], model: Model, random=None):
This PR means we can remove model here, right? That would be great
Agents might still need to access the clock info (time and steps).
This is within the AgentSet, not the agent.
This is an interesting caveat of the benchmarks script, @EwoutH. It probably means the benchmark action should always merge main into the PR, to compare the actual differences.
Can you give more details on why the test is failing and what the implications are? I am unsure at the moment.
That's an interesting caveat. I think if someone creates multiple model instances, they would expect the models to lead to different outcomes if they don't explicitly set the seed to the same value. So I'm not too happy with this one. Although wait, this shouldn't actually matter. If you run the models one after another, they should still give different results, right? So what are the possible downsides of this?
The time and steps are similar to the rng in that any constituent object needs to be able to access values from the "admin of the Matrix". I prefer that the time, steps, and rng be accessed via the same method, for consistency. This used to be via the model object.
There are at least 3, which is plenty: the current data collector, the current batch_run, and the Poisson activation scheduler / any discrete event scheduler.
The test that is failing is in test_time.py:
def test_shuffle_shuffles_agents(self):
    model = MockModel(shuffle=True)
    model.random = mock.Mock()
    assert model.random.shuffle.call_count == 0
    model.step()
    assert model.random.shuffle.call_count == 1
What happens is that in MockModel, the default rng is set. Next, we assign a mock to
I think this requires a more detailed explanation than I currently have time for. There is no problem creating model1, running it; creating model2, running it; etc. So, for batch runs and replications, there is no problem. You can get a problem if you have two models running in a lockstep way. In short, you can get into situations where the model's behavior is no longer reproducible. For example:
model1 = Model(seed=42)
model1.step()
model2 = Model(seed=None)  # changes the default rng
...
# which rng is now being used? Random(42) or Random(None)?
model1.step()
model2.step()  # same question
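The lockstep problem can be reproduced with a minimal sketch. Everything here is a toy stand-in, not Mesa's code: `_default_rng`, this `Model` class, and the helpers `run_alone` and `run_in_lockstep` are hypothetical names, and the sketch assumes the model constructor replaces a single module-level generator.

```python
import random

# Hypothetical module-level default, standing in for the one in the rng module.
_default_rng = random.Random()


def get_default_rng():
    """Return whichever generator is currently installed as the default."""
    return _default_rng


class Model:
    """Toy model: constructing it replaces the shared default generator."""

    def __init__(self, seed=None):
        global _default_rng
        _default_rng = random.Random(seed)

    def step(self):
        # Draws from the default that is installed at call time.
        return get_default_rng().random()


def run_alone():
    """Two steps of a seeded model with no other model around."""
    model1 = Model(seed=42)
    return [model1.step(), model1.step()]


def run_in_lockstep():
    """Same seeded model, but a second model is created between its steps."""
    model1 = Model(seed=42)
    first = model1.step()
    Model(seed=None)        # replaces the default rng behind model1's back
    second = model1.step()  # drawn from Random(None), no longer Random(42)
    return [first, second]
```

In this sketch the first draw matches in both runs, while the second draw of `run_in_lockstep` no longer comes from the seeded generator, so the seeded model stops being reproducible.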
Thanks for the clarification, I'll think about that.
I agree in principle. But please let's keep this PR focussed on addressing the issue with random. Once this PR is complete and we have a solution, I'll happily open another PR for time etc. At the moment, I am still not entirely sure the presented approach is the way forward because of the issues I have raised.
I am closing this PR. I have come around to the position that passing around the random number generator explicitly is the preferred solution. This is partly informed by the debates surrounding [SPEC 7](https://scientific-python.org/specs/spec-0007/).
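As a rough illustration of what explicit passing could look like, here is a minimal sketch. The classes are toy stand-ins rather than Mesa's actual `Model` and `AgentSet`, and the `rng` keyword and the `shuffled` method are names assumed for the example.

```python
import random


class AgentSet:
    """Toy collection that is handed its generator explicitly."""

    def __init__(self, agents, rng):
        self._agents = list(agents)
        self.rng = rng  # no global lookup: the owner decides which rng is used

    def shuffled(self):
        """Return a shuffled copy of the agents using the supplied rng."""
        agents = list(self._agents)
        self.rng.shuffle(agents)
        return agents


class Model:
    """Toy model that owns one generator and passes it to its parts."""

    def __init__(self, seed=None):
        self.random = random.Random(seed)
        self.agents = AgentSet(range(10), rng=self.random)
```

With this shape, creating a second model cannot change which generator the first model's collection draws from, so a seeded model stays reproducible even when several models are stepped side by side.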
This is a proposed solution for the issue raised in #1981. Currently, the seeded random number generator resides in the model. Any other class that might need to generate random numbers (e.g., agent, AgentSet, the tentative CellCollection, the various spaces) thus needs a reference to the model in order to use the seeded random number generator.
This PR offers a complementary solution that can be used throughout Mesa. Rather than using a Singleton (as suggested in #1981), I have modeled it on how logging works. So, we have a new rng module with a global variable containing the default seeded random instance. The model sets it. A simple get_default_rng function can access the default random number generator.
Also, many current classes already have a random property to get the random number generator from the model. I propose to generalize this by adding a descriptor class (the proper use of a descriptor this time). In short, this descriptor will retrieve the default random number generator when the attribute is set to None.
For an example of how it is all used, check the modifications in the agent module.
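For readers without the diff at hand, the idea can be sketched roughly as follows. This is a minimal illustration under assumptions, not the code in this PR: `get_default_rng` and `RandomDescriptor` are the names used above, while `set_default_rng`, the private attribute naming, and the toy `AgentSet` are hypothetical.

```python
import random

# Module-level default generator, analogous to logging's module-level state.
_default_rng = random.Random()


def set_default_rng(seed=None):
    """Hypothetical helper: install a new default generator (the model would call this)."""
    global _default_rng
    _default_rng = random.Random(seed)
    return _default_rng


def get_default_rng():
    """Return the generator currently installed as the default."""
    return _default_rng


class RandomDescriptor:
    """Return the instance's own generator, or the default when it is None."""

    def __set_name__(self, owner, name):
        self._private_name = "_" + name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        value = getattr(instance, self._private_name, None)
        return get_default_rng() if value is None else value

    def __set__(self, instance, value):
        setattr(instance, self._private_name, value)


class AgentSet:
    """Toy class showing how a constituent object would use the descriptor."""

    random = RandomDescriptor()

    def __init__(self, agents, random=None):
        self._agents = list(agents)
        self.random = random  # None means "use the default rng"
```

Because the lookup happens at access time, an object created without its own generator follows whatever default is installed at that moment, while an object given an explicit generator keeps using it.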