A system for managing Model/Agent default values and ranges #2268

Open · EwoutH opened this issue Sep 2, 2024 · 16 comments · Labels: feature

@EwoutH (Member) commented Sep 2, 2024

Currently, I need to specify default values and sometimes ranges in multiple places:

  • As default model variables (for examples etc.)
  • If I want to visualize, I need to define a min, max, step and a default value
  • In batch_run I'm defining discrete ranges of variables
  • Our benchmarks also have custom parameters.
  • If I want to run in something like the EMAworkbench, I'm again defining ranges

It seems like Mesa could benefit from a way to define default values and/or ranges once, in a form that can be used throughout the different components.

@EwoutH added the feature label Sep 2, 2024
@EwoutH (Member, Author) commented Sep 2, 2024

My initial idea, after playing around with this, is something like:

# Define a dataclass somewhere (called ParameterSpec in this case)
from dataclasses import dataclass
from typing import Any

@dataclass
class ParameterSpec:
    default: Any
    min_val: Any = None
    max_val: Any = None
    step: Any = None

# Modify the Mesa Model to initialize each ParameterSpec's default value,
# if no other value is passed
class Model:
    def __init__(self, **kwargs):
        ...
        # Iterate through each attribute defined on the class
        for key, value in self.__class__.__dict__.items():
            if isinstance(value, ParameterSpec):
                # Set the value from kwargs, or use the default from the ParameterSpec
                setattr(self, key, kwargs.get(key, value.default))

# Allow users to define class-level variables in ParameterSpec form
class MyModel(Model):
    wealth = ParameterSpec(100, min_val=50, max_val=150)
    area = ParameterSpec(10, min_val=5, max_val=20)

    def __init__(self, **kwargs):
        super().__init__(**kwargs)  # Calls Model.__init__
        # Additional initialization code here

# The model can now be initialized without any input values
model_default = MyModel()
print(model_default.wealth)  # Output will be 100
print(model_default.area)    # Output will be 10

# But the defaults can also easily be overwritten
model_custom = MyModel(wealth=120, area=15)
print(model_custom.wealth)  # Output will be 120
print(model_custom.area)    # Output will be 15

The only thing now required is that super().__init__(**kwargs) is called with **kwargs.

Now:

  • The visualisation can use all 4 parameter spec numbers
  • batch_run can use min, max and step by default (see the sketch below)
  • The examples can use a default value
  • The EMAworkbench can use the min and max value

Edit: **kwargs only needs to be passed if one or more ParameterSpec instances are defined. So for existing models nothing changes.
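To make the batch_run bullet above concrete, here is a minimal, hypothetical sketch (specs_to_ranges is an illustrative helper, not an existing Mesa function) of how the specs could be turned into the parameters dict that mesa.batch_run already accepts:

# Hypothetical helper: derive batch_run's parameters dict from the
# ParameterSpec attributes defined on a model class. Assumes integer
# min/max/step when a step is given; purely a sketch.
def specs_to_ranges(model_cls) -> dict:
    ranges = {}
    for name, value in model_cls.__dict__.items():
        if isinstance(value, ParameterSpec):
            if value.min_val is not None and value.step is not None:
                # Discrete sweep over [min_val, max_val] with the given step
                ranges[name] = range(value.min_val, value.max_val + 1, value.step)
            else:
                # Fall back to the single default value
                ranges[name] = value.default
    return ranges

# Usage: results = mesa.batch_run(MyModel, parameters=specs_to_ranges(MyModel))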

@Corvince (Contributor) commented Sep 2, 2024

I really like the basic idea! I think one of the challenges is to catch all basic parameter types. You outlined numeric values, but we should also handle strings (maybe default and list of options) and Boolean values.

Of course, theoretically any Python object can be an input parameter. Maybe there is an elegant way of handling this?

Also, for numerical values (not the highest priority), maybe already think about how we could handle non-linear scaling (say a range between 1e3 and 1e6 or the like).

But this will be super useful.
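To make the non-linear scaling point concrete, a small sketch of how a normalized slider position could map onto a log-scaled range like 1e3 - 1e6 (the scaling argument and function name are purely illustrative):

import math

def position_to_value(pos: float, min_val: float, max_val: float, scaling: str = "linear") -> float:
    # Map a slider position in [0, 1] onto the parameter range,
    # either linearly or logarithmically
    if scaling == "log":
        log_min, log_max = math.log10(min_val), math.log10(max_val)
        return 10 ** (log_min + pos * (log_max - log_min))
    return min_val + pos * (max_val - min_val)

print(position_to_value(0.5, 1e3, 1e6, scaling="log"))  # ~31622.8, the geometric midpoint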

@EwoutH (Member, Author) commented Sep 2, 2024

Thanks for your insights! I was also mulling over some similar things.

You outlined numeric values, but we should also handle strings (maybe default and list of options) and Boolean values.

We could add a parameter other_values for strings, booleans, objects, etc. Then that could just be an (unordered) list of options.

Maybe an explicit parameter_type would also be useful. Or subclass ParameterSpec to ParameterSpecInt, ParameterSpecBool, etc. (better names needed). We might be able to take some inspiration from the EMAworkbench's parameters.

how we could handle non-linear scaling

Also thinking about this. Ideally, step_size could not only be a fixed step, but also a multiplier or even an exponent. Maybe a scaling parameter could be useful, with options for linear, logarithmic, exponential, quadratic, etc.

Finally, how do you see this integrating with visualisation? It might remove a lot of the boilerplate you have to write in app.py. Any problems that might occur with this approach?

For non-numeric parameters, it might just be a drop-down menu or something like that.

@quaquel (Member) commented Sep 2, 2024

I like the idea. Drawing on my experience with the workbench, you would need something like the following:

  • continuous ranges, so, e.g., 1.0 - 12.5; this is a RealParameter in the workbench
  • integer ranges, so, e.g., 1 - 10; this is an IntegerParameter
  • ordered sets, so, e.g., A, B, C; this does not exist in the workbench but can be handled by IntegerParameter
  • categorical variables/unordered sets, so, e.g., A, 5, some_object; this is a CategoricalParameter

Booleans can be useful, but in the workbench they are subclassed from IntegerParameter.

Subclassing from ParameterSpec might be the best idea. It forces the user to be explicit about the nature of each parameter and how it can be handled. In particular, ordered vs. unordered sets is an important distinction.

I would be hesitant to include the step_size. At least from an experimental design point of view, this is not a property of the parameter space but a choice by the analyst in how she wants to sample points from the parameter space.
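For reference, a minimal usage sketch of what those parameter kinds look like in the workbench itself, based on its documented parameter classes (the parameter names here are made up; the argument order is name, lower bound, upper bound):

from ema_workbench import (
    RealParameter, IntegerParameter, CategoricalParameter, BooleanParameter,
)

uncertainties = [
    RealParameter("growth_rate", 1.0, 12.5),            # continuous range
    IntegerParameter("n_agents", 1, 10),                # integer range
    CategoricalParameter("strategy", ["A", "B", "C"]),  # unordered set
    BooleanParameter("mobility"),                       # subclassed from IntegerParameter
]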

@EwoutH (Member, Author) commented Sep 2, 2024

What I'm also curious about: if we encounter this problem, and the workbench does too, do other simulation libraries encounter it as well? How do they solve it? Should there be a general (Python) solution?


@EwoutH (Member, Author) commented Sep 2, 2024

https://scientific-python.org/specs/ might be the place if we want to take the high-effort route.

@quaquel (Member) commented Sep 3, 2024

Let's not wait for that if we think this is a useful idea for Mesa. If a default SPEC emerges for this, we might choose to start following it. I doubt, however, that one will come, because the nature of ABMs is quite different from many other simulation problems, for which you typically only need real-valued parameters.

@EwoutH (Member, Author) commented Sep 3, 2024

We could start making sure it’s compatible with the workbench.

I would be hesitant to include the step_size.

Decoupling sampling strategies from the parameter ranges seems to be a good idea indeed.
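One way to picture that compatibility: a hypothetical bridge (specs_to_uncertainties is not an existing function in either library) that maps the earlier ParameterSpec sketch onto workbench parameters, deliberately leaving step_size and all sampling choices out of the spec:

from ema_workbench import RealParameter

def specs_to_uncertainties(model_cls) -> list:
    # Only the range travels across; how to sample it stays with the workbench
    return [
        RealParameter(name, value.min_val, value.max_val)
        for name, value in model_cls.__dict__.items()
        if isinstance(value, ParameterSpec) and value.min_val is not None
    ]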

@adamamer20 (Collaborator) commented

This is a great idea!
For reference, Optuna (a hyperparameter optimization framework) uses a similar approach for defining parameter search spaces. See an example here: Optuna Pythonic Search Space
What if we allowed SciPy distributions for numerical parameters? This could be particularly beneficial when running batch_run. You could get more informative results if you don't do a full sweep search and instead specify a number of runs, especially if you know certain parameters follow a specific distribution.
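A rough sketch of the SciPy idea, assuming a spec could hold a frozen scipy.stats distribution and batch_run would draw a fixed number of samples from it instead of sweeping a grid (the variable names are illustrative):

from scipy import stats

wealth_dist = stats.norm(loc=100, scale=15)  # frozen normal distribution
area_dist = stats.randint(low=5, high=21)    # discrete uniform over 5..20

n_runs = 10
samples = {
    "wealth": wealth_dist.rvs(size=n_runs),  # one draw per run
    "area": area_dist.rvs(size=n_runs),
}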

@EwoutH (Member, Author) commented Sep 5, 2024

Okay, I thought about this a bit more.

Basically, we have four main data types:

  • Continuous values (return: float)
  • Discrete values (return: int)
  • Ordered sets (return: Any)
  • Categorical (unordered) sets (return: Any)

You could add more detail with Stevens's typology of levels of measurement, but (for now) I don't think that's necessary. Boolean is a bit weird, because it can basically be a special case of either ordered or categorical data.

So what's the bare minimum you need to sample each?

Categorical (unordered) sets

  • categories
  • probabilities

Ordered sets

  • categories (sorted)
  • probabilities

Taking these two, we can already observe that they align quite well. They can probably be one class, with a boolean attribute for ordered.

Discrete (interval) values

  • Range (min and max)
  • Interval
  • Distribution type
    • Distribution parameters

Continuous values (return: float)

  • Range (min and max)
  • Distribution type
    • Distribution parameters

These two are also similar, and can probably be grouped in one class. That means we would have something like a NumericalParameters and a CategoricalParameters subclass.

from dataclasses import dataclass, field
from typing import Any

@dataclass
class ParameterSpec:
    """Base class for parameter specifications."""
    description: str = ""

@dataclass
class NumParamSpec(ParameterSpec):
    """Represents numerical parameters, including continuous and discrete types."""
    min_val: float | None = None
    max_val: float | None = None
    is_discrete: bool = False  # False means continuous (sampler should return float), True means discrete (int)
    distribution_type: str | None = None  # Optional field to specify the distribution type
    distribution_parameters: dict | None = None  # Parameters for the distribution (e.g., mean, std)

@dataclass
class CatParamSpec(ParameterSpec):
    """Represents categorical parameters, both ordered and unordered sets."""
    categories: list[Any] = field(default_factory=list)  # Needs a default, since the base class field has one
    probabilities: list[float] | None = None  # Probabilities for each category, if any
    is_ordered: bool = False  # False means unordered, True means ordered

So, I think this suffices for sampling. Note that distribution_parameters could contain things like loc, scale and shape, or other variables as necessary.

Then there's the "practical modelling" side. I see basically four important use cases here:

  1. A default value would be really useful for model development, keeping things reproducible in the beginning. There are 2.5 ways a sampler could handle this:
    • Implicit: there's some convention about what the default value is. The first value in a list, the middle of the range, etc.
    • Explicit: a default has to be defined.
    • Hybrid: implicit if no default is passed, explicit if one is.
      Since samplers might want to decide this for themselves, I think a default key could be useful in all cases.
  2. For numerical parameters, when sampling from a distribution, you might either want to do a hard clip using the min and max values, or allow values outside this range. This could be a boolean (see the sampler sketch after this list).
  3. For numerical parameters, to visualize them properly, something like bins, bin_size or step_size might be useful to easily create sliders (for input) and plots (for output). Knowing whether the parameter is meant to scale linearly, logarithmically, exponentially or otherwise might also be useful.
  4. Booleans can be super convenient and thus should also be supported in some way, maybe as a separate class.
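A rough sampler sketch for the two spec classes above, illustrating the hard-clip option from point 2 and the discrete/continuous return types (the sample function and clip flag are hypothetical, not a settled API):

import random

def sample(spec, clip: bool = True):
    if isinstance(spec, CatParamSpec):
        # random.choices handles both uniform and weighted category draws
        return random.choices(spec.categories, weights=spec.probabilities, k=1)[0]
    if spec.distribution_type == "normal":
        params = spec.distribution_parameters or {}
        value = random.gauss(params.get("loc", 0.0), params.get("scale", 1.0))
        if clip:  # hard clip to [min_val, max_val]
            value = max(spec.min_val, min(spec.max_val, value))
    else:
        value = random.uniform(spec.min_val, spec.max_val)
    return int(round(value)) if spec.is_discrete else value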

My thoughts so far. Curious what everybody thinks!

@EwoutH (Member, Author) commented Sep 5, 2024

Some resources linked from @tupui and @ConnectedSystems over in SALib/SALib#634:

Might be interesting if we can learn something from them!

@EwoutH (Member, Author) commented Sep 9, 2024

Okay, to move forward:

  1. Centralize discussion to one place. discuss.scientific-python.org might be the most fitting, but GitHub might be more visible.
    • Maybe set up a call schedule or something
  2. Get an initial set of requirements, using the insights from various libraries
    • If needed at this stage, get more maintainers / core developers of libraries involved
  3. Get consensus on a conceptual-level solution
  4. Come up with an implementation solution (how/where, API, etc.)
  5. Roll out and start testing
  6. Iterate
  7. (Optionally) make it a formal SPEC

What does everybody think? What am I missing or should be different?

CC @tupui and @ConnectedSystems

@EwoutH (Member, Author) commented Sep 11, 2024

Let's centralize the discussion to Scientific Python, so we can get all ideas in one place:

I will make a little introduction there with my thoughts on the problem from the perspective of Mesa. I would love the same from other maintainers from other libraries!

@Corvince (Contributor) commented

@EwoutH Since I saw you are working on a new batch-runner, let me just very quickly outline some of the possibilities with param.
So first of all, this is how you could specify parameters with param:

import param

class MyModel(param.Parameterized):
    n = param.Integer(10, bounds=(0, None), softbounds=(0, 100), step=5, doc="Number of agents")

That is, we define the number of agents with a default value (10), some hard bounds (the value must be positive; an exception is raised otherwise), some soft bounds (it should be between 0 and 100, which can e.g. be picked up by a slider), a step size (not enforced, but usable for parameter sweeps or again a GUI), and a short help text.

Now the interesting part relating to batch running is that besides setting this value directly (model.n = 50), we can also set it to a function that resolves to an integer (model.n = lambda: random.randint(0, 100)). This of course makes it easy to do parameter sweeps. They even provide a whole range of numbergen functions that sample from different distributions, so that might be worth considering:

https://param.holoviz.org/user_guide/Dynamic_Parameters.html
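A sketch of the dynamic-parameter pattern from the linked guide (the exact numbergen generator name and its lbound/ubound arguments are recalled from its documentation, so treat them as assumptions):

import param
import numbergen as ng

class MyModel(param.Parameterized):
    n = param.Integer(10, bounds=(0, None))

model = MyModel()

# Number-typed parameters in param are Dynamic: assigning a numbergen
# generator means each access can draw a fresh value, which is what
# enables the parameter-sweep pattern described above
model.n = ng.UniformRandomInt(lbound=0, ubound=100)
print(model.n)  # a random integer in [0, 100]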

@EwoutH (Member, Author) commented Sep 24, 2024

Thanks, I think param is almost exactly what this issue was intended to produce. Turns out it already exists, which saves so much time.

I’m going to try to integrate it and see how it works.

@EwoutH (Member, Author) commented Oct 9, 2024

Python has a typing.ParamSpec class, which as of Python 3.13 can also have a default value. Might be interesting to look into further, and maybe subclass.
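For context, a minimal sketch of that Python 3.13 default syntax (PEP 696); note that typing.ParamSpec describes the parameter signature of a callable, so any reuse in Mesa would likely borrow the naming pattern rather than the machinery:

from typing import Callable, Generic, ParamSpec

P = ParamSpec("P", default=[int, str])  # default parameter list, Python 3.13+

class Runner(Generic[P]):
    def __init__(self, fn: Callable[P, None]) -> None:
        self.fn = fn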
