Clarify meaning of initializationPrior* #587

Open
dweindl opened this issue Sep 17, 2024 · 8 comments
Comments

@dweindl
Member

dweindl commented Sep 17, 2024

Initially, initializationPriorType and initializationPriorParameters were introduced to provide prior distributions for sampling initial points in a multi-start optimization setting. For other global optimization schemes it is less clear how this should be incorporated. Not at all? For the initial population? Whenever a new point is sampled? ...?
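For reference, the original multi-start use could look roughly like the sketch below. This is not PEtab library code: `sample_startpoint`, `multistart_startpoints`, and the dictionary layout are hypothetical, only two prior types are handled, parameter scales are ignored, and the `lower;upper` / `mean;std` encoding of the prior parameters is an assumption about how the parameter table is read.

```python
import random


def sample_startpoint(priors, bounds, rng):
    """Draw one start point.

    `priors` maps a parameter ID to (prior type, "a;b" parameter string);
    `bounds` maps a parameter ID to (lower, upper). Both layouts are
    hypothetical and only chosen for this illustration.
    """
    point = {}
    for par_id, (prior_type, prior_pars) in priors.items():
        a, b = (float(x) for x in prior_pars.split(";"))
        lower, upper = bounds[par_id]
        if prior_type == "uniform":
            value = rng.uniform(a, b)  # a = lower, b = upper
        elif prior_type == "normal":
            value = rng.gauss(a, b)  # a = mean, b = standard deviation
        else:
            raise NotImplementedError(prior_type)
        # Assumption: draws outside the bounds are clipped to the bounds.
        point[par_id] = min(max(value, lower), upper)
    return point


def multistart_startpoints(priors, bounds, n_starts, seed=0):
    """Sample one start point per local optimization run."""
    rng = random.Random(seed)
    return [sample_startpoint(priors, bounds, rng) for _ in range(n_starts)]
```

In this reading the prior is consulted exactly once per start. Nothing in the sketch says what an optimizer should do when it samples further points later on, which is the open question here.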

@paulflang
Contributor

Reminds me that I once found this a little bit confusing. In my mind, it would make sense to treat the following as explicitly separate:

  • definition of the objective function
  • hints for optimizer
  • hints for plotting routines

@dweindl
Member Author

dweindl commented Sep 23, 2024

  • definition of the objective function

This is already separated. Those are the objectivePriorType and objectivePriorParameters fields.

  • hints for optimizer

This would be initializationPriorType and initializationPriorParameters, which I think need further clarification. Is this more like an optional hint, or at which stages do those have to be respected?

  • hints for plotting routines

Everything for plotting is the visualization table, but this is currently independent of any prior distributions.

@paulflang
Contributor

paulflang commented Sep 23, 2024

Those are the objectivePriorType and objectivePriorParameters fields.

I was not talking about those columns specifically, more about my experience when I was new to optimization and first came across PEtab. I was not quite sure how to cast the data I had into an objective function (should I just use least squares?), but after reading the format specification, thinking about it and reading it again, it all started to make sense, except for the two initializationPrior* columns. I could not figure out how they affect the objective function. At some point I concluded that they are probably just there for reasons that don't affect me (remember, I was using eSS), so I just ignored them. Of course, there were also datasetId and replicateId, but for those it was more obvious that they are just for visualization purposes. Still, I'm not sure what (if anything) to do here. The only thing that came to my mind is prefixing optimization hint columns with oh:, and plotting routine columns (outside the visualization table) with vis:.
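The distinction that caused the confusion can be sketched as follows. This is only an illustration under the reading used in this thread, where objectivePrior* adds a term to the objective function and initializationPrior* does not enter it at all. The toy model, the function names, and the single normal prior are all hypothetical.

```python
import math


def neg_log_likelihood(theta, data, sigma=1.0):
    """Toy model: measurements scatter around theta with Gaussian noise."""
    return sum(
        0.5 * math.log(2 * math.pi * sigma**2)
        + 0.5 * ((y - theta) / sigma) ** 2
        for y in data
    )


def neg_log_normal_prior(theta, mean, std):
    """Negative log density of a normal prior on theta."""
    return 0.5 * math.log(2 * math.pi * std**2) + 0.5 * ((theta - mean) / std) ** 2


def objective(theta, data, objective_prior=None):
    """Negative log posterior if an objective prior is given, else the
    negative log likelihood. There is deliberately no argument for an
    initialization prior: it never changes the function value."""
    value = neg_log_likelihood(theta, data)
    if objective_prior is not None:
        mean, std = objective_prior
        value += neg_log_normal_prior(theta, mean, std)
    return value
```

With `objective_prior=None` this reduces to a plain likelihood-based fit. An initialization prior would only change which `theta` values an optimizer tries first.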

@dilpath
Member

dilpath commented Oct 1, 2024

Initially, initializationPriorType and initializationPriorParameters were introduced to provide prior distributions for sampling initial points in a multi-start optimization setting. For other global optimization schemes it is less clear how this should be incorporated. Not at all? For the initial population? Whenever a new point is sampled? ...?

For me, whenever a new point is sampled in an uninformed way, then the initializationPrior* should be used. Otherwise, it's unclear why the user can help the optimizer avoid non-evaluable regions at the start of optimization, but not during it. So, "Whenever a new point is sampled?" sounds good to me.

I also agree with @paulflang that the columns that define the objective function should be obvious, so then *Prior* in initializationPrior* is suboptimal. Alternative: optimizerSampling*.

This information is useful but could also be shifted to the PEtab Result format. Currently, there are no (draft) guidelines for whether certain optimizer information is better suited in optional columns in PEtab, or as values in the PEtab Result.

@dweindl
Member Author

dweindl commented Oct 1, 2024

For me, whenever a new point is sampled in an uninformed way, then the initializationPrior* should be used. Otherwise, it's unclear why the user can help the optimizer avoid non-evaluable regions at the start of optimization, but not during it. So, "Whenever a new point is sampled?" sounds good to me.

This sounds reasonable, in principle. However, my problem is that for certain global optimizers it will be difficult to achieve. They usually just take some box constraints and then sample randomly inside the box.
Since what is specified in the parameter table is generally considered an integral part of the optimization problem definition that can't be ignored, I am wondering whether initializationPrior* would then rule out those optimizers. Either way is fine for me, but I think it would be good to clarify that.
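To make the difficulty concrete, here is a toy sketch, not modeled on any particular optimizer. `random_search` stands in for a global optimizer, `uniform_box_sampler` is the box-only behavior described above, and `make_prior_sampler` is a hypothetical hook that would let an initialization prior (here a normal distribution restricted to the box) drive every new sample.

```python
import random


def uniform_box_sampler(bounds, rng):
    """Box-only behavior: sample uniformly inside the box constraints."""
    return [rng.uniform(lower, upper) for lower, upper in bounds]


def make_prior_sampler(means, stds, max_tries=100):
    """Hypothetical hook: sample from a normal prior restricted to the box."""

    def sampler(bounds, rng):
        point = []
        for (lower, upper), mean, std in zip(bounds, means, stds):
            for _ in range(max_tries):
                x = rng.gauss(mean, std)
                if lower <= x <= upper:
                    break
            else:
                # No draw landed in the box: fall back to uniform sampling.
                x = rng.uniform(lower, upper)
            point.append(x)
        return point

    return sampler


def random_search(fun, bounds, n_samples, sampler=uniform_box_sampler, seed=0):
    """Toy global optimizer: evaluate sampled points and keep the best one."""
    rng = random.Random(seed)
    best_x, best_f = None, float("inf")
    for _ in range(n_samples):
        x = sampler(bounds, rng)
        f = fun(x)
        if f < best_f:
            best_x, best_f = x, f
    return best_x, best_f
```

An optimizer that exposes something like the `sampler` argument could respect the prior whenever a new point is sampled. One that hard-codes the uniform box sampler could not, which is the case that would be ruled out if initializationPrior* were mandatory.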

@dilpath
Member

dilpath commented Oct 1, 2024

Since what is specified in the parameter table is generally considered an integral part of the optimization problem definition that can't be ignored

This means we would need multiple PEtab problems, one per optimizer type (local/global). It also means one would need to use the same optimizer type to reproduce a result with the original PEtab problem -- otherwise, manual changes would be needed to have a valid PEtab problem. I guess from the perspective of PEtab users, it might be more useful to be able to specify information that an optimizer can use, without requiring it. Or, we move/copy this to the PEtab Result.

@dweindl
Member Author

dweindl commented Oct 2, 2024

This means we would need multiple PEtab problems

:-/

I guess from the perspective of PEtab users, it might be more useful to be able to specify information that an optimizer can use, without requiring it.

Agreed, but then it should be made clear in the documentation that it's just some hint that optimizers may or may not use. Possibly also in the column name.

Or, we move/copy this to the PEtab Result.

That sounds wrong, since it clearly is input, not output. Whether it was used to obtain the given result could be included there.

@dilpath
Member

dilpath commented Oct 3, 2024

Or, we move/copy this to the PEtab Result.

That sounds wrong, since it clearly is input, not output. Whether it was used to obtain the given result could be included there.

Agreed re: input vs. output. My message was coming from the perspective "the PEtab Result aims to store sufficient information for reproducibility of a result", rather than "the PEtab Result should only contain the result". That is, it's currently planned that the PEtab Result contains inputs like optimizer hyperparameters and other tool-specific settings. But fine to leave this out of the discussion.
