This issue was moved to a discussion.
Bootstrapping: should it be at the alchemlyb level, or pymbar level? #80
Comments
If there's a common function that we can use for "everything" (and just plug in the estimator), then alchemlyb would be a good place, I think – along the lines that we only have to write and test the code once. I generally like building blocks that I can freely combine. Something like

```python
def bootstrapped(data, estimator):
    ...
    return mean, error
```

Alternatively, we could hide the machinery in the estimators themselves. One advantage of doing it at the alchemlyb level is that it might not be difficult to run alchemlyb with dask (essentially, use the dask DataFrame), and then the bootstrapping could be parallelized without effort. A while ago @dotsdl played around with alchemlyb and dask – I can't quite remember how much would need to be changed.
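A minimal sketch of such a building block (the function name follows the snippet above; the choice to resample rows with replacement and to return the bootstrap mean and standard deviation is my assumption, not something the thread fixes):

```python
import numpy as np

def bootstrapped(data, estimator, n_boot=200, seed=None):
    """Bootstrap an estimator over the rows of `data`.

    data      : 2-D array of samples (one observation per row)
    estimator : callable mapping an array of samples to a scalar
    n_boot    : number of bootstrap resamples
    """
    rng = np.random.default_rng(seed)
    n = len(data)
    # resample rows with replacement and re-run the estimator each time
    results = np.array([
        estimator(data[rng.integers(0, n, size=n)])
        for _ in range(n_boot)
    ])
    return results.mean(), results.std(ddof=1)

# toy usage: the "estimator" is just the sample mean here
data = np.random.default_rng(0).normal(size=(1000, 1))
mean, error = bootstrapped(data, lambda x: x.mean(), n_boot=100, seed=1)
```

Because the estimator is passed in as a plain callable, the same loop could later be handed a dask-backed array without changing its signature.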
That might be a good idea. It could return a dictionary of all the bootstrapped results, along with the uncertainty estimate. I'll think about how to organize this. One issue is that the data will look different for each estimator, thus requiring fairly different conditionals inside the bootstrapping code. Also, if one were analyzing K states, calculating the free energy with BAR executed pairwise, one would want to bootstrap over the entire data set of K states; i.e. you would need to bootstrap the entire procedure, not a single estimator.
@mrshirts do you have a paper or writeup you can point to for this approach? I'd be happy to prototype something. We may be able to steal design inspiration from …
So, I don't really have a good simple paper. http://www.alchemistry.org/wiki/Analyzing_Simulation_Results#Bootstrap_Sampling is a good summary. I agree that something like `bootstrapped(data, estimator)` would work. After the bootstrap sampling with replacement, everything else is pretty trivial. You calculate your function on each of the bootstrapped data sets. You then have a set of results (these could be multivalued returns), and you can simply return a list of all the answers. You can optionally return various statistical measures of this list for each of the results – mean, standard deviation, confidence intervals. One could make decorrelation of the data sets part of the algorithm, but it would perhaps be more modular to do the decorrelation as a separate step.
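The post-processing step described above – keep the full list of bootstrapped results and optionally summarize it – might look like the sketch below. The percentile confidence interval is one common choice, not something the thread prescribes, and the function name is mine:

```python
import numpy as np

def summarize_bootstrap(boot_results, ci=0.95):
    """Summarize a list of bootstrapped estimates.

    Returns the mean, the standard deviation (the bootstrap error
    estimate), and a percentile confidence interval.
    """
    boot_results = np.asarray(boot_results)
    alpha = (1.0 - ci) / 2.0
    lower, upper = np.quantile(boot_results, [alpha, 1.0 - alpha])
    return {
        "mean": boot_results.mean(),
        "std": boot_results.std(ddof=1),
        "ci": (lower, upper),
    }

# toy usage: pretend these are 500 bootstrapped free-energy estimates
boot = np.random.default_rng(2).normal(loc=5.0, scale=0.1, size=500)
stats = summarize_bootstrap(boot)
```

Returning the raw list alongside the summary keeps the two concerns (resampling and statistics) modular, in the same spirit as doing decorrelation as a separate step.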
Dear alchemlyb team!
Hey all, after discussions with @wildromi, I've committed to working on this issue over the next two weeks. I expect the first iteration to be usable but probably not the approach we end up with. I'll post a WIP PR as soon as I can.
Hi, David – I'd love to talk some more about this, as I've been dealing with similar setups for a while. Shoot me an email at the CU email and we can strategize some more? A key is bootstrapping simultaneously over multiple time series, for example.
@mrshirts sent! I'm looking forward to leveraging your experience to jumpstart the approach.
@dotsdl: take a look at https://github.com/choderalab/pymbar/blob/pmf/pymbar/pmf.py, lines 590 to 615, to get a sense of how bootstrapping works in a complicated case (in this case, calculating a potential of mean force).
I met with @mrshirts yesterday, and we aligned on an approach. I have started a WIP PR in #94. There is a list of things to do yet, but we have the start of our implementation. You can check out how things work so far in this gist. Comments welcome! Please don't use this in production work until we have tests ensuring that …
The gist for #94 has been updated; it requires components of #98, which can be played with on this branch.
Should bootstrapping be implemented at the alchemlyb level, or the pymbar level? For MBAR, it would be better at the pymbar level, since it can be easily encapsulated (the user doesn't have to worry about it), and one can request either uncertainty estimate.
For BAR over several states, the bootstrapping needs to be done at the level ABOVE the BAR call, since we need to bootstrap all of the data simultaneously before feeding it into BAR. The same holds for EXP applied to a string of states.
Thoughts?
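To illustrate why the multi-state case forces the loop to sit above the estimator, here is a sketch that resamples all K state trajectories simultaneously and only then runs a pairwise estimator. The `pairwise_estimate` callable is a hypothetical placeholder standing in for BAR between adjacent states (calling the real pymbar BAR is beyond this sketch), and the function name is mine:

```python
import numpy as np

def bootstrap_pairwise(state_data, pairwise_estimate, n_boot=100, seed=None):
    """Bootstrap a chain of pairwise free-energy estimates over K states.

    state_data        : list of K per-state sample arrays
    pairwise_estimate : callable(samples_i, samples_j) -> scalar delta-f
                        (placeholder for BAR between adjacent states)
    """
    rng = np.random.default_rng(seed)
    totals = []
    for _ in range(n_boot):
        # resample every state's data *before* any pairwise call, so one
        # consistent bootstrap dataset feeds the whole chain of estimates
        resampled = [s[rng.integers(0, len(s), size=len(s))]
                     for s in state_data]
        total = sum(pairwise_estimate(resampled[i], resampled[i + 1])
                    for i in range(len(resampled) - 1))
        totals.append(total)
    totals = np.asarray(totals)
    return totals.mean(), totals.std(ddof=1)

# toy usage: 4 states whose "free energies" are their sample means,
# so the pairwise estimator is just a difference of means
rng = np.random.default_rng(0)
states = [rng.normal(loc=k, scale=0.5, size=400) for k in range(4)]
df, ddf = bootstrap_pairwise(states, lambda a, b: b.mean() - a.mean(), seed=1)
```

Note that if each pairwise call resampled its own inputs independently, the per-segment errors would no longer refer to one consistent dataset; resampling once per bootstrap iteration, above the estimator, is what the comment argues for.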