Performance on trees with internal samples #153

Open
hyanwong opened this issue Dec 9, 2021 · 0 comments
hyanwong commented Dec 9, 2021

Chatting to @molpopgen, he says:

When there are sufficient numbers of ancient samples, doing anything with trees is terribly inefficient, and you can recover literally orders of magnitude by simplifying to each time point for which there are ancient samples.
An example that I run into a lot, and I'm sure that @petrelharp has too, is simulations where you remember everyone for some period of time.
In those cases, performance regresses from logarithmic to linear, and there's a tremendous amount of time spent updating information about nodes that have nothing to do with your current time slice.
In a simulation, most ancient samples will tend to be internal. And many are not ancestral to the final generation.
Here's a figure I made yesterday based on a massively polygenic simulation. There are millions of internal nodes making up the time series. The plot takes over an hour to make if you don't simplify to each time point separately.

[Figure: plot of the time series from the simulation]
20,000 nodes per time point by 100 time points.
The D statistic is calculated from a random sample of 50 diploid individuals. You basically have to simplify to that sample in order for the figure to be possible.
If you have few samples, it's closer to logarithmic. If you have lots, it's quite poor. Like I said, this is an extremely common case. I'm pretty sure that Peter does this routinely, and I certainly do.

This seems like prime material for one of the "High performance" tutorials (see #151). There's an open issue on it in molpopgen/fwdpy11#394, but I guess this is a general tree sequence issue, and so might well be a candidate for incorporation here.
