Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add dependency on tsdate #926

Open
jeromekelleher opened this issue Jun 3, 2024 · 3 comments
Open

Add dependency on tsdate #926

jeromekelleher opened this issue Jun 3, 2024 · 3 comments

Comments

@jeromekelleher
Copy link
Member

As part of the changes for tsinfer 1.0, I think we should add the ability to date the inferred trees by adding a dependency on tsdate. The motivation for doing this is that it's reasonably involved doing the full tsinfer+tsdate pipeline correctly, requiring multiple pre and postprocessing steps to get the best performance. As most people probably want to date their ARGs, then we should make it easy for them to do it as well as possible.

Similarly for doing multiple rounds of tsinfer based on inferred dates.

How we structure the APIs etc is something to figure out, but I'm sure we can manage it.

@hyanwong
Copy link
Member

hyanwong commented Jun 3, 2024

Yep, discussing just now with @jeromekelleher and this seems sensible to me.

One suggestion is that tsinfer.date(ts_in, ts_out) is another step, just like tsinfer.match_samples(sd, ts_in). I think by default this should carry out something internally like:

ts_tmp = tsdate.preprocess_ts(ts_in)
return tsdate.date(ts_tmp, mutation_rate=mu)

Then tsinfer.infer() would be "shorthand" for the following 4 steps:

  1. tsinfer.generate_ancestors
  2. tsinfer.match_ancestors
  3. tsinfer.match_samples
  4. tsinfer.date # n.b. we could call this tsinfer.set_times() if we want to distinguish it from tsdate.date?

We need to think about the API, e.g. if we require a mutation rate (or if it is absent, whether we simply return an undated tree sequence).

@jeromekelleher
Copy link
Member Author

We need to think about the API, e.g. if we require a mutation rate (or if it is absent, whether we simply return an undated tree sequence).

That would be a good way to maintain backward compatibility - if mutation_rate is suppled to tsinfer.infer then the additional dating step is done, otherwise same as we have now.

@benjeffery
Copy link
Member

Sounds good to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants