Merge pull request #424 from nspope/docs-cite-singer
Make sure to cite SINGER in docs for rescaling idea
hyanwong authored Jul 27, 2024
2 parents 5b4c8d8 + db20d23 commit 1956970
Showing 1 changed file with 19 additions and 11 deletions.
30 changes: 19 additions & 11 deletions docs/methods.md
@@ -68,7 +68,7 @@ Pros
: Old nodes do not suffer from time-discretisation issues caused by forcing
bounds on the oldest times
: Iterative updating properly accounts for cycles in the genealogy
: No need to specify prior times
: No need to specify node-specific priors; a mixture "prior" (fit by expectation-maximization) is used to regularise the roots.
: Can account for variable population sizes using rescaling

Cons
@@ -118,15 +118,23 @@ ts = tsdate.date(input_ts, mutation_rate=1e-8, progress=True)
(sec_rescaling)=
#### Rescaling

During each EP step, the `variational_gamma` method implements a further process
that we call *rescaling*, and which can help to deal with the effects of variable population
size through time. Basically, time is broken up into a number of intervals, and times within
intervals are simultaneously scaled such that the expected density of mutations along each
path from a sample to the root best matches the mutational density predicted from the
user-provided mutation rate. The number of intervals can be specified using the
`rescaling_intervals` parameter. If set to 0, no rescaling is performed; this means that
dates may be inaccurately estimated if the dataset comes from a set of samples with a complex
demographic history.
During each EP step, the `variational_gamma` method implements a further
process called *rescaling*, which can help to deal with the effects of
variable population size through time. This is based on an algorithm introduced
by the ARG inference software
[SINGER](https://doi.org/10.1101/2024.03.16.585351) (Deng et al. 2024) that
rescales node ages by matching observed and expected segregating sites within
time windows.
Basically, time is broken up into a number of intervals, and times within
intervals are simultaneously scaled such that the expected density of mutations
along each path from a sample to the root best matches the mutational density
predicted from the user-provided mutation rate. The number of intervals can be
specified using the `rescaling_intervals` parameter. If set to 0, no rescaling
is performed; this means that dates may be inaccurately estimated if the
dataset comes from a set of samples with a complex demographic history.
`tsdate` uses a modified version of Deng et al.'s algorithm that works on gamma
natural parameters rather than point estimates, and that is not biased by the
artefactual polytomies introduced by `tsinfer` for the sake of compression.
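
As a rough illustration of the windowed rescaling idea described above, the
following sketch works on point estimates of node times (closer to the
SINGER-style approach than to `tsdate`'s gamma natural parameters). It is not
the `tsdate` implementation: the function `rescale_times`, its arguments, and
the proportional assignment of mutations to time windows are all assumptions
made for illustration.

```python
def rescale_times(edges, node_times, breakpoints, mutation_rate):
    """
    Hypothetical sketch of piecewise time rescaling (not tsdate's code).

    edges: list of (child, parent, span, num_mutations) tuples
    node_times: list of point estimates of node ages
    breakpoints: increasing interval boundaries, from 0 to >= the oldest age
    """
    n_intervals = len(breakpoints) - 1
    observed = [0.0] * n_intervals
    expected = [0.0] * n_intervals
    for child, parent, span, num_mutations in edges:
        lower, upper = node_times[child], node_times[parent]
        length = upper - lower
        if length <= 0:
            continue
        for i in range(n_intervals):
            # Portion of this edge's branch that lies inside interval i
            overlap = min(upper, breakpoints[i + 1]) - max(lower, breakpoints[i])
            if overlap > 0:
                expected[i] += mutation_rate * span * overlap
                # Assumption: mutations are spread evenly along the branch
                observed[i] += num_mutations * overlap / length
    # One scaling factor per interval; leave empty intervals unchanged
    scale = [o / e if e > 0 else 1.0 for o, e in zip(observed, expected)]

    def new_time(t):
        # Accumulate the rescaled width of each interval lying below t
        total = 0.0
        for i in range(n_intervals):
            width = min(t, breakpoints[i + 1]) - breakpoints[i]
            if width > 0:
                total += scale[i] * width
        return total

    return [new_time(t) for t in node_times]


# Two samples (nodes 0 and 1) under a root (node 2) initially dated at 10
edges = [(0, 2, 100.0, 2), (1, 2, 100.0, 2)]
print(rescale_times(edges, [0.0, 0.0, 10.0], [0.0, 5.0, 10.0], 0.01))
```

In this toy case each branch carries 2 mutations over a span of 100 at a rate
of 0.01, so the root moves from age 10 to age 2, where observed and expected
mutation counts agree.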

TODO: describe the rescaling step in more detail. Could also link to [the population size docs](sec_popsize)

@@ -174,4 +182,4 @@ have no mapped mutations (e.g. in the centromere), which can be removed by

The `maximization` approach is slightly less accurate empirically,
and will not return true posteriors, but is theoretically robust and
additionally is always numerically stable.