0.3.0 - Bugfix and maintenance release
Read https://tskit.dev/news/20221025-tsinfer-0.3.0.html for a more detailed explanation of this update.
Features
- When calling sample_data.add_site() the ancestral state does not need to be the first allele (index 0): alternatively, an ancestral allele index can be given (and if MISSING_DATA, the ancestral state will be imputed). (#718, #686 @hyanwong)
- The CLI now allows a recombination rate (or rate map) and a mismatch ratio to be specified (#731, #435 @hyanwong)
- The calls to match-ancestors and match-samples via the CLI are now logged in the provenance entries of the output tree sequence (#732 and #741, #730 @hyanwong)
- The CLI allows --no-post-process to be specified (for details of post-processing, see “Breaking changes” below) (#727, #721 @hyanwong)
- The matching routines now warn if there are no inference sites (#685, #683 @hyanwong)
Fixes
- sample_data.subset() now accepts a sequence_length (#681, @hyanwong)
- verify no longer raises an error when comparing a genotype to missing data. (#716, #625, @benjeffery)
Breaking changes
- The simplify parameter is now deprecated in favour of post_process which, prior to simplification, removes the “virtual-root-like” ancestor (inserted by tsinfer to aid the matching process) and then splits the ultimate ancestor into separate pieces. If splitting is not required, the post_process step can also be called as a separate function with the parameter split_ultimate=False (#687, #750, #673, @hyanwong)
- Post-processing by default erases tree topology that exists before the first site and beyond one unit after the last site, to avoid extrapolating into regions with no data. This can be disabled by calling the post_process step as a separate function with the parameter erase_flanks=False (#720, #483, @hyanwong)
- Inference now sets time_units on both ancestor and final tree sequences to tskit.TIME_UNITS_UNCALIBRATED, stopping accidental use of branch length calculations on the ts. (#680, @hyanwong)