
0.4.0a1

Pre-release
@benjeffery released this 27 Jul 00:53

## Alpha release of tsinfer 0.4.0

Features

  • tsinfer now supports inference from a vcf-zarr dataset. This lets users infer
    from VCFs by first converting them with the optimised, parallel VCF parsing in bio2zarr.
  • The VariantData class loads a vcf-zarr dataset so that it can be used for inference.
  • Each vcf-zarr sample_id is stored in the corresponding individual's metadata under
    the key variant_data_sample_id, unless that key already exists.
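The metadata rule in the last bullet can be illustrated with a short, stdlib-only sketch. It assumes each individual's metadata is a plain dict and uses a hypothetical helper, add_sample_ids; it only mirrors the described behaviour (insert variant_data_sample_id when the key is absent) and is not tsinfer's own implementation.

```python
def add_sample_ids(individual_metadata, sample_ids):
    """Return new metadata dicts with each sample ID stored under
    'variant_data_sample_id', unless that key is already present.

    Hypothetical helper illustrating the rule in the release notes;
    tsinfer's actual code may differ.
    """
    if len(individual_metadata) != len(sample_ids):
        raise ValueError("need exactly one sample ID per individual")
    result = []
    for metadata, sample_id in zip(individual_metadata, sample_ids):
        updated = dict(metadata)  # copy so the caller's dicts are untouched
        # setdefault inserts the key only if it does not already exist
        updated.setdefault("variant_data_sample_id", sample_id)
        result.append(updated)
    return result


metadata = [{}, {"variant_data_sample_id": "custom"}, {"name": "x"}]
print(add_sample_ids(metadata, ["NA001", "NA002", "NA003"]))
# [{'variant_data_sample_id': 'NA001'}, {'variant_data_sample_id': 'custom'}, {'name': 'x', 'variant_data_sample_id': 'NA003'}]
```

An existing variant_data_sample_id value (here "custom") is left untouched, which matches the "if this key does not already exist" condition.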

Breaking Changes

  • Remove the uuid field from SampleData. SampleData equality is now purely based
    on data. ({pr}748, {user}benjeffery)

Performance improvements

  • Reduce memory usage when running match_samples against large cohorts
    containing sequences with substantial amounts of error.
    ({pr}761, {user}jeromekelleher)

  • truncate_ancestors no longer requires loading all the ancestors into RAM.
    ({pr}811, {user}benjeffery)

  • Reduce memory requirements of the generate_ancestors function by providing
    the genotype_encoding ({pr}809) and mmap_temp_dir ({pr}808) options
    ({user}jeromekelleher).

  • Increase parallelisation of match_ancestors by forming groups of ancestors that
    can be matched in parallel from the dependency graph implied by the ancestors.
    ({pr}828, {issue}147, {user}benjeffery)
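The general idea of deriving parallel groups from a dependency graph can be sketched as a level-by-level topological sort (Kahn's algorithm): every node in a group depends only on nodes in earlier groups, so nodes within one group are independent and could be processed concurrently. This is a generic illustration, not tsinfer's code; the function name parallel_groups and the graph representation (a dict mapping each ancestor ID to the set of IDs it depends on, with every dependency also present as a key) are assumptions.

```python
def parallel_groups(dependencies):
    """Partition nodes into groups that can be processed in parallel.

    `dependencies` maps each node to the set of nodes it depends on;
    every dependency must also appear as a key. Nodes in group k depend
    only on nodes in groups 0..k-1.
    """
    # Count of not-yet-processed dependencies for each node.
    remaining = {node: len(deps) for node, deps in dependencies.items()}
    # Reverse edges: which nodes are waiting on each node.
    dependents = {node: [] for node in dependencies}
    for node, deps in dependencies.items():
        for dep in deps:
            dependents[dep].append(node)

    # Nodes with no dependencies form the first group.
    current = sorted(node for node, count in remaining.items() if count == 0)
    groups = []
    processed = 0
    while current:
        groups.append(current)
        processed += len(current)
        next_group = []
        for node in current:
            for child in dependents[node]:
                remaining[child] -= 1
                if remaining[child] == 0:  # all of child's dependencies are done
                    next_group.append(child)
        current = sorted(next_group)
    if processed != len(dependencies):
        raise ValueError("dependency graph contains a cycle")
    return groups


# Toy graph: ancestor 5 depends on 3 and 4, which depend on 1 and 2, and so on.
deps = {0: set(), 1: {0}, 2: {0}, 3: {1}, 4: {2}, 5: {3, 4}}
print(parallel_groups(deps))
# [[0], [1, 2], [3, 4], [5]]
```

In this toy graph, ancestors 1 and 2 (and likewise 3 and 4) share a group because neither depends on the other.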