Release 0.4.0a1 · tskit-dev/tsinfer

## Alpha release of tsinfer 0.4.0

Features

tsinfer now supports inference from a vcf-zarr dataset. This allows users
to infer from VCFs via the optimised and parallel VCF parsing in bio2zarr.
The VariantData class loads a vcf-zarr dataset so that it can be used for inference.
vcf-zarr sample_ids are inserted into individual metadata as variant_data_sample_id
if this key does not already exist.
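
The "insert only if the key does not already exist" rule for sample ids can be illustrated with a plain dictionary. This is a minimal sketch, not tsinfer's code: the helper name `add_sample_id` and the example ids are hypothetical, and only the metadata key `variant_data_sample_id` comes from the notes above.

```python
def add_sample_id(metadata, sample_id, key="variant_data_sample_id"):
    """Return a copy of `metadata` with `sample_id` stored under `key`.

    Hypothetical helper for illustration only. An existing value under
    `key` is left untouched, mirroring the rule described above.
    """
    updated = dict(metadata)           # avoid mutating the caller's dict
    updated.setdefault(key, sample_id)  # only inserts when key is absent
    return updated


# Metadata without the key gains the sample id.
fresh = add_sample_id({"population": "A"}, "sample_001")

# Metadata that already has the key keeps its original value.
kept = add_sample_id({"variant_data_sample_id": "existing"}, "sample_002")
```

Here `fresh` carries the new id alongside its existing fields, while `kept` retains the value it already had.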

Breaking Changes

Remove the uuid field from SampleData. SampleData equality is now purely based
on data. ({pr}748, {user}benjeffery)

Performance improvements

- Reduce memory usage when running match_samples against large cohorts
  containing sequences with substantial amounts of error.
  ({pr}761, {user}jeromekelleher)
- truncate_ancestors no longer requires loading all the ancestors into RAM.
  ({pr}811, {user}benjeffery)
- Reduce memory requirements of the generate_ancestors function by providing
  the genotype_encoding ({pr}809) and mmap_temp_dir ({pr}808) options
  ({user}jeromekelleher).
- Increase parallelisation of match_ancestors by generating parallel groups from
  their implied dependency graph. ({pr}828, {issue}147, {user}benjeffery)
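
The general idea of deriving parallel groups from a dependency graph can be sketched as a level-by-level topological sort. This is an illustrative assumption about the technique, not the algorithm used in match_ancestors: the function `parallel_groups`, the input format, and the ancestor labels are all hypothetical.

```python
def parallel_groups(dependencies):
    """Split items into groups that can each be processed in parallel.

    `dependencies` maps each item to the set of items it depends on.
    Every item in a group depends only on items in earlier groups.
    Hypothetical sketch; not tsinfer's implementation.
    """
    remaining = {node: set(deps) for node, deps in dependencies.items()}
    # Items that appear only as dependencies have no dependencies themselves.
    for deps in dependencies.values():
        for dep in deps:
            remaining.setdefault(dep, set())

    groups = []
    while remaining:
        # Everything whose dependencies are all satisfied is ready now.
        ready = sorted(node for node, deps in remaining.items() if not deps)
        if not ready:
            raise ValueError("dependency graph contains a cycle")
        groups.append(ready)
        for node in ready:
            del remaining[node]
        for deps in remaining.values():
            deps.difference_update(ready)
    return groups


# Hypothetical example: a1 and a2 both depend on a0, and a3 depends on both.
example = {"a0": set(), "a1": {"a0"}, "a2": {"a0"}, "a3": {"a1", "a2"}}
groups = parallel_groups(example)
```

In this example `a1` and `a2` land in the same group, so they could be matched concurrently once `a0` is done.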
