0.4.0a1
Pre-release
## Alpha release of tsinfer 0.4.0
Features
- `tsinfer` now supports inferring data from a `vcf-zarr` dataset. This allows users
  to infer from VCFs via the optimised and parallel VCF parsing in `bio2zarr`.
- The `VariantData` class can be used to load the VCF data for inference.
- `vcf-zarr` `sample_ids` are inserted into individual metadata as
  `variant_data_sample_id` if this key does not already exist.
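
The "only if this key does not already exist" rule for sample IDs can be sketched in plain Python. This is a minimal illustration of the stated behaviour, not tsinfer's own code: the helper name `add_sample_id` and the example IDs are hypothetical, and only the key name `variant_data_sample_id` comes from the note above.

```python
def add_sample_id(metadata, sample_id, key="variant_data_sample_id"):
    """Return a copy of `metadata` with `key` set only if it was missing.

    Hypothetical helper: it mirrors the rule described in the release
    note and is not part of the tsinfer API.
    """
    updated = dict(metadata)            # leave the caller's dict untouched
    updated.setdefault(key, sample_id)  # an existing value is never overwritten
    return updated


# Metadata without the key gains it; metadata that has it is left alone.
fresh = add_sample_id({"population": "CEU"}, "NA12878")
kept = add_sample_id({"variant_data_sample_id": "custom-id"}, "NA12878")
```

Here `fresh` gains the new key alongside `population`, while `kept` retains `"custom-id"`, so user-supplied metadata takes precedence.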
Breaking Changes
- Remove the `uuid` field from `SampleData`. `SampleData` equality is now purely based
  on data. ({pr}`748`, {user}`benjeffery`)
Performance improvements
- Reduce memory usage when running `match_samples` against large cohorts
  containing sequences with substantial amounts of error.
  ({pr}`761`, {user}`jeromekelleher`)
- `truncate_ancestors` no longer requires loading all the ancestors into RAM.
  ({pr}`811`, {user}`benjeffery`)
- Reduce memory requirements of the `generate_ancestors` function by providing
  the `genotype_encoding` ({pr}`809`) and `mmap_temp_dir` ({pr}`808`) options
  ({user}`jeromekelleher`).
- Increase parallelisation of `match_ancestors` by generating parallel groups from
  their implied dependency graph. ({pr}`828`, {issue}`147`, {user}`benjeffery`)
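
The idea of deriving parallel groups from a dependency graph can be sketched as level-by-level topological grouping. This is a generic illustration under stated assumptions, not the implementation used by `match_ancestors`: the function name `parallel_groups`, the integer ancestor IDs, and the dependency sets are all hypothetical.

```python
def parallel_groups(dependencies):
    """Split nodes into ordered groups that can each be processed in parallel.

    `dependencies` maps a node to the set of nodes it must wait for.
    Every node in a group depends only on nodes from earlier groups.
    Hypothetical sketch; not tsinfer's actual grouping code.
    """
    remaining = {node: set(deps) for node, deps in dependencies.items()}
    # Include nodes that appear only as someone else's dependency.
    for deps in dependencies.values():
        for dep in deps:
            remaining.setdefault(dep, set())

    groups = []
    done = set()
    while remaining:
        # A node is ready once all of its dependencies are already done.
        ready = sorted(n for n, deps in remaining.items() if deps <= done)
        if not ready:
            raise ValueError("dependency cycle detected")
        groups.append(ready)
        done.update(ready)
        for node in ready:
            del remaining[node]
    return groups


# Ancestor 0 has no dependencies; 1 and 2 need 0; 3 needs 1 and 2; 4 needs 1.
example = parallel_groups({0: set(), 1: {0}, 2: {0}, 3: {1, 2}, 4: {1}})
# example == [[0], [1, 2], [3, 4]]
```

Each inner list is one group whose members could be matched concurrently. The fewer dependencies between nodes, the wider each group and the more work can proceed in parallel.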