DeepVariant 1.2.0 release (google/deepvariant)

The DeepVariant v1.2 release contains the following major improvements:

A major code refactor for make_examples better modularizes common components between DeepVariant, DeepTrio, and potential future applications. This lets DeepTrio inherit improvements such as --add_hp_channel (introduced to the DeepVariant PacBio model in v1.1; see the blog post), which improves DeepTrio's PacBio accuracy.
The DeepVariant PacBio model has substantially improved accuracy for PacBio Sequel II Chemistry v2.2, achieved by including this data in the training dataset.
We updated several dependencies: Python to 3.8, TensorFlow to 2.5.0, and GPU support to CUDA 11.3 and cuDNN 8.2. These newer versions are more computationally efficient, which improves speed.
In the "training" mode of make_examples, commit 4a11046 fixes an issue introduced in an earlier commit (a4a6547) where make_examples might generate fewer REF (class 0) examples than expected.
Improved accuracy of the Illumina WGS model across a range of shorter read lengths. Thanks to the following contributors and their teams for the idea:
- Dr. Masaru Koido (The University of Tokyo and RIKEN)
- Dr. Yoichiro Kamatani (The University of Tokyo and RIKEN)
- Mr. Kohei Tomizuka (RIKEN)
- Dr. Chikashi Terao (RIKEN)

Additional details on improvements in DeepVariant v1.2:

Improvements for training:

We augmented the training data for the Illumina WGS model by adding BAMs with trimmed reads (125 bp and 100 bp) to improve the model's robustness to different read lengths.
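As a rough illustration of this kind of augmentation, the Python sketch below derives 125 bp and 100 bp copies of a read set from full-length reads. It is not DeepVariant's training pipeline: the Read class, trim_read, and augment are hypothetical names; the reads are unaligned (trimming aligned BAM records would also require adjusting the CIGAR); and it assumes bases are dropped from the 3' end, which these notes do not specify.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    """Minimal stand-in for an unaligned read (hypothetical; not a DeepVariant type)."""
    name: str
    sequence: str
    qualities: list[int]


def trim_read(read: Read, target_length: int) -> Read:
    """Keep the first `target_length` bases, dropping bases from the 3' end.

    Reads already at or below the target length come back with identical contents.
    """
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    return Read(
        name=read.name,
        sequence=read.sequence[:target_length],
        qualities=read.qualities[:target_length],
    )


def augment(reads: list[Read], lengths=(125, 100)) -> dict[int, list[Read]]:
    """Produce one trimmed copy of the whole read set per target length."""
    return {n: [trim_read(r, n) for r in reads] for n in lengths}


if __name__ == "__main__":
    full = Read("r1", "ACGT" * 37 + "AC", [30] * 150)  # a 150 bp read
    for length, trimmed in augment([full]).items():
        print(length, len(trimmed[0].sequence))
```

Keeping the trimmed copies alongside the original reads (rather than replacing them) is what exposes the model to several read lengths during training.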

Improvements for make_examples:
For details on all flags, run /opt/deepvariant/bin/make_examples --help.

Major refactoring to ensure useful features (such as --add_hp_channel) can be shared between DeepVariant and DeepTrio make_examples.
Added MED_DP (the median of DP) to the gVCF output. See this section for more details.
New --split_skip_reads flag: if True, make_examples splits reads with large SKIP (N) CIGAR operations into individual reads. Resulting read parts shorter than 15 bp are filtered out.
The realigned BAM output described in this section is now sorted when you run make_examples with --emit_realigned_reads=true --realigner_diagnostics=/output/realigned_reads. You still need to run samtools index to create the index file, but you no longer need to sort the BAM.
Added an experimental prototype for multi-sample make_examples.
- This proof of concept for working with multiple samples in DeepVariant was made possible by the refactoring that unifies DeepVariant and DeepTrio, and it generalizes make_examples to handle multiple samples. Usage information is in multisample_make_examples.py, but note that this feature is experimental.
Improved the logic for calculating read allele counts at sites with low-base-quality indels, which improved indel accuracy for the PacBio models.
Improvements to the realigner code to fix certain uncommon edge cases.
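The splitting behavior described for --split_skip_reads can be sketched as follows. This is a hypothetical Python reimplementation, not the make_examples code: the function split_on_skips, the ReadPart class, and the min_skip_length threshold of 100 are assumptions (the notes only say "large" SKIP operations), and the 15 bp cutoff is assumed to be measured in read (query) bases. CIGAR operations are (op, length) tuples, and the consumes-reference and consumes-query sets follow the SAM specification.

```python
from __future__ import annotations

from dataclasses import dataclass

# Per the SAM spec: ops that advance the reference / query (read) position.
CONSUMES_REF = frozenset("MDN=X")
CONSUMES_QUERY = frozenset("MIS=X")


@dataclass
class ReadPart:
    """One piece of a split read (hypothetical type, not a DeepVariant class)."""
    ref_start: int
    sequence: str
    cigar: list[tuple[str, int]]


def split_on_skips(ref_start: int, sequence: str, cigar: list[tuple[str, int]],
                   min_skip_length: int = 100,  # hypothetical "large" threshold
                   min_part_length: int = 15) -> list[ReadPart]:
    """Split a read at N operations of at least `min_skip_length` bases.

    Parts with fewer than `min_part_length` query bases are dropped.
    """
    parts: list[ReadPart] = []
    ref_pos, query_pos = ref_start, 0
    part_ref_start, part_query_start = ref_start, 0
    part_cigar: list[tuple[str, int]] = []

    def close_part() -> None:
        # Emit the current part if it has any ops and enough read bases.
        seq = sequence[part_query_start:query_pos]
        if part_cigar and len(seq) >= min_part_length:
            parts.append(ReadPart(part_ref_start, seq, list(part_cigar)))

    for op, length in cigar:
        if op == "N" and length >= min_skip_length:
            close_part()
            ref_pos += length  # jump over the skipped reference span
            part_ref_start, part_query_start = ref_pos, query_pos
            part_cigar = []
            continue
        part_cigar.append((op, length))  # small skips stay inside the part
        if op in CONSUMES_REF:
            ref_pos += length
        if op in CONSUMES_QUERY:
            query_pos += length
    close_part()
    return parts


if __name__ == "__main__":
    # A 50 bp read: 20 bases, a 500 bp skip (e.g. an intron), then 30 bases.
    read_seq = "A" * 20 + "C" * 30
    for part in split_on_skips(1000, read_seq, [("M", 20), ("N", 500), ("M", 30)]):
        print(part.ref_start, len(part.sequence), part.cigar)
```

In this example the read becomes two parts starting at reference positions 1000 and 1520; if the first block were only 10 bases long, that part would fall below the 15 bp cutoff and be discarded.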

Improvements for the one-step run_deepvariant:
For details on all flags, run /opt/deepvariant/bin/run_deepvariant --help.

New --runtime_report flag, which writes a runtime report to the directory given by --logging_dir. This makes it easier to get the runtime-by-region report for make_examples.
New --dry_run flag, which prints all commands that would be executed without running them. This is mentioned in the Quick Start section.
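Conceptually, a dry run prints the command for each pipeline stage instead of executing it. The sketch below is a hypothetical Python illustration of that pattern, not run_deepvariant's implementation; run_pipeline and the stage commands in the example are placeholders.

```python
from __future__ import annotations

import shlex
import subprocess


def run_pipeline(commands: list[list[str]], dry_run: bool = False) -> list[str]:
    """Print every command; execute them in order only when dry_run is False.

    Returns the printed command lines so callers (and tests) can inspect them.
    """
    printed = []
    for argv in commands:
        line = shlex.join(argv)  # shell-quoted, copy-pasteable form
        print(line)
        printed.append(line)
        if not dry_run:
            # check=True stops the pipeline at the first failing stage.
            subprocess.run(argv, check=True)
    return printed


if __name__ == "__main__":
    # Hypothetical stage commands; the real ones are built by run_deepvariant.
    stages = [
        ["stage_one", "--input", "reads.bam"],
        ["stage_two", "--input", "examples.tfrecord"],
    ]
    run_pipeline(stages, dry_run=True)
```

Because nothing is executed in dry-run mode, the placeholder binaries above do not need to exist; the printed lines can be reviewed or run by hand.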
