DeepVariant 1.2.0

@pichuan pichuan released this 29 Jul 16:59
· 9 commits to r1.2 since this release

The DeepVariant v1.2 release contains the following major improvements:

  • A major code refactor for make_examples better modularizes common components between DeepVariant, DeepTrio, and potential future applications. This enables DeepTrio to inherit improvements such as --add_hp_channel (introduced to the DeepVariant PacBio model in v1.1; see blog), improving DeepTrio’s PacBio accuracy.
  • The DeepVariant PacBio model has substantially improved accuracy for PacBio Sequel II Chemistry v2.2, achieved by including this data in the training dataset.
  • We updated several dependencies: Python version to 3.8, TensorFlow version to 2.5.0, and GPU support version to CUDA 11.3 and cuDNN 8.2. The greater computational efficiency of these dependencies results in improvements to speed.
  • In the "training" mode of make_examples, we committed a fix (4a11046) for an issue introduced in an earlier commit (a4a6547), where make_examples might generate fewer REF (class0) examples than expected.
  • Improved accuracy of the Illumina WGS model on various shorter read lengths. Thanks to the following contributors and their teams for the idea:
    • Dr. Masaru Koido (The University of Tokyo and RIKEN)
    • Dr. Yoichiro Kamatani (The University of Tokyo and RIKEN)
    • Mr. Kohei Tomizuka (RIKEN)
    • Dr. Chikashi Terao (RIKEN)

Additional detail for improvements in DeepVariant v1.2:

Improvements for training:

  • We augmented the training data for the Illumina WGS model by adding BAMs with trimmed reads (125 bp and 100 bp) to improve the model’s robustness to different read lengths.
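
To illustrate what trimming reads to a fixed length means, here is a minimal sketch. The release notes do not say which tool or which end of the read was used for trimming, so this assumes bases are removed from the 3' end of an unaligned read; `trim_read` is a hypothetical helper, not part of DeepVariant. Trimming reads in an aligned BAM would also require adjusting the CIGAR string, which this sketch does not attempt.

```python
from typing import Dict, Tuple


def trim_read(sequence: str, qualities: str, max_length: int) -> Tuple[str, str]:
    """Keep only the first `max_length` bases and their quality values.

    Hypothetical helper: assumes trimming from the 3' end of the read.
    """
    if len(sequence) != len(qualities):
        raise ValueError("sequence and qualities must have the same length")
    if max_length <= 0:
        raise ValueError("max_length must be positive")
    return sequence[:max_length], qualities[:max_length]


# A made-up 150 bp read with uniform base qualities.
read_sequence = "ACGTA" * 30
read_qualities = "I" * 150

# Trim to the two read lengths named in the release notes.
trimmed: Dict[int, Tuple[str, str]] = {
    target: trim_read(read_sequence, read_qualities, target)
    for target in (125, 100)
}

for target, (seq, qual) in trimmed.items():
    print(target, len(seq), len(qual))
```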

Improvements for make_examples:
For more details on flags, run /opt/deepvariant/bin/make_examples --help.

  • Major refactoring to ensure useful features (such as --add_hp_channel) can be shared between DeepVariant and DeepTrio make_examples.
  • Add MED_DP (median of DP) in the gVCF output. See this section for more details.
  • New --split_skip_reads flag: if True, make_examples will split reads with large SKIP cigar operations into individual reads. Resulting read parts that are less than 15 bp are filtered out.
  • We now sort the realigned BAM output mentioned in this section when you use --emit_realigned_reads=true --realigner_diagnostics=/output/realigned_reads for make_examples. You will still need to run samtools index to get the index file, but no longer need to sort the BAM.
  • Added an experimental prototype for multi-sample make_examples.
    • This is a proof of concept for working with multiple samples in DeepVariant. It was enabled by the refactoring that joined DeepVariant and DeepTrio, which generalized make_examples to work with multiple samples. Usage information is in multisample_make_examples.py, but note that this is experimental.
  • Improved the logic for calculating read allele counts at sites with low base quality indels, which improved indel accuracy for the PacBio models.
  • Improvements to the realigner code to fix certain uncommon edge cases.
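
The behavior described for --split_skip_reads above can be sketched roughly as follows. This is not DeepVariant's implementation. The release notes do not state what counts as a "large" SKIP, so `min_skip_length` is a hypothetical parameter that the caller must choose, and the 15 bp minimum is assumed to apply to the number of read bases in each part. `ReadPart` and `split_on_skips` are illustrative names only.

```python
from typing import List, NamedTuple, Tuple

# CIGAR operations that consume read (query) bases or reference bases,
# following the SAM specification.
QUERY_OPS = {"M", "I", "S", "=", "X"}
REF_OPS = {"M", "D", "N", "=", "X"}


class ReadPart(NamedTuple):
    ref_start: int
    sequence: str
    cigar: List[Tuple[str, int]]


def split_on_skips(
    ref_start: int,
    sequence: str,
    cigar: List[Tuple[str, int]],
    min_skip_length: int,  # hypothetical threshold for a "large" SKIP
    min_part_length: int = 15,  # from the release notes
) -> List[ReadPart]:
    """Split a read at large N (SKIP) operations and drop short parts."""
    parts: List[ReadPart] = []
    part_cigar: List[Tuple[str, int]] = []
    part_ref_start = ref_start
    part_query_start = 0
    ref_pos = ref_start
    query_pos = 0

    for op, length in cigar:
        if op == "N" and length >= min_skip_length:
            # Close the current part and start a new one after the skip.
            if part_cigar:
                parts.append(
                    ReadPart(part_ref_start, sequence[part_query_start:query_pos], part_cigar)
                )
            ref_pos += length
            part_cigar = []
            part_ref_start = ref_pos
            part_query_start = query_pos
            continue
        part_cigar.append((op, length))
        if op in QUERY_OPS:
            query_pos += length
        if op in REF_OPS:
            ref_pos += length

    if part_cigar:
        parts.append(
            ReadPart(part_ref_start, sequence[part_query_start:query_pos], part_cigar)
        )

    # Assumption: a part's length is its number of read bases.
    return [part for part in parts if len(part.sequence) >= min_part_length]


# A made-up 60-base read with two large skips; the threshold of 100 is arbitrary.
example_cigar = [("M", 30), ("N", 1000), ("M", 10), ("N", 500), ("M", 20)]
parts = split_on_skips(100, "A" * 60, example_cigar, min_skip_length=100)
for part in parts:
    print(part.ref_start, len(part.sequence), part.cigar)
```

In this example the middle 10-base part falls below the 15 bp minimum and is dropped, leaving the 30-base and 20-base parts.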

Improvements for the one-step run_deepvariant:
For more details on flags, run /opt/deepvariant/bin/run_deepvariant --help.

  • New --runtime_report flag, which writes a runtime report to --logging_dir. This makes it easier for users to get the runtime-by-region report for make_examples.
  • New --dry_run flag, which prints all commands that would be executed without running them. This is mentioned in the Quick Start section.
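
As a rough illustration of how these two flags might be combined, the sketch below assembles a run_deepvariant command line without executing it. All file paths are placeholders, `build_run_deepvariant_command` is a hypothetical helper, and passing the boolean flags as `=true` is an assumption about the preferred spelling. Check run_deepvariant --help for the authoritative flag list.

```python
import shlex
from typing import List


def build_run_deepvariant_command(
    ref: str,
    reads: str,
    output_vcf: str,
    logging_dir: str,
    model_type: str = "WGS",
    num_shards: int = 1,
    dry_run: bool = True,
) -> List[str]:
    """Assemble (but do not run) a run_deepvariant command line."""
    command = [
        "/opt/deepvariant/bin/run_deepvariant",
        f"--model_type={model_type}",
        f"--ref={ref}",
        f"--reads={reads}",
        f"--output_vcf={output_vcf}",
        f"--num_shards={num_shards}",
        # Write the runtime report into the logging directory.
        f"--logging_dir={logging_dir}",
        "--runtime_report=true",
    ]
    if dry_run:
        # Print the commands that would be executed instead of running them.
        command.append("--dry_run=true")
    return command


# Placeholder paths for illustration only.
command = build_run_deepvariant_command(
    ref="/input/reference.fasta",
    reads="/input/sample.bam",
    output_vcf="/output/sample.vcf.gz",
    logging_dir="/output/logs",
)
print(shlex.join(command))
```

Building the command as a list keeps each argument separate, so it can later be passed to a process launcher unchanged once the dry run output looks right.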