DeepVariant 1.2.0
The DeepVariant v1.2 release contains the following major improvements:
- A major code refactor for make_examples better modularizes common components between DeepVariant, DeepTrio, and potential future applications. This enables DeepTrio to inherit improvements such as --add_hp_channel (introduced to the DeepVariant PacBio model in v1.1; see blog), improving DeepTrio’s PacBio accuracy.
- The DeepVariant PacBio model has substantially improved accuracy for PacBio Sequel II Chemistry v2.2, achieved by including this data in the training dataset.
- We updated several dependencies: Python version to 3.8, TensorFlow version to 2.5.0, and GPU support version to CUDA 11.3 and cuDNN 8.2. The greater computational efficiency of these dependencies results in improvements to speed.
- In the "training" mode for make_examples, we committed a fix (4a11046) for an issue introduced in an earlier commit (a4a6547) where make_examples might generate fewer REF (class0) examples than expected.
- Improvements to accuracy for Illumina WGS models for various shorter read lengths. Thanks to the following contributors and their teams for the idea:
  - Dr. Masaru Koido (The University of Tokyo and RIKEN)
  - Dr. Yoichiro Kamatani (The University of Tokyo and RIKEN)
  - Mr. Kohei Tomizuka (RIKEN)
  - Dr. Chikashi Terao (RIKEN)
Additional detail for improvements in DeepVariant v1.2:
Improvements for training:
- We augmented the training data for the Illumina WGS model by adding BAMs with trimmed reads (125 bp and 100 bp) to improve our model’s robustness on different read lengths.
Improvements for make_examples:
For more details on flags, run /opt/deepvariant/bin/make_examples --help.
- Major refactoring to ensure useful features (such as --add_hp_channel) can be shared between DeepVariant and DeepTrio make_examples.
- Added MED_DP (median of DP) to the gVCF output. See this section for more details.
- New --split_skip_reads flag: if True, make_examples will split reads with large SKIP CIGAR operations into individual reads. Resulting read parts that are less than 15 bp are filtered out.
- We now sort the realigned BAM output mentioned in this section when you use --emit_realigned_reads=true --realigner_diagnostics=/output/realigned_reads for make_examples. You will still need to run samtools index to get the index file, but no longer need to sort the BAM.
- Added an experimental prototype for multi-sample make_examples.
  - This is an experimental prototype for working with multiple samples in DeepVariant, a proof of concept enabled by the refactoring to join together DeepVariant and DeepTrio, generalizing the functionality of make_examples to work with multiple samples. Usage information is in multisample_make_examples.py, but note that this is experimental.
- Improved logic for read allele counts calculation for sites with low base quality indels, which resulted in Indel accuracy improvement for PacBio models.
- Improvements to the realigner code to fix certain uncommon edge cases.
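To illustrate the idea behind --split_skip_reads, here is a minimal Python sketch of splitting a read at SKIP (N) CIGAR operations and dropping parts shorter than 15 bp. It is not DeepVariant's implementation: the function name split_on_skips, the (pos, cigar, seq) tuple representation, and the min_skip_len parameter are hypothetical, and the notes above do not say what counts as a "large" skip, so that threshold is left as an assumed argument.

```python
import re

# One CIGAR element: a length followed by an operation code.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that consume bases of the read sequence.
CONSUMES_QUERY = set("MIS=X")
# Operations that advance the position on the reference.
CONSUMES_REF = set("MDN=X")


def split_on_skips(pos, cigar, seq, min_skip_len=1, min_part_len=15):
    """Split a read at N operations of at least min_skip_len bases.

    Returns a list of (start_pos, cigar, sequence) tuples. Parts whose
    sequence is shorter than min_part_len are filtered out.
    min_skip_len is an assumed, illustrative threshold.
    """
    parts = []
    part_ops = []          # CIGAR elements of the part being built
    part_start = pos       # reference start of the current part
    part_query_start = 0   # offset into seq where the current part begins
    ref_pos = pos
    query_pos = 0

    def flush():
        # Emit the current part if it is non-empty and long enough.
        part_seq = seq[part_query_start:query_pos]
        if part_ops and len(part_seq) >= min_part_len:
            parts.append((part_start, "".join(part_ops), part_seq))

    for length_str, op in CIGAR_RE.findall(cigar):
        length = int(length_str)
        if op == "N" and length >= min_skip_len:
            # Close the current part and start a new one after the skip.
            flush()
            ref_pos += length
            part_ops = []
            part_start = ref_pos
            part_query_start = query_pos
            continue
        part_ops.append(f"{length}{op}")
        if op in CONSUMES_QUERY:
            query_pos += length
        if op in CONSUMES_REF:
            ref_pos += length
    flush()
    return parts


# A 50 bp read with a 500 bp skip between two aligned blocks.
example_parts = split_on_skips(100, "20M500N30M", "A" * 20 + "C" * 30)
# A read whose first block (10 bp) falls under the 15 bp minimum.
filtered_parts = split_on_skips(100, "10M500N30M", "A" * 10 + "C" * 30)
```

In this sketch each resulting part keeps its own start position and CIGAR string, so downstream code could treat it like an ordinary read.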
Improvements for the one-step run_deepvariant:
For more details on flags, run /opt/deepvariant/bin/run_deepvariant --help.
- New --runtime_report flag, which enables runtime report output to --logging_dir. This makes it easier for users to get the runtime by region report for make_examples.
- New --dry_run flag, which prints all commands to be executed without running them. This is mentioned in the Quick Start section.
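As a rough illustration of how these flags combine, the Python sketch below assembles a run_deepvariant command line and prints it rather than executing it. The helper build_run_deepvariant_command is hypothetical, and the /input and /output paths are placeholders, not files shipped with the release. Check run_deepvariant --help for the authoritative flag list.

```python
import shlex


def build_run_deepvariant_command(ref, reads, output_vcf, logging_dir,
                                  model_type="WGS", num_shards=1,
                                  dry_run=True):
    """Return the argument list for a one-step run_deepvariant call."""
    args = [
        "/opt/deepvariant/bin/run_deepvariant",
        f"--model_type={model_type}",
        f"--ref={ref}",
        f"--reads={reads}",
        f"--output_vcf={output_vcf}",
        f"--num_shards={num_shards}",
        # The runtime report is written under the logging directory.
        f"--logging_dir={logging_dir}",
        "--runtime_report",
    ]
    if dry_run:
        # Print the underlying commands without running them.
        args.append("--dry_run=true")
    return args


# Placeholder paths for illustration only.
command = build_run_deepvariant_command(
    ref="/input/reference.fasta",
    reads="/input/sample.bam",
    output_vcf="/output/sample.vcf.gz",
    logging_dir="/output/logs",
)
print(shlex.join(command))
```

Keeping the command as a list makes it easy to drop --dry_run=true once the printed commands look right and then pass the same list to subprocess.run.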