4.0.10.0
Highlights of this release include the new ReblockGVCF tool, a bug fix for a crash in Mutect2, and a more efficient distribution mechanism for the reference and VCFs in Spark tools.
As usual, a docker image for this release can be downloaded from https://hub.docker.com/r/broadinstitute/gatk/
Full list of changes in this release:
- Added a new experimental tool ReblockGVCF (#4940)
  - A tool to merge reference blocks in single-sample GVCFs for smaller file sizes
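To illustrate what merging reference blocks means, here is a minimal sketch that collapses adjacent, contiguous blocks whose genotype quality (GQ) falls in the same band. It is not the ReblockGVCF implementation: the `RefBlock` type, the `merge_ref_blocks` function, and the band boundaries `(0, 20, 60)` are all hypothetical and chosen only for illustration.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class RefBlock:
    """Hypothetical stand-in for one GVCF reference block."""
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    gq: int     # genotype quality reported for the block


def band_index(gq, band_starts):
    # band_starts holds the sorted lower bounds of each GQ band.
    return bisect_right(band_starts, gq) - 1


def merge_ref_blocks(blocks, band_starts=(0, 20, 60)):
    """Merge contiguous blocks that share a GQ band (illustrative bands)."""
    merged = []
    for block in blocks:
        if merged:
            prev = merged[-1]
            contiguous = prev.end + 1 == block.start
            same_band = band_index(prev.gq, band_starts) == band_index(block.gq, band_starts)
            if contiguous and same_band:
                # Keep the lower GQ so the merged block stays conservative.
                merged[-1] = RefBlock(prev.start, block.end, min(prev.gq, block.gq))
                continue
        merged.append(RefBlock(block.start, block.end, block.gq))
    return merged


blocks = [RefBlock(1, 10, 25), RefBlock(11, 20, 30), RefBlock(21, 30, 70)]
print(merge_ref_blocks(blocks))
```

Fewer, wider blocks mean fewer records in the file, which is where the size reduction comes from.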
- Mutect2:
  - Fixed a bug in the PalindromeArtifactClipReadTransformer (#5241)
    - This filter would crash with an out-of-bounds error for fragment lengths and/or mate start positions that went off the end of a contig.
  - Changed the way the log10AlleleFractions are calculated in SomaticLikelihoodsEngine: now we use the mean of the posterior of the allele fractions. (#5231)
  - Reworded comments in Mutect2 WDL to not refer to the old orientation bias filter as deprecated. (#5196)
  - Cited CGA in Mutect docs (#5228)
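As a rough illustration of "the mean of the posterior of the allele fractions", the sketch below assumes the posterior is a Dirichlet distribution, whose mean for allele i is alpha_i divided by the sum of all alpha values. That assumption and the function name `log10_mean_allele_fractions` are not taken from these notes, and this is not the SomaticLikelihoodsEngine code.

```python
import math


def log10_mean_allele_fractions(posterior_alpha):
    """Return log10 of the Dirichlet posterior mean for each allele.

    posterior_alpha: positive Dirichlet parameters, one per allele
    (assumed form of the posterior; hypothetical helper).
    """
    if not posterior_alpha or any(a <= 0 for a in posterior_alpha):
        raise ValueError("Dirichlet parameters must be positive")
    total = sum(posterior_alpha)
    return [math.log10(a / total) for a in posterior_alpha]


# Example: pseudocounts of 9 for the reference allele and 1 for the alternate.
print(log10_mean_allele_fractions([9.0, 1.0]))
```

Because the posterior means sum to one, raising 10 to each returned value gives fractions that also sum to one.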
- HaplotypeCaller: Allow MNP calling in GVCF mode with stern warnings about not trying joint-genotyping from the resulting GVCFs. (#5182)
  - HaplotypeCaller will now allow you to output MNPs in GVCF mode with a warning. However, since joint genotyping of MNPs is unsupported, CombineGVCFs and GenomicsDBImport will now refuse to process GVCFs containing MNPs.
- GATK Spark tools:
  - Migrated most Spark tools that take a reference and/or VCF to use Spark's intrinsic file copying mechanism instead of broadcast to distribute the reference and VCFs to worker nodes (#5127) (#5221)
    - This improves the performance of Spark tools that take a reference and/or VCF as side inputs, as the new distribution mechanism doesn't load the entire contents of the files into memory like broadcast did.
    - As a side effect of this change, support for 2bit references has been removed from tools that were migrated to the new distribution mechanism (in particular, BaseRecalibratorSpark and HaplotypeCallerSpark).
    - The CNV Spark tools have not yet been migrated, and still support 2bit references for now.
  - Bug fix: ensure that intervals with no reads are not dropped by the SparkSharder (#5248)
- Funcotator:
- Fixed a multithreaded race condition in GenotypeLikelihoodCalculators by synchronizing updates of shared genotype likelihood tables. (#5071)
  - This bug affected HaplotypeCallerSpark, but not the regular HaplotypeCaller
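The general pattern behind this kind of fix, guarding a lazily grown shared table with a lock, can be sketched as follows. This is a Python analogue for illustration only: GATK itself is written in Java, and the `SharedGenotypeCountTable` class and its contents (diploid genotype counts) are hypothetical, not the GenotypeLikelihoodCalculators code.

```python
import threading


class SharedGenotypeCountTable:
    """Hypothetical shared table, extended on demand by many threads."""

    def __init__(self):
        self._lock = threading.Lock()
        # _counts[n] = number of unordered diploid genotypes over n alleles.
        self._counts = [0]

    def genotype_count(self, allele_count):
        # Extending and reading happen under one lock, so no thread can
        # observe the table while another thread is still growing it.
        with self._lock:
            while len(self._counts) <= allele_count:
                n = len(self._counts)
                self._counts.append(n * (n + 1) // 2)
            return self._counts[allele_count]


table = SharedGenotypeCountTable()
print(table.genotype_count(3))  # 6: AA, AB, AC, BB, BC, CC
```

Without the lock, two threads could both decide the table is too short and extend it at the same time, which matches the single-process versus Spark difference noted above: the problem only shows up when several threads share the tables.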
- GenomicsDB: added in machinery to allow per-annotation combine operations to be specified (#4993)
- GATK Engine: Hooked up CountingVariantFilter to VariantWalkers (#4954)
- StreamingPythonScriptExecutor: added a new message to the StreamingProcessController ack FIFO protocol to allow additional message detail to be passed as part of a negative ack. (#5170)
  - This improves exception message propagation for fatal errors when running Python tools.
- gCNV WDLs:
  - Tar calls from all samples. (#5225)
    - This fixes an issue where the gCNV WGS cohort germline WDL was outputting VCF files with names that do not correspond to the actual samples inside the files.
  - Added multi-sample functionality to gCNV case mode WDL, and added a wrapper for gCNV case mode WDL to help optimize cloud computation cost. Also optimized how data is sent to postprocessing task in gCNV WDLs. (#5176)
- gCNV kernel: Enforced ViterbiSegmentationEngine to analyze single samples only (#5176)
- Added a dataproc-cluster-ui script to easily open the Spark UI on dataproc clusters (#5188)
- Fixed pom issues that prevented publishing to maven central (#5224)
- Added tabix to the docker base image (#5247)