Release 4.0.10.0 · broadinstitute/gatk

Highlights of this release include a new tool (ReblockGVCF), a fix for a crash in Mutect2, and a more efficient mechanism for distributing the reference and VCFs to worker nodes in Spark tools.

As usual, a docker image for this release can be downloaded from https://hub.docker.com/r/broadinstitute/gatk/

Full list of changes in this release:

Added a new experimental tool ReblockGVCF (#4940)
- A tool that merges reference blocks in single-sample GVCFs to reduce file sizes
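The idea of merging reference blocks can be illustrated with a short sketch. This is not ReblockGVCF's implementation: it assumes each reference block is reduced to a hypothetical `RefBlock(contig, start, end, gq)` record, and that adjacent blocks on the same contig are merged when their GQ values fall into the same band. The band boundaries (`GQ_BANDS`) and the choice to keep the minimum GQ of merged blocks are illustrative assumptions.

```python
from bisect import bisect_right
from typing import List, NamedTuple


class RefBlock(NamedTuple):
    contig: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    gq: int


# Hypothetical GQ band lower bounds: [0, 20), [20, 60), [60, inf).
GQ_BANDS = [0, 20, 60]


def gq_band(gq: int) -> int:
    """Index of the (hypothetical) GQ band that contains this GQ."""
    return bisect_right(GQ_BANDS, gq) - 1


def merge_ref_blocks(blocks: List[RefBlock]) -> List[RefBlock]:
    """Merge contiguous reference blocks on the same contig and in the same GQ band.

    Input must be sorted by (contig, start). A merged block keeps the minimum GQ
    of its constituents, a conservative choice assumed for this sketch.
    """
    merged: List[RefBlock] = []
    for block in blocks:
        if merged:
            prev = merged[-1]
            contiguous = prev.contig == block.contig and prev.end + 1 == block.start
            if contiguous and gq_band(prev.gq) == gq_band(block.gq):
                merged[-1] = RefBlock(prev.contig, prev.start, block.end,
                                      min(prev.gq, block.gq))
                continue
        merged.append(block)
    return merged
```

For example, blocks at chr1:1-100 (GQ 25) and chr1:101-250 (GQ 45) fall in the same band and collapse into one record, while a following chr1:251-300 block with GQ 70 stays separate.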
Mutect2:
- Fixed a bug in the PalindromeArtifactClipReadTransformer (#5241)
  - This filter would crash with an out-of-bounds error when fragment lengths and/or mate start positions extended past the end of a contig.
- Changed how log10AlleleFractions are calculated in SomaticLikelihoodsEngine: they are now computed from the mean of the posterior distribution of the allele fractions. (#5231)
- Reworded comments in the Mutect2 WDL so that they no longer refer to the old orientation bias filter as deprecated. (#5196)
- Cited CGA in Mutect docs (#5228)
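As a rough illustration of "the mean of the posterior of the allele fractions": if the posterior over allele fractions is a Dirichlet distribution with concentration parameters α (an assumption made for this sketch; see SomaticLikelihoodsEngine for the actual model), the posterior mean of allele i is α_i / Σ_j α_j, so its log10 is log10(α_i) − log10(Σ_j α_j). The function name below is hypothetical.

```python
import math
from typing import List


def log10_posterior_mean_fractions(alpha: List[float]) -> List[float]:
    """log10 of the mean of a Dirichlet(alpha) distribution, per allele.

    Assumes (for illustration) a Dirichlet posterior over allele fractions.
    The Dirichlet mean is alpha_i / sum(alpha), so its log10 is
    log10(alpha_i) - log10(sum(alpha)).
    """
    if any(a <= 0 for a in alpha):
        raise ValueError("Dirichlet concentration parameters must be positive")
    log10_total = math.log10(sum(alpha))
    return [math.log10(a) - log10_total for a in alpha]
```

With α = [8, 2], the posterior mean fractions are 0.8 and 0.2, and the function returns their log10 values.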
HaplotypeCaller: Allowed MNP calling in GVCF mode, with stern warnings against attempting joint genotyping on the resulting GVCFs. (#5182)
- HaplotypeCaller will now allow you to output MNPs in GVCF mode with a warning; however, since joint genotyping of MNPs is unsupported, CombineGVCFs and GenomicsDBImport will now refuse to process GVCFs containing MNPs.
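An MNP is a variant whose reference and alternate alleles have the same length, greater than one (for example REF=AC, ALT=GT). The sketch below shows the kind of check a tool could use to reject such records. It is not the CombineGVCFs or GenomicsDBImport code: the function names are hypothetical, a record is simplified to a REF string plus a list of ALT strings, and symbolic alleles such as `<NON_REF>` and the spanning-deletion allele `*` are skipped.

```python
from typing import List


def is_mnp_allele(ref: str, alt: str) -> bool:
    """True if alt is a multi-nucleotide polymorphism relative to ref."""
    if alt.startswith("<") or alt == "*":  # symbolic or spanning-deletion allele
        return False
    return len(ref) == len(alt) and len(ref) > 1


def check_no_mnps(ref: str, alts: List[str]) -> None:
    """Raise if any ALT allele is an MNP (hypothetical refusal logic)."""
    for alt in alts:
        if is_mnp_allele(ref, alt):
            raise ValueError(
                f"GVCF record contains MNP {ref}>{alt}; "
                "joint genotyping of MNPs is unsupported"
            )
```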
GATK Spark tools:
- Migrated most Spark tools that take a reference and/or VCF to use Spark's intrinsic file copying mechanism instead of broadcast to distribute the reference and VCFs to worker nodes (#5127) (#5221)
  - This improves the performance of Spark tools that take a reference and/or VCF as side inputs, as the new distribution mechanism doesn't load the entire contents of the files into memory like broadcast did.
  - As a side effect of this change, support for 2bit references has been removed from tools that were migrated to the new distribution mechanism (in particular, BaseRecalibratorSpark and HaplotypeCallerSpark).
  - The CNV Spark tools have not yet been migrated, and still support 2bit references for now.
- Bug fix: intervals with no reads are no longer dropped by the SparkSharder (#5248)
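To illustrate the behavior the SparkSharder fix restores, here is a toy, single-machine sharder written in plain Python rather than against Spark: every interval appears in the output, even when no reads fall in it. The function name and data shapes are hypothetical: reads are reduced to (contig, alignment start) pairs and intervals to (contig, start, end) tuples with inclusive coordinates. Conceptually, keeping empty intervals is the difference between an inner join and a left outer join of intervals against reads.

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # contig, start, end (inclusive)
Read = Tuple[str, int]           # contig, alignment start


def shard_reads(intervals: List[Interval], reads: List[Read]) -> Dict[Interval, List[Read]]:
    """Assign each read to every interval containing its start position.

    Every interval is seeded with an empty list up front, so intervals that
    receive no reads are kept in the output instead of being dropped.
    The nested loop is O(reads * intervals); fine for a sketch only.
    """
    shards: Dict[Interval, List[Read]] = {iv: [] for iv in intervals}
    for read in reads:
        contig, pos = read
        for iv in intervals:
            if iv[0] == contig and iv[1] <= pos <= iv[2]:
                shards[iv].append(read)
    return shards
```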
Funcotator:
- Added command line exclusion lists, so that users can prune fields from the output. (#5226)
- Added Funcotator excluded fields option explicitly to the M2 WDLs. (#5242)
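The effect of an exclusion list can be pictured as dropping named fields from each annotation record before it is written. A minimal sketch, assuming annotations are a dict from field name to value; the function and field names are hypothetical, and this is not Funcotator's implementation or its command-line syntax.

```python
from typing import Dict, Iterable


def prune_fields(annotations: Dict[str, str], excluded: Iterable[str]) -> Dict[str, str]:
    """Return a copy of annotations without the excluded fields.

    The input dict is not modified, and the remaining fields keep their order.
    """
    excluded_set = set(excluded)
    return {name: value for name, value in annotations.items() if name not in excluded_set}
```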
Fix a multithreaded race condition in GenotypeLikelihoodCalculators by synchronizing updates of shared genotype likelihood tables. (#5071)
  - This bug affected HaplotypeCallerSpark but not the regular HaplotypeCaller.
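The general pattern behind this fix, a lazily extended table shared between threads whose growth is guarded by a lock, can be sketched in Python with `threading.Lock`. The class name and growth policy are hypothetical, and the table contents are a stand-in: for a fixed ploidy p, the number of unordered genotypes over a alleles is C(p + a − 1, p). This illustrates the locking pattern only; it is not GATK's Java code.

```python
import threading
from math import comb
from typing import List


class SharedGenotypeCountTable:
    """Thread-safe, lazily extended table of genotype counts per allele count."""

    def __init__(self, ploidy: int) -> None:
        self.ploidy = ploidy
        self._counts: List[int] = [0]  # index = allele count; 0 alleles -> 0 genotypes
        self._lock = threading.Lock()

    def genotype_count(self, allele_count: int) -> int:
        if allele_count < 0:
            raise ValueError("allele_count must be non-negative")
        # Fast path: read the current table without locking. Tables are never
        # mutated after publication, so a reader sees either the old or new one.
        counts = self._counts
        if allele_count < len(counts):
            return counts[allele_count]
        with self._lock:
            # Re-check under the lock: another thread may already have grown the table.
            if allele_count >= len(self._counts):
                new_counts = list(self._counts)
                for a in range(len(new_counts), allele_count + 1):
                    new_counts.append(comb(self.ploidy + a - 1, self.ploidy))
                # Publish the fully built table with a single assignment.
                self._counts = new_counts
            return self._counts[allele_count]
```

Building a new list and publishing it in one assignment (copy-on-write) means readers never observe a half-updated table, which is the kind of inconsistency an unsynchronized update can expose.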
GenomicsDB: added machinery for specifying per-annotation combine operations (#4993)
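Per-annotation combine operations mean that each annotation can declare how its values from different samples are merged. The sketch below shows that dispatch in Python. The operation set (`sum`, `max`, `median`), the function names, and the pairing of `DP` with `sum` and `MQ` with `median` in the example are illustrative assumptions, not GenomicsDB's configuration format or GATK's defaults.

```python
import statistics
from typing import Callable, Dict, List

# Hypothetical registry of combine operations, keyed by name.
COMBINE_OPS: Dict[str, Callable[[List[float]], float]] = {
    "sum": lambda values: float(sum(values)),
    "max": lambda values: float(max(values)),
    "median": lambda values: float(statistics.median(values)),
}


def combine_annotations(per_sample: List[Dict[str, float]],
                        ops_by_annotation: Dict[str, str]) -> Dict[str, float]:
    """Merge annotation values across samples using each annotation's configured op.

    Samples missing an annotation are skipped for that annotation; annotations
    absent from every sample are omitted from the result.
    """
    combined: Dict[str, float] = {}
    for annotation, op_name in ops_by_annotation.items():
        values = [sample[annotation] for sample in per_sample if annotation in sample]
        if values:
            combined[annotation] = COMBINE_OPS[op_name](values)
    return combined
```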
GATK Engine: Hooked up CountingVariantFilter to VariantWalkers (#4954)
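A counting filter wraps a variant filter and tallies how many records it rejects, so that a walker can report filtering statistics at the end of a run. The Python sketch below captures that idea; the class name, the dict-based variant record, and the `MinQual30` predicate are hypothetical, and the real CountingVariantFilter is Java code in the GATK engine.

```python
from typing import Callable, Dict

Variant = Dict[str, float]  # simplified variant record for this sketch


class CountingFilter:
    """Wrap a pass/fail predicate and count how many variants it rejects."""

    def __init__(self, name: str, predicate: Callable[[Variant], bool]) -> None:
        self.name = name
        self.predicate = predicate
        self.filtered_count = 0

    def test(self, variant: Variant) -> bool:
        passed = self.predicate(variant)
        if not passed:
            self.filtered_count += 1
        return passed

    def summary(self) -> str:
        return f"{self.filtered_count} variant(s) failed filter {self.name}"
```

Usage: wrap a predicate such as `lambda v: v["QUAL"] >= 30`, keep only variants for which `test` returns `True`, and print `summary()` once traversal finishes.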
StreamingPythonScriptExecutor: added a new message to the StreamingProcessController ack FIFO protocol to allow additional message detail to be passed as part of a negative ack. (#5170)
- This improves exception message propagation for fatal errors when running Python tools.
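Conceptually, the controller reads a short acknowledgement token from the ack FIFO, and a negative ack can now carry detail text. The sketch below parses such messages under a made-up wire format: a three-character token, and, for a detailed negative ack, a four-digit length followed by that many characters. The tokens (`ack`, `nck`, `nkm`), the framing, and all names are hypothetical and are not the actual StreamingProcessController protocol.

```python
from typing import NamedTuple, Optional


class AckResult(NamedTuple):
    ok: bool
    detail: Optional[str] = None


# Hypothetical tokens; the real protocol's tokens and framing may differ.
ACK = "ack"
NACK = "nck"
NACK_WITH_MESSAGE = "nkm"


def parse_ack(message: str) -> AckResult:
    """Parse one acknowledgement message in the assumed wire format."""
    token, rest = message[:3], message[3:]
    if token == ACK:
        return AckResult(ok=True)
    if token == NACK:
        return AckResult(ok=False)
    if token == NACK_WITH_MESSAGE:
        length = int(rest[:4])  # fixed-width length prefix (assumed format)
        detail = rest[4:4 + length]
        if len(detail) != length:
            raise ValueError("truncated negative-ack message")
        return AckResult(ok=False, detail=detail)
    raise ValueError(f"unknown ack token: {token!r}")
```

A length prefix, rather than a terminator character, lets the detail text contain arbitrary characters such as newlines from a Python traceback.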
gCNV WDLs:
- Tar calls from all samples. (#5225)
  - This fixes an issue where the gCNV WGS cohort germline WDL was outputting VCF files with names that did not correspond to the actual samples inside the files.
- Added multi-sample functionality to the gCNV case mode WDL, and added a wrapper for the gCNV case mode WDL to help optimize cloud computation cost. Also optimized how data is sent to the postprocessing task in gCNV WDLs. (#5176)
gCNV kernel: ViterbiSegmentationEngine now enforces single-sample analysis (#5176)
Added a dataproc-cluster-ui script to easily open the Spark UI on dataproc clusters (#5188)
Fixed POM issues that prevented publishing to Maven Central (#5224)
Added tabix to the docker base image (#5247)
