4.0.10.0

@droazen released this 03 Oct 22:36

Highlights of this release include a new tool, ReblockGVCF, a fix for a crash in Mutect2, and a more efficient mechanism for distributing the reference and VCFs in Spark tools.

As usual, a Docker image for this release can be downloaded from https://hub.docker.com/r/broadinstitute/gatk/

Full list of changes in this release:

  • Added a new experimental tool ReblockGVCF (#4940)

    • A tool to merge reference blocks in single-sample GVCFs for smaller file sizes
  • Mutect2:

    • Fixed a bug in the PalindromeArtifactClipReadTransformer (#5241)
      • This read transformer would crash with an out-of-bounds error for fragment lengths and/or mate start positions that went off the end of a contig.
    • Changed the way the log10AlleleFractions are calculated in SomaticLikelihoodsEngine: we now use the mean of the posterior of the allele fractions. (#5231)
    • Reworded comments in the Mutect2 WDL to not refer to the old orientation bias filter as deprecated. (#5196)
    • Cited CGA in Mutect docs (#5228)
  • HaplotypeCaller: Allowed MNP calling in GVCF mode, with stern warnings against attempting joint genotyping of the resulting GVCFs. (#5182)

    • HaplotypeCaller will now allow you to output MNPs in GVCF mode with a warning. However, since joint genotyping of MNPs is unsupported, CombineGVCFs and GenomicsDBImport will now refuse to process GVCFs containing MNPs.
  • GATK Spark tools:

    • Migrated most Spark tools that take a reference and/or VCF to use Spark's intrinsic file copying mechanism instead of broadcast to distribute the reference and VCFs to worker nodes (#5127) (#5221)
      • This improves the performance of Spark tools that take a reference and/or VCF as side inputs, as the new distribution mechanism doesn't load the entire contents of the files into memory like broadcast did.
      • As a side effect of this change, support for 2bit references has been removed from tools that were migrated to the new distribution mechanism (in particular, BaseRecalibratorSpark and HaplotypeCallerSpark).
      • The CNV Spark tools have not yet been migrated, and still support 2bit references for now.
    • Bug fix: ensure that intervals with no reads are not dropped by the SparkSharder (#5248)
  • Funcotator:

    • Added command line exclusion lists, so that users can prune fields from the output. (#5226)
    • Added Funcotator excluded fields option explicitly to the M2 WDLs. (#5242)
  • Fixed a multithreaded race condition in GenotypeLikelihoodCalculators by synchronizing updates of shared genotype likelihood tables. (#5071)

    • This bug affected HaplotypeCallerSpark, but not the regular HaplotypeCaller
  • GenomicsDB: added machinery to allow per-annotation combine operations to be specified (#4993)

  • GATK Engine: Hooked up CountingVariantFilter to VariantWalkers (#4954)

  • StreamingPythonScriptExecutor: added a new message to the StreamingProcessController ack FIFO protocol to allow additional message detail to be passed as part of a negative ack. (#5170)

    • This improves exception message propagation for fatal errors when running Python tools.
  • gCNV WDLs:

    • Tar calls from all samples. (#5225)
      • This fixes an issue where the gCNV WGS cohort germline WDL was outputting VCF files with names that did not correspond to the actual samples inside the files.
    • Added multi-sample functionality to the gCNV case mode WDL, and added a wrapper for the gCNV case mode WDL to help reduce cloud computation cost. Also optimized how data is sent to the postprocessing task in the gCNV WDLs. (#5176)
  • gCNV kernel: Restricted ViterbiSegmentationEngine to analyzing single samples only (#5176)

  • Added a dataproc-cluster-ui script to easily open the Spark UI on Dataproc clusters (#5188)

  • Fixed pom issues that prevented publishing to Maven Central (#5224)

  • Added tabix to the Docker base image (#5247)
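
To illustrate the Mutect2 change to log10AlleleFractions (#5231), here is a minimal sketch of taking the log10 of a posterior mean. It assumes the posterior over allele fractions is a Dirichlet distribution with concentration parameters alpha, whose mean for allele i is alpha_i / sum(alpha). The function name and the example pseudocounts are hypothetical; this is not the GATK implementation, which is written in Java.

```python
import math


def log10_mean_allele_fractions(posterior_alphas):
    """Return log10 of the posterior mean allele fractions.

    Assumes (this is an assumption, not GATK code) a Dirichlet posterior
    with the given concentration parameters, one per allele.
    """
    if not posterior_alphas or any(a <= 0 for a in posterior_alphas):
        raise ValueError("Dirichlet concentration parameters must be positive")
    total = sum(posterior_alphas)
    # The mean of a Dirichlet is each parameter divided by their sum.
    return [math.log10(a / total) for a in posterior_alphas]


# Hypothetical posterior pseudocounts for a reference and one alternate allele.
log10_fractions = log10_mean_allele_fractions([9.0, 1.0])
```

Exponentiating the returned values recovers fractions that sum to one, which is a quick sanity check on any implementation of this kind.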
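
The GenotypeLikelihoodCalculators fix (#5071) synchronizes updates of tables shared between threads. The general technique can be sketched as follows, assuming a lazily grown table guarded by a single lock. The class, its methods, and the placeholder row values are hypothetical and do not mirror the GATK classes.

```python
import threading


class SharedLikelihoodTable:
    """Hypothetical stand-in for a lazily grown table shared between threads."""

    def __init__(self):
        self._lock = threading.Lock()
        self._rows = []

    def get(self, index):
        # Grow and read under one lock so that no thread can observe
        # a partially extended table.
        with self._lock:
            while len(self._rows) <= index:
                # Placeholder value; a real table would hold likelihood data.
                self._rows.append(len(self._rows) ** 2)
            return self._rows[index]

    def size(self):
        with self._lock:
            return len(self._rows)


table = SharedLikelihoodTable()
results = []
results_lock = threading.Lock()


def worker(index):
    value = table.get(index)
    with results_lock:
        results.append((index, value))


threads = [threading.Thread(target=worker, args=(i,)) for i in range(200)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Holding the lock for both the growth and the read keeps the sketch simple. A production implementation might reduce contention by only locking when the table actually needs to grow.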