CHANGES

5.7.0

  • FASTQ tag preservation during mapping (bam/cram handled unless "-legacy")
  • bwakit option added

5.6.1

  • htslib/samtools 1.12 - came out hours after we released 5.6.0

5.6.0

  • bwa-mem2 updated to v2.2.1 following successful large scale testing

5.5.1

  • bwa-mem2 increment.
  • bwa_mem2.pl correct arg checking to catch failure to provide input files.

5.5.0

  • Update to bwa-mem2 production release (v2.1)

5.4.2

  • use libdeflate /opt installation from cgpbigwig
  • libdeflate performance gains now also apply to samtools via this modification

5.4.1

  • Compile htslib with libdeflate (thanks to @mflevine)

5.4.0

  • Update base image of docker to cgpbigwig 1.5.0
  • Update to htslib 1.11
  • Update to samtools 1.11 - to handle long running markdup issue

5.3.0

  • Support for paired FASTQ names ending with _R1_001 and _R2_001 in addition to _1 and _2 as before.
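
The two naming schemes above can be illustrated with a small filename-pairing sketch. This is not PCAP-core's code. The function name `pair_fastq_names` and the accepted extensions (`.fq`, `.fastq`, optionally `.gz`) are assumptions made for the example.

```python
import re

# Hypothetical pattern: "<prefix>_1.<ext>" / "<prefix>_2.<ext>" or
# "<prefix>_R1_001.<ext>" / "<prefix>_R2_001.<ext>".
# The accepted extensions (.fq/.fastq, optionally .gz) are an assumption.
_PAIR_RE = re.compile(
    r"^(?P<prefix>.+?)_(?:R(?P<r>[12])_001|(?P<n>[12]))"
    r"(?P<ext>\.f(?:ast)?q(?:\.gz)?)$"
)


def pair_fastq_names(names):
    """Group FASTQ filenames into {prefix: (read1, read2)}; raise if unpaired."""
    slots = {}
    for name in names:
        m = _PAIR_RE.match(name)
        if m is None:
            raise ValueError(f"not a recognised paired FASTQ name: {name}")
        end = int(m.group("r") or m.group("n"))  # 1 or 2, whichever scheme matched
        key = (m.group("prefix"), m.group("ext"))
        slots.setdefault(key, [None, None])[end - 1] = name
    pairs = {}
    for (prefix, _ext), (r1, r2) in slots.items():
        if r1 is None or r2 is None:
            raise ValueError(f"incomplete pair for prefix {prefix}")
        pairs[prefix] = (r1, r2)
    return pairs
```

Files are keyed on both prefix and extension, so `lane_1.fq` will not be paired with `lane_2.fastq.gz`.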

5.2.2

  • Delete files that may remain after an aborted run before resume.

5.2.1

  • Correct biobambam2 version
  • Fix to merge/mark legacy method to use bammerge instead of samtools merge

5.2.0

  • Expose option for "legacy" to allow for <=5.0.5 processing methods.
    • bamtofastq when pulling reads from BAM/CRAM input.
    • bammarkduplicates2 for duplicate marking.
    • Affects bwa_mem.pl and merge_or_mark.pl

5.1.0

  • Base image updated to Focal (Ubuntu 20.04).
  • Majority of biobambam2 replaced with samtools functions.
  • Reads undergo full collate when mapping from BAM/CRAM (bwa-mem2 prep).
  • Duplicate marking samtools markdup --mode options exposed to bwa_mem.pl.
    • Lanes mapped with earlier versions of PCAP-core cannot be merged without reprocessing to add the "mate score tag" via samtools fixmate.
  • Scramble option for bwa_mem.pl deprecated, relevant option for fast CRAM random access exposed.

5.0.5

  • Add noindex commandline flag to merge_or_mark.pl for bammerge calls. Only permitted alongside qnamesort.

5.0.4

  • Add qnamesort commandline flag to merge_or_mark.pl for bammerge calls (defaults to SO=coordinate without it)

5.0.3

  • Fix the setup.sh script.

5.0.2

  • bwa-mem2 only applied if you request it

5.0.1

  • Fix for pushing data for threaded item to stdout/err on failure

5.0.0

  • Adds basic merge_or_mark.pl script
    • for merging individual read-groups of data generated by bwa_mem.pl
  • I/O reduced in all marking/merging code working with BAM outputs
  • bwa_mem.pl now uses bwa-mem2 (v2.0pre2) for a significant performance boost
    • Mapping shows minor differences in *.bas file result.
    • Summary of impact on downstream WGS results are as follows:
      • ascatNgs
        • No difference in CN segments
        • Small floating point changes in samplestatistics.txt:
          • Ploidy: @6th s.f.
          • goodnessOfFit: @5th s.f.
      • cgpBattenberg
        • No difference in CN segments
        • Small floating point changes in rho_and_psi.txt:
          • ASCAT-ploidy: @8th s.f.
          • FRAC_GENOME-psi: @7th s.f.
          • FRAC_GENOME-ploidy: @8th s.f.
          • FRAC_GENOME-distance: @10th s.f.
          • REF_SEG-rho: @5th s.f.
          • REF_SEG-psi: @5th s.f.
          • REF_SEG-ploidy: @4th s.f.
          • REF_SEG-distance: @7th s.f.
      • BRASS
        • No changes in total BEDPE/VCF calls.
      • cgpPindel
        • 327 events passing filters (unchanged, minor changes to allele-frac)
        • 982097 shared unfiltered calls
        • +5 events failing filters
        • -3 events failing filters
      • cgpCaVEManWrapper+cgpCaVEManPostProcessing
        • 1040 events passing filters (unchanged, minor changes to allele-frac)
        • 106867 shared unfiltered calls
        • +22 events failing filters
        • -18 events failing filters

4.4.1

  • bamToBw.pl corrected to send stdout/err to file when threads are active.

4.4.0

  • Adds local docker file to support large scale data tests on internal CI platform
  • CWL and additional helper scripts etc will continue to be supported through the dockstore-cgpmap project.

4.3.4

  • Further changes to threaded module to improve reliability under singularity

4.3.3

  • Remove explicit sync for several reasons
    • No longer chmod scripts
    • Causes problems with heavily loaded shared file systems (sync hangs)

4.3.2

  • Add options for input/output of CRAM in mismatchQc and mmFlagModifier
  • Correct mem free issue when writing CRAM in mismatchQc and mmFlagModifier
  • Resolved #40 - faster recovery times
  • Resolved #39 - handle splitting correctly

4.3.1

  • Added -p flag to program commandline output in BAM header @PG line

4.3.0

  • Removed duplicate flag from BAD_FLAGS variable in mismatchQC script
  • Added mmFlagModifier script to remove/reinstate QC_fail flag where mm:A:Y tag is found.
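
The remove/reinstate behaviour described for mmFlagModifier can be sketched as a bitwise operation on the SAM FLAG. This is not the script's implementation. Reads are modelled here as a plain integer FLAG plus a hypothetical `tags` dict mapping tag names to values.

```python
QCFAIL = 0x200  # SAM FLAG bit 512: read fails platform/vendor quality checks


def modify_qcfail(flag, tags, reinstate):
    """Return the new FLAG; only reads carrying mm:A:Y are touched.

    `tags` is a hypothetical dict of aux tags, e.g. {"mm": "Y"}.
    """
    if tags.get("mm") != "Y":
        return flag  # reads never marked by mismatch QC keep their flag as-is
    if reinstate:
        return flag | QCFAIL  # set the QC-fail bit again
    return flag & ~QCFAIL  # clear the QC-fail bit
```

Keying on the tag rather than on the flag alone leaves QC-fail flags set by other tools untouched.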

4.2.7

  • scripts generated by Threaded.pm now always sync'ed
  • remove need to chmod, reduce issues under docker, less file ops

4.2.6

  • New bam file added with mini reference for cram formatting test.
  • This prevents travis timeout issues where the reference area is being searched for by bamvalidate

4.2.5

  • mismatchQC has added commandline parameter - used by default
  • checks that reads marked as properly paired are in the correct orientations for paired-end (F/R)
  • if not then the properly paired flag is removed.
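
A simplified sketch of an F/R orientation check using standard SAM FLAG bits follows. It is not mismatchQC's actual logic. It compares only POS and PNEXT, treats `"="` or an identical RNEXT as the same contig, and ignores dovetailed or heavily overlapping pairs.

```python
PROPER_PAIR, UNMAPPED, MATE_UNMAPPED = 0x2, 0x4, 0x8
REVERSE, MATE_REVERSE = 0x10, 0x20


def is_fr(flag, rname, pos, rnext, pnext):
    """True if read and mate face each other (forward read leftmost)."""
    if flag & (UNMAPPED | MATE_UNMAPPED):
        return False
    if rnext not in ("=", rname):
        return False  # mates on different contigs cannot be F/R
    read_rev = bool(flag & REVERSE)
    mate_rev = bool(flag & MATE_REVERSE)
    if read_rev == mate_rev:
        return False  # F/F or R/R
    if pos < pnext:
        return not read_rev  # leftmost read must be forward
    if pos > pnext:
        return read_rev  # rightmost read must be reverse
    return True  # same start, opposite strands


def fix_proper_pair(flag, rname, pos, rnext, pnext):
    """Clear the properly-paired bit when the pair is not F/R."""
    if flag & PROPER_PAIR and not is_fr(flag, rname, pos, rnext, pnext):
        return flag & ~PROPER_PAIR
    return flag
```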

4.2.4

Fixed bam_stats outputs to be more consistent with pre 4.1.0.

The addition of the ability to count QC-Failed reads (previously completely excluded) had resulted in mapping stats being augmented by this class of reads instead of just total reads and the #qcfail_reads[12] fields.

4.2.3

  • Missing default value for bwa_mem.pl option -f|fragment fixed.
  • Documentation corrections.

4.2.2

  • Update Bio::DB::HTS to v2.10

4.2.1

  • Fix for input type of CRAM.

4.2.0

  • Add threadpool to bam_stats and diff_bams. Fixes #18, Fixes #19
  • bwa_mem.pl can use threads for bam_stats

4.1.4

  • Ensure sentinel calls exit with non-zero exit code. Fixes #16

4.1.3

  • Some issues with indexing slipped through

4.1.2

  • Clean up the biobambam2 version test
  • Bump biobambam2 version, primarily to include script for cleaning mmqc processed data
  • Add version option to reheadSQ
  • Change travis tests to output version so less verbose

4.1.1

  • Expose -a|overlap option in bamToBw.pl.

4.1.0

  • bam_stats now calculates number of qc_fail reads (flag 512/0x200).
  • mismatchQc script added to set flag 512 and aux tag 'mm:A:Y' when a read exceeds a mismatch threshold.
  • bwa_mem.pl has new options:
    • readgroup information can be loaded from a yaml file when fastq used as input.
    • Options to enable use of mismatchQc in processing
  • Docs point to dockstore wrapper and image.
  • biobambam2 now installed to a subfolder so need to extend PATH appropriately.
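
The flag-512 marking and counting described in this release can be sketched as follows. The entry does not say how mismatchQc measures mismatches. This example assumes a hypothetical rule using the NM tag divided by aligned length as a proxy, even though NM also counts indel bases. The names `mark_mismatch_qc` and `count_qcfail` are illustrative only.

```python
QCFAIL = 0x200  # SAM FLAG bit 512


def mark_mismatch_qc(flag, tags, aligned_len, max_rate):
    """Hypothetical rule: flag the read if NM / aligned_len exceeds max_rate.

    Returns (new_flag, new_tags); adds mm:A:Y (stored as "mm": "Y") when flagged.
    """
    nm = tags.get("NM")
    if nm is None or aligned_len == 0:
        return flag, tags  # nothing to judge; leave the read untouched
    if nm / aligned_len > max_rate:
        return flag | QCFAIL, dict(tags, mm="Y")
    return flag, tags


def count_qcfail(flags):
    """Count reads with the QC-fail bit set, as a bam_stats-style tally."""
    return sum(1 for f in flags if f & QCFAIL)
```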

4.0.5

  • bam_stats now outputs a line of zeroes for bam files with zero reads

3.5.0

  • Adds RNA downloads to PanCancer download tool gnos_pull.pl
  • Hardening of external process handling in PCAP::Threaded
  • Adds C version of diff_bams

3.4.0

  • Significant speed up of BAM generation under bwa_mem.pl by using separate process to do compression of mark duplicate output and streaming BAS generation. Not possible to do this to CRAM in same way.

3.3.4

  • Reduce disk usage when running bwa_mem.pl
  • Improve throughput via slightly unintuitive use of additional pipes

3.3.0

  • adds map_threads|mt option to bwa_mem.pl to allow more control of parallel processing in one shot submission.
  • adds bwa_pl|l option to bwa_mem.pl to allow preload of different malloc libraries.

3.2.0

  • Move from legacy kent bigwig manipulation code to cgpBigWig
    • Faster and handles the huge number of contigs in many new reference builds.
    • As a result, the installed tool bwcat is now bwjoin, a name more descriptive of its actual function.

3.1.1

Handle recent changes to BioPerl structure

3.1.0

  • Use the BWA default for -T (previously hard-coded to -T 0).
    • Can be passed through bwa_mem.pl, along with other args to bwa, via the -b option.
  • Fix bam2bedgraph compilation since changes to underlying libraries
  • bamToBw.pl - expose read flag filters
  • Drop dependency on Bio::DB::HTS INSTALL.pl as it can't be pinned to a known good version.
  • Added travis CI
  • Add support for output directly to CRAM
  • bwa version upgraded to 0.7.15

3.0.0

  • Threading module now converts currently running step to bash script for following reasons:
    • Changes logging to use file redirects instead of Capture::Tiny - prevent log bleed into wrong files
    • Commands for failed jobs remain after shutdown for easy debug/testing
  • Log and progress file names simplified so more portable.

2.4.1

  • Modified reheadSQ to be more robust.

2.1.0

  • Adds xam_coverage_bins.pl which calculates fraction of targets covered at various depths (BAM/CRAM), using BED/GFF3 as target bait file.

2.0.0

  • bwa_mem.pl
    • allow user to specify BWA mapping parameters
    • now accepts CRAM as input
  • bamToBw.pl - now accepts CRAM as input.
  • bam_stats - Adds 2 new stats:
    • #_mapped_pairs
    • #_inter_chr_pairs
  • Dependency changes
    • WARNING: ensure all related tools handle these updates
    • samtools, now only uses htslib based versions (1.3+, handling deprecated use of sort)
    • Bio::DB::HTS htslib bindings replacing Bio::DB::Sam

1.14.0

  • bwa_mem.pl - Option to disable duplicate marking

1.13.0

  • bam_stats - Unit tests for C code
  • bam_stats - Fix to median insert size calculation

1.11.0

  • bam_stats - new rna switch to give more appropriate insert size stats
  • bam_stats - more robust handling of optional RG header entries
  • bam_stats - allows streaming IO (thanks to @jenniferliddle)
  • bwa_mem.pl - Handle ' in RG header line/IDs
  • Generally improved version handling and updated versions of some tools.

1.9.1

  • Changed final log folder to include sample name and analysis type, prevents clash when lots of data to same output loc.

1.9.0

  • Fix bugs #52 and #53
  • Modified bwa_mem.pl to accept multi-readgroup BAM as input

1.7.1

  • Turns out BWA mem still requires fixmates to get proper isize distributions
  • bumped biobambam to 0.0.191

1.7.0

  • Switched to bam_stats C in bwa_mem.pl.
  • Updates to bam_to_sra.pl to prevent bad SM values in unaligned BAM uploads.

1.6.0

  • Adding local file mode for sites that cannot download from GNOS when the xml_to_bas.pl script runs
  • gnos_pull.pl - see linked docs

1.5.4

  • bam_stats C
    • Reference file parameter is now optional to replicate bam_stats.pl functionality.
    • Warnings added to help and when a CRAM file is given, as the reference from the header may not be found and bam_stats will fail.

1.5.3

  • bam_stats C - changed from an array to a khash in insert size calculations to make the code more robust.
  • Header RG line reading now reads anything not a tab or newline as it should when determining what the values of tags are.
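
Storing insert sizes in a hash of size to count, rather than in an array indexed by size, means the median has to be read off a frequency table. The sketch below uses a Python dict as a stand-in for khash. Averaging the two middle values for an even total is an assumed convention, not necessarily the one bam_stats uses.

```python
def median_from_histogram(hist):
    """Median of a {insert_size: count} histogram.

    For an even total, returns the mean of the two middle observations
    (an assumed convention).
    """
    total = sum(hist.values())
    if total == 0:
        raise ValueError("empty histogram")
    lo_rank = (total - 1) // 2  # 0-based rank of the lower middle value
    hi_rank = total // 2  # equals lo_rank when total is odd
    lo_val = None
    seen = 0
    for size in sorted(hist):  # walk sizes in ascending order
        seen += hist[size]
        if lo_val is None and seen > lo_rank:
            lo_val = size
        if seen > hi_rank:
            return (lo_val + size) / 2
    raise AssertionError("unreachable: ranks are always < total")
```

Only the distinct sizes are sorted, so memory and sort cost scale with the number of distinct insert sizes rather than with the largest size seen.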

1.5.2

  • bamToBw.pl fixes
    • Pull actual binaries from jkent_util not html page associated
    • process name corrections in bamToBw.pm command line args

1.5.1

  • bam_stats c now has CRAM support.
  • Also dropped need for samtools v1.x API as this can be handled by htslib on its own.

1.5.0

  • bamToBw.pl and new biobambam dep

No changes to old tools, just additions and prep for handling CRAM input.

1.4.0

C implementation of bam_stats: less than 2 hours to generate stats on a sample-level BAM file of ~120GB.

1.3.0

  • bam_stats.pl is now multi-threaded, can get ~50% runtime reduction with 3-4 threads, memory still <500MB.
  • Upgrades biobambam to 0.0.185 (and dependencies).

1.2.3

xml_to_bas.pl - detect readgroup id clashes and attempt to reconcile, #54

1.2.2

Fixed bug in bwa_mem.pl when using '-f' option on paired fastq.

1.2.1

Makes xml_to_bas.pl more robust on AWS. Retrieved XML was being truncated on some network configurations.

1.2.0

Modifications made to the bwa_mem.pl code to split a lane of data into fragments to reduce failure recovery time. Primarily added to handle X10 data better.

Also updated samtools to 0.1.20, last version that is currently compatible with Bio::DB::Sam.

1.1.1

Fix missing dependency and build a relocatable version of biobambam suitable for use in artifactory.

1.1.0

  • Minor enhancement to bwa_mem.pl to automatically generate the *.bas file.
  • Added xml_to_bas.pl for pancancer users, see the wiki for details.
  • Fixed a few minor issues, #36, #37, #39

1.0.4

  • Install biobambam 0.0.142 to prevent over-counting of duplicates when multiple libraries, also required libmaus 0.0.124.
  • Improve install for those working with multiple perl installs.
  • Improve version inheritance, less code

1.0.3

  • Corrected issue from dynamic de-reference of hash, issue for pre 5.14 perl and potentially unstable in future.
  • Added missing project code to cv terms.
  • Bug-fixed upgrade path, still needs better solution.
  • Cleaned up messaging in Threaded module.

1.0.2

  • Upgrade install to pull biobambam 0.0.138
    • fastqtobam option 'pairedfile' for where readnames don't have trailing '/1' or '/2'.
    • fastqtobam option to relax qscore validation without turning off... careful
  • Upgrade install to pull BWA 0.7.8
    • performance improvements for short read alignment (100bp)

1.0.1

  • Upgrade install to pull biobambam 0.0.135
    • fastqtobam supports Casava v1.8
    • bamsort supports NM/MD correction during sam->bam/merge process
  • Minor enhancement to BAS reader module.
  • Sample name from command line passed through to SM of RG header in bwa_mem.pl
  • SRA.pm - check that rg id is unique within run of code (thanks to Junjun Zhang)
  • Threads.pm - join interval is now configurable.

1.0.0

  • bam_stats.pl actually installed now.
  • Basic *.bas perl access module.
  • Upgraded libmaus/biobambam to resolve patch and CentOS install issue.
  • Reference implementations ensure unique RG:ID between files.

0.3.0

  • Changes for the re-worked PanCancer submission SOP.
  • Patch for libmaus issue, as a fixed release was not going to be available in time.
  • Bug fix for *.info files (bam_to_sra_sub.pl).
  • Added bam_stats.pl.
  • Project is now defaulted when not provided (bam_to_sra_sub.pl).

0.2.0 14-Mar-2014

  • Updated biobambam version
  • Documented additional dependencies
  • Improved install implementation

0.0.2-beta2 04-Feb-2014

  • Updated module naming in preparation for publication to GitHub.
  • Added license boiler plate.
  • bam_to_sra_sub.pl generates valid XML for GNOS, some features disabled until modifications to GNOS can be made (warnings indicate this on execution)

0.0.2-beta 29-Jan-2014

  • Pre-release with basic SRA XML generation (GNOS)
  • Updated requirements for biobambam of 0.0.120
  • Tests update to reflect change in biobambam requirement

0.0.1 11-Sep-2013

  • Initial codebase for PanCancer alignment with BWA 0.6.2.