- FASTQ tag preservation during mapping (bam/cram handled unless "-legacy")
- bwakit option added
- htslib/samtools 1.12 - came out hours after we released 5.6.0
- bwa-mem2 updated to v2.2.1 following successful large-scale testing
- bwa-mem2 increment.
- bwa_mem2.pl: correct arg checking to catch failure to provide input files.
- Update to bwa-mem2 production release (v2.1)
- use libdeflate /opt installation from cgpbigwig
- performance now applies to samtools via this modification
- Compile htslib with libdeflate (thanks to @mflevine)
- Update base image of docker to cgpbigwig 1.5.0
- Update to htslib 1.11
- Update to samtools 1.11 - to handle long running markdup issue
- Support for paired FASTQ names ending with _R1_001 and _R2_001 in addition to _1 and _2 as before.
- Delete files that may remain after an aborted run before resume.
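
The paired FASTQ naming support noted above can be pictured with a small sketch. This is not the PCAP-core implementation: the function names, the regular expression, and the accepted file extensions (.fq/.fastq, optionally .gz) are assumptions made for illustration. Only the four suffixes come from the entry itself.

```python
import re

# Suffixes _1/_2 and _R1_001/_R2_001 are taken from the changelog entry.
# The extensions accepted here are an assumption for this sketch.
PAIRED_NAME = re.compile(
    r"^(?P<stem>.+?)_(?:R(?P<illumina>[12])_001|(?P<plain>[12]))"
    r"\.(?:fq|fastq)(?:\.gz)?$"
)


def split_paired_name(filename):
    """Return (stem, read_end) for a paired FASTQ name, or None if it does not match."""
    match = PAIRED_NAME.match(filename)
    if match is None:
        return None
    read_end = match.group("illumina") or match.group("plain")
    return match.group("stem"), int(read_end)


def pair_fastqs(filenames):
    """Group file names by stem and keep only stems that have both read 1 and read 2."""
    by_stem = {}
    for name in filenames:
        parsed = split_paired_name(name)
        if parsed is None:
            continue  # not a recognised paired name
        stem, read_end = parsed
        by_stem.setdefault(stem, {})[read_end] = name
    return {
        stem: (ends[1], ends[2])
        for stem, ends in by_stem.items()
        if 1 in ends and 2 in ends
    }
```

Calling `pair_fastqs` on a directory listing returns a mapping from lane stem to its (read 1, read 2) file names; unpaired or unrecognised names are simply skipped in this sketch.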
- Correct biobambam2 version
- Fix to merge/mark legacy method to use bammerge instead of samtools merge
- Expose option for "legacy" to allow for <=5.0.5 processing methods.
  - bamtofastq when pulling reads from BAM/CRAM input.
  - bammarkduplicates2 for duplicate marking.
  - Affects bwa_mem.pl and merge_or_mark.pl
- Base image updated to Focal (Ubuntu 20.04).
- Majority of biobambam2 replaced with samtools functions.
- Reads undergo full collate when mapping from BAM/CRAM (bwa-mem2 prep).
- Duplicate marking samtools markdup --mode options exposed to bwa_mem.pl.
  - Lanes mapped with earlier versions of PCAP-core cannot be merged without reprocessing to add "mate score tag" via samtools fixmate.
- Scramble option for bwa_mem.pl deprecated, relevant option for fast CRAM random access exposed.
- Add noindex commandline flag to merge_or_mark.pl for bammerge calls. Only permitted alongside qnamesort
- Add qnamesort commandline flag to merge_or_mark.pl for bammerge calls (Defaults to SO=coordinate without)
- Fix the setup.sh script.
- bwa-mem2 only applied if you request it
- Fix for pushing data for threaded item to stdout/err on failure
- Adds basic merge_or_mark.pl script
  - for merging individual read-groups of data generated by bwa_mem.pl
- I/O reduced in all marking/merging code working with BAM outputs
- bwa_mem.pl now uses bwa-mem2 (v2.0pre2) for significant performance boost
  - Mapping shows minor differences in *.bas file result.
  - Summary of impact on downstream WGS results is as follows:
    - ascatNgs
      - No difference in CN segments
      - Small floating point changes in samplestatistics.txt:
        - Ploidy: @6th s.f.
        - goodnessOfFit: @5th s.f.
    - cgpBattenberg
      - No difference in CN segments
      - Small floating point changes in rho_and_psi.txt:
        - ASCAT-ploidy: @8th s.f.
        - FRAC_GENOME-psi: @7th s.f.
        - FRAC_GENOME-ploidy: @8th s.f.
        - FRAC_GENOME-distance: @10th s.f.
        - REF_SEG-rho: @5th s.f.
        - REF_SEG-psi: @5th s.f.
        - REF_SEG-ploidy: @4th s.f.
        - REF_SEG-distance: @7th s.f.
    - BRASS
      - No changes in total BEDPE/VCF calls.
    - cgpPindel
      - 327 events passing filters (unchanged, minor changes to allele-frac)
      - 982097 shared unfiltered calls
        - +5 events failing filters
        - -3 events failing filters
    - cgpCaVEManWrapper+cgpCaVEManPostProcessing
      - 1040 events passing filters (unchanged, minor changes to allele-frac)
      - 106867 shared unfiltered calls
        - +22 events failing filters
        - -18 events failing filters
- bamToBw.pl corrected to send stdout/err to file when threads are active.
- Adds local docker file to support large scale data tests on internal CI platform
- CWL and additional helper scripts etc will continue to be supported through the dockstore-cgpmap project.
- Further changes to threaded module to improve reliability under singularity
- Remove explicit sync for several reasons
  - No longer chmod scripts
  - Causes problems with heavily loaded shared file systems (sync hangs)
- Add options for input/output of CRAM in mismatchQc and mmFlagModifier
- Correct mem free issue when writing CRAM in mismatchQc and mmFlagModifier
- Resolved #40 - faster recovery times
- Resolved #39 - handle splitting correctly
- Added -p flag to program commandline output in BAM header @PG line
- Removed duplicate flag from BAD_FLAGS variable in mismatchQC script
- Added mmFlagModifier script to remove/reinstate QC_fail flag where mm:A:Y tag is found.
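
The remove/reinstate behaviour described for mmFlagModifier can be sketched on plain SAM fields. This is an illustration only, not the script's code: the function name, the `reinstate` parameter, and the representation of tags as SAM-text strings are assumptions. The flag value (512/0x200) and the mm:A:Y tag are the ones named in this changelog.

```python
QC_FAIL = 0x200  # SAM flag 512, "not passing quality controls"


def modify_qc_fail(flag, tags, reinstate=False):
    """Clear (or, with reinstate=True, set) the QC-fail bit on reads tagged mm:A:Y.

    `flag` is the integer SAM FLAG; `tags` is a list of optional fields in
    SAM text form, e.g. ["NM:i:9", "mm:A:Y"]. Reads without the tag are
    returned unchanged.
    """
    if "mm:A:Y" not in tags:
        return flag
    if reinstate:
        return flag | QC_FAIL
    return flag & ~QC_FAIL
```

Because the mm:A:Y tag stays on the read in this sketch, clearing the flag is reversible: running the function again with `reinstate=True` restores the original FLAG value.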
- scripts generated by Threaded.pm now always sync'ed
- remove need to chmod, reduce issues under docker, less file ops
- New bam file added with mini reference for cram formatting test.
- This prevents travis timeout issues where the reference area is being searched for by bamvalidate
- mismatchQC has added commandline parameter - used by default
- checks that reads marked as properly paired are in the correct orientations for paired-end (F/R)
- if not then the properly paired flag is removed.
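
A minimal sketch of the orientation check described above, working on SAM FLAG values and positions. It is not the mismatchQC implementation: the function name and the rule that the forward-strand read must not start to the right of its reverse-strand mate are assumptions about what "correct orientation (F/R)" means here.

```python
PROPER_PAIR = 0x2    # read mapped in proper pair
READ_REVERSE = 0x10  # read on reverse strand
MATE_REVERSE = 0x20  # mate on reverse strand


def fix_proper_pair(flag, pos, mate_pos):
    """Return the FLAG with the proper-pair bit cleared unless the pair is F/R.

    `pos` and `mate_pos` are the leftmost mapped positions of the read and
    its mate on the same reference sequence.
    """
    if not flag & PROPER_PAIR:
        return flag  # nothing to correct
    read_reverse = bool(flag & READ_REVERSE)
    mate_reverse = bool(flag & MATE_REVERSE)
    if read_reverse == mate_reverse:
        return flag & ~PROPER_PAIR  # F/F or R/R
    # Work out which member of the pair is on the forward strand.
    forward_pos, reverse_pos = (mate_pos, pos) if read_reverse else (pos, mate_pos)
    if forward_pos > reverse_pos:
        return flag & ~PROPER_PAIR  # R/F, reads point away from each other
    return flag
```

For example, FLAG 99 (first in pair, mate reverse) at position 100 with its mate at 300 is left untouched, while FLAG 67 (both reads forward) loses the proper-pair bit and becomes 65.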
Fixed bam_stats outputs to be more consistent with pre 4.1.0.
The addition of the ability to count QC-Failed reads (previously completely excluded)
had resulted in mapping stats being augmented by this class of reads instead of just total reads
and the #qcfail_reads[12] fields.
- Missing default value for bwa_mem.pl option -f|fragment fixed.
- Documentation corrections.
- Update Bio::DB::HTS to v2.10
- Fix for input type of CRAM.
- Add threadpool to bam_stats and diff_bams. Fixes #18, Fixes #19
- bwa_mem.pl can use threads for bam_stats
- Ensure sentinel calls exit with non-zero exit code. Fixes #16
- Some issues with indexing slipped through
- Clean up the biobambam2 version test
- Bump biobambam2 version, primarily to include script for cleaning mmqc processed data
- Add version option to reheadSQ
- Change travis tests to output version so less verbose
- Expose -a|overlap option in bamToBw.pl.
- bam_stats now calculates number of qc_fail reads (flag 512/0x200).
- mismatchQc script added to set flag 512 and aux tag 'mm:A:Y' when a read exceeds a mismatch threshold.
- bwa_mem.pl has new options:
  - readgroup information can be loaded from a yaml file when fastq used as input.
  - Options to enable use of mismatchQc in processing
- Docs point to dockstore wrapper and image.
- biobambam2 now installed to a subfolder so need to extend PATH appropriately.
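
The qc_fail marking and counting introduced above can be sketched as follows. The threshold definition is not stated in this changelog, so the mismatch fraction used here (mismatches divided by read length, default 0.05) is purely a hypothetical stand-in, as are the function names. Only flag 512/0x200 and the mm:A:Y tag come from the entries.

```python
QC_FAIL = 0x200  # SAM flag 512


def mark_mismatch_qc(flag, tags, mismatches, read_length, max_fraction=0.05):
    """Set QC-fail and add mm:A:Y when the mismatch fraction exceeds the threshold.

    `max_fraction` is a hypothetical threshold for this sketch; the real
    mismatchQc threshold and its definition are not given in the changelog.
    Returns the (possibly updated) flag and a new tag list.
    """
    tags = list(tags)
    if read_length > 0 and mismatches / read_length > max_fraction:
        flag |= QC_FAIL
        if "mm:A:Y" not in tags:
            tags.append("mm:A:Y")
    return flag, tags


def count_qc_fail(flags):
    """Count reads whose FLAG has the QC-fail bit set, as a bam_stats-style tally."""
    return sum(1 for flag in flags if flag & QC_FAIL)
```

Keeping the tag alongside the flag records why a read was failed, which is what allows a later tool to tell mismatch-based failures apart from reads failed for other reasons.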
- bam_stats now outputs a line of zeroes for bam files with zero reads
- Adds RNA downloads to PanCancer download tool gnos_pull.pl
- Hardening of external process handling in PCAP::Threaded
- Adds C version of diff_bams
- Significant speed up of BAM generation under bwa_mem.pl by using separate process to do compression of mark duplicate output and streaming BAS generation. Not possible to do this to CRAM in same way.
- Reduce disk usage when running bwa_mem.pl
- Improve throughput via slightly unintuitive use of additional pipes
- adds map_threads|mt option to bwa_mem.pl to allow more control of parallel processing in one shot submission.
- adds bwa_pl|l option to bwa_mem.pl to allow preload of different malloc libraries.
- Move from legacy kent bigwig manipulation code to cgpBigWig
  - Faster and handles the huge number of contigs in many new reference builds.
  - Resulting change to underlying installed tools is bwcat now bwjoin to be more descriptive of actual function.
Handle recent changes to BioPerl structure
- Use BWA default for -T previously hard coded to -T 0.
  - Can be passed through bwa_mem.pl other args to bwa via the -b option.
- Fix bam2bedgraph compilation since changes to underlying libraries
- bamToBw.pl - expose read flag filters
- Drop dependency on Bio::DB::HTS INSTALL.pl as can't fix to known good version.
- Added travis CI
- Add support for output directly to CRAM
- bwa version upgraded to 0.7.15
- Threading module now converts currently running step to bash script for following reasons:
- Changes logging to use file redirects instead of Capture::Tiny - prevent log bleed into wrong files
- Commands for failed jobs remain after shutdown for easy debug/testing
- Log and progress file names simplified so more portable.
- Modified reheadSQ to be more robust.
- Adds xam_coverage_bins.pl which calculates fraction of targets covered at various depths (BAM/CRAM), using BED/GFF3 as target bait file.
- bwa_mem.pl
- allow user to specify BWA mapping parameters
- now accepts CRAM as input
- bamToBw.pl - now accepts CRAM as input.
- bam_stats - Adds 2 new stats:
  - #_mapped_pairs
  - #_inter_chr_pairs
- Dependency changes
- WARNING: ensure all related tools handle these updates
- samtools, now only uses htslib based versions (1.3+, handling deprecated use of sort)
- Bio::DB::HTS htslib bindings replacing Bio::DB::Sam
- bwa_mem.pl - Option to disable duplicate marking
- bam_stats - Unit tests for C code
- bam_stats - Fix to median insert size calculation
- bam_stats - new rna switch to give more appropriate insert size stats
- bam_stats - more robust handling of optional RG header entries
- bam_stats - allows streaming IO (thanks to @jenniferliddle)
- bwa_mem.pl - Handle ' in RG header line/IDs
- Generally improved version handling and updated versions of some tools.
- Changed final log folder to include sample name and analysis type, prevents clash when lots of data to same output loc.
- Fix bugs #52 and #53
- Modified bwa_mem.pl to accept multi-readgroup BAM as input
- Turns out BWA mem still requires fixmates to get proper isize distributions
- bumped biobambam to 0.0.191
- Switched to bam_stats C in bwa_mem.pl.
- Updates to bam_to_sra.pl to prevent bad SM values in unaligned BAM uploads.
- Adding local file mode for sites that cannot download from GNOS when the xml_to_bas.pl script runs
- gnos_pull.pl - see linked docs
- bam_stats C
- Reference file parameter is now optional to replicate bam_stats.pl functionality.
- Warnings added to help and at runtime: when a CRAM file is given, the reference named in its header may not be found, in which case bam_stats will fail.
- bam_stats C - changed array for khash in insert size calculations in order to make code more robust.
- Header RG line reading now reads anything not a tab or newline as it should when determining what the values of tags are.
- bamToBw.pl fixes
- Pull actual binaries from jkent_util not html page associated
- process name corrections in bamToBw.pm command line args
- bam_stats c now has CRAM support.
- Also dropped need for samtools v1.x api as can be handled by htslib on its own.
- bamToBw.pl and new biobambam dep
No changes to old tools, just additions and prep for handling CRAM input.
bam_stats in C, less than 2 hours to generate stats on a sample level BAM file of ~120GB.
- bam_stats.pl is now multi-threaded, can get ~50% runtime reduction with 3-4 threads, memory still <500MB.
- Upgrades biobambam to 0.0.185 (and dependencies).
xml_to_bas.pl - detect readgroup id clashes and attempt to reconcile, #54
Fixed bug in bwa_mem.pl when using '-f' option on paired fastq.
Makes xml_to_bas.pl more robust on AWS. Retrieved XML was being truncated on some network configurations.
Modifications made to the bwa_mem.pl code to split a lane of data into fragments to reduce failure recovery time. Primarily added to handle X10 data better.
Also updated samtools to 0.1.20, last version that is currently compatible with Bio::DB::Sam.
Fix missing dependency and build a relocatable version of biobambam suitable for use in artifactory.
- Minor enhancement to bwa_mem.pl to automatically generate the *.bas file.
- Added xml_to_bas.pl for pancancer users, see the wiki for details.
- Fixed a few minor issues, #36, #37, #39
- Install biobambam 0.0.142 to prevent over-counting of duplicates when multiple libraries, also required libmaus 0.0.124.
- Improve install for those working with multiple perl installs.
- Improve version inheritance, less code
- Corrected issue from dynamic de-reference of hash, issue for pre 5.14 perl and potentially unstable in future.
- Added missing project code to cv terms.
- Bug-fixed upgrade path, still needs better solution.
- Cleaned up messaging in Threaded module.
- Upgrade install to pull biobambam 0.0.138
- fastqtobam option 'pairedfile' for where readnames don't have trailing '/1' or '/2'.
- fastqtobam option to relax qscore validation without turning off... careful
- Upgrade install to pull BWA 0.7.8
- performance improvements for short read alignment (100bp)
- Upgrade install to pull biobambam 0.0.135
- fastqtobam supports Casava v1.8
- bamsort supports NM/MD correction during sam->bam/merge process
- Minor enhancement to BAS reader module.
- Sample name from command line passed through to SM of RG header in bwa_mem.pl
- SRA.pm - check that rg id is unique within run of code (thanks to Junjun Zhang)
- Threads.pm - join interval is now configurable.
- bam_stats.pl actually installed now.
- Basic *.bas perl access module.
- Upgraded libmaus/biobambam to resolve patch and CentOS install issue.
- Reference implementations ensure unique RG:ID between files.
- Changes for the re-worked PanCancer submission SOP.
- Patch for libmaus issue as not going to be a release in time.
- Bug fix for *.info files (bam_to_sra_sub.pl).
- Added bam_stats.pl.
- Project is now defaulted when not provided (bam_to_sra_sub.pl).
- Updated biobambam version
- Documented additional dependencies
- Improved install implementation
- Updated module naming in preparation for publication to GitHub.
- Added license boiler plate.
- bam_to_sra_sub.pl generates valid XML for GNOS, some features disabled until modifications to GNOS can be made (warnings indicate this on execution)
- Pre release with basic SRA XML generation (GENOS)
- Updated requirements for biobambam of 0.0.120
- Tests update to reflect change in biobambam requirement
- Initial codebase for PanCancer alignment with BWA 0.6.2.