Skip to content

Commit

Permalink
Merge pull request #629 from FelixKrueger/dev
Browse files Browse the repository at this point in the history
merging dev for new release
  • Loading branch information
FelixKrueger authored Sep 27, 2023
2 parents 7288cb6 + e3fec9d commit acf965c
Show file tree
Hide file tree
Showing 15 changed files with 253 additions and 24 deletions.
18 changes: 18 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,23 @@
# Bismark Changelog

## Changelog for Bismark v0.24.2 (release on 27 Sep 2023)

### Bismark

- removed an `exit 0` that would terminate runs after processing a single (set of) input file(s).

### deduplicate_bismark

- Changed the path to Samtools to custom variable ([#609](https://github.com/FelixKrueger/Bismark/issues/609))

### coverage2cytosine

- set threshold reads to 1 (if it was 0) for `--gc_context` as intended and mentioned in the help text. Fixes [#621](https://github.com/FelixKrueger/Bismark/issues/621)


Added scripts for merging coverage files (e.g. for when R1 and R2 had been run in single-end mode)


## Changelog for Bismark v0.24.1 (release on 29 May 2023)

- Added new [documentation website](http://felixkrueger.github.io/Bismark/), built using [Material for Mkdocs](https://squidfunk.github.io/mkdocs-material/). Thanks to @ewels for a great (late-night) effort to break up and restructure what had become a fairly unwieldy monolithic beast
Expand Down
2 changes: 1 addition & 1 deletion NOMe_filtering
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,7 @@ use Carp;

my %chromosomes; # storing sequence information of all chromosomes/scaffolds
my %processed; # keeping a record of which chromosomes have been processed
my $nome_version = 'v0.24.1';
my $nome_version = 'v0.24.2';

my ($output_dir,$genome_folder,$zero,$CpG_only,$CX_context,$split_by_chromosome,$parent_dir,$coverage_infile,$cytosine_out,$merge_CpGs,$gc_context,$gzip,$nome) = process_commandline();

Expand Down
2 changes: 1 addition & 1 deletion bam2nuc
Original file line number Diff line number Diff line change
Expand Up @@ -31,7 +31,7 @@ my %freqs; # keeping a record of which chromosomes have been processed
my %genomic_freqs;
my %processed;

my $bam2nuc_version = 'v0.24.1';
my $bam2nuc_version = 'v0.24.2';

my ($output_dir,$genome_folder,$parent_dir,$samtools_path,$genome_freq_only) = process_commandline();

Expand Down
12 changes: 6 additions & 6 deletions bismark
Original file line number Diff line number Diff line change
Expand Up @@ -25,7 +25,7 @@ use lib "$RealBin/../lib";

my $parent_dir = getcwd();

my $bismark_version = 'v0.24.1';
my $bismark_version = 'v0.24.2';
my $copyright_dates = "2010-23";

my $start_run = time();
Expand Down Expand Up @@ -921,7 +921,7 @@ foreach my $filename (@filenames){
warn "Nucleotide coverage statistics are currently only available for BAM or CRAM files\n\n";
}
}
exit 0;
# exit 0; # will terminate after a single supplied file....
}
else{
# If multiple files were supplied as the command line, like so:
Expand Down Expand Up @@ -7963,9 +7963,9 @@ VERSION

### BOWTIE 2/HISAT2 PARALLELIZATION OPTIONS
if ($parallel){
die "Please select a value for -p of 2 or more (ideally not higher than 4 though...)!\n" unless ($parallel > 1);
die "Please select a value for -p of 2 or more!\n" unless ($parallel > 1);
if ($parallel > 4){
warn "Attention: using more than 4 cores per alignment thread has been reported to have diminishing returns. If possible try to limit -p to a value of 4\n"; sleep(2);
warn "Attention: early reports suggested that high values of -p to have diminishing returns. Please test different values using a small subset of data for your hardware setting.\n"; sleep(1);
}
push @aligner_options,"-p $parallel";
push @aligner_options,'--reorder'; ## re-orders the Bowtie 2 or HISAT2 output so that it does match the input files. This is abolutely required for parallelization to work.
Expand Down Expand Up @@ -9916,7 +9916,7 @@ Bowtie 2/ HISAT2 parallelization options:
and synchronize when parsing reads and outputting alignments. Searching for alignments is highly
parallel, and speedup is close to linear. Increasing -p increases Bowtie 2's memory footprint.
E.g. when aligning to a human genome index, increasing -p from 1 to 8 increases the memory footprint
by a few hundred megabytes. This option is only available if bowtie is linked with the pthreads
by a few hundred megabytes. This option is only available if Bowtie 2 is linked with the pthreads
library (i.e. if BOWTIE_PTHREADS=0 is not specified at build time). In addition, this option will
automatically use the option '--reorder', which guarantees that output SAM records are printed in
an order corresponding to the order of the reads in the original input file, even when -p is set
Expand Down Expand Up @@ -9994,6 +9994,6 @@ Bismark BAM/SAM OUTPUT (default):
Each read of paired-end alignments is written out in a separate line in the above format.
Last modified on 28 May 2023
Last modified on 23 August 2023
HOW_TO
}
2 changes: 1 addition & 1 deletion bismark2bedGraph
Original file line number Diff line number Diff line change
Expand Up @@ -21,7 +21,7 @@ use Carp;
## You should have received a copy of the GNU General Public License
## along with this program. If not, see <http://www.gnu.org/licenses/>.

my $bismark2bedGraph_version = 'v0.24.1';
my $bismark2bedGraph_version = 'v0.24.2';

my @bedfiles;
my @methylcalls = qw (0 0 0); # [0] = methylated, [1] = unmethylated, [2] = total
Expand Down
2 changes: 1 addition & 1 deletion bismark2report
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,7 @@ use lib "$RealBin/../lib";
## You should have received a copy of the GNU General Public License
## along with this program. If not, see <http://www.gnu.org/licenses/>.

my $bismark2report_version = 'v0.24.1';
my $bismark2report_version = 'v0.24.2';
my (@alignment_reports,@dedup_reports,@splitting_reports,@mbias_reports,@nuc_reports);

my ($output_dir,$verbose,$manual_output_file) = process_commandline();
Expand Down
2 changes: 1 addition & 1 deletion bismark2summary
Original file line number Diff line number Diff line change
Expand Up @@ -22,7 +22,7 @@ use lib "$RealBin/../lib";

## You should have received a copy of the GNU General Public License
## along with this program. If not, see <http://www.gnu.org/licenses/>.
my $bismark_version = '0.24.1';
my $bismark_version = '0.24.2';

# Last modified 09 11 2020

Expand Down
2 changes: 1 addition & 1 deletion bismark_genome_preparation
Original file line number Diff line number Diff line change
Expand Up @@ -42,7 +42,7 @@ my %genomic_freqs; # storing the genomic sequence composition
my %freqs;


my $bismark_version = 'v0.24.1';
my $bismark_version = 'v0.24.2';
my $copyright_date = '2010-23';
my $last_modified = "19 May 2022";

Expand Down
2 changes: 1 addition & 1 deletion bismark_methylation_extractor
Original file line number Diff line number Diff line change
Expand Up @@ -29,7 +29,7 @@ my $parent_dir = getcwd();

my %fhs;

my $version = 'v0.24.1';
my $version = 'v0.24.2';
my ($ignore,$genomic_fasta,$single,$paired,$full,$report,$no_overlap,$merge_non_CpG,$output_dir,$no_header,$bedGraph,$remove,$coverage_threshold,$counts,$cytosine_report,$genome_folder,$zero,$CpG_only,$CX_context,$split_by_chromosome,$sort_size,$samtools_path,$gzip,$ignore_r2,$mbias_off,$mbias_only,$gazillion,$ample_mem,$ignore_3prime,$ignore_3prime_r2,$multicore,$yacht,$ucsc) = process_commandline();

### only needed for bedGraph output
Expand Down
22 changes: 15 additions & 7 deletions coverage2cytosine
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,7 @@ use Carp;
my %chromosomes; # storing sequence information of all chromosomes/scaffolds
my %processed; # keeping a record of which chromosomes have been processed
my %context_summary; # storing methylation values for all contexts for NOMe-seq or scNMT-experiments
my $coverage2cytosine_version = 'v0.24.1';
my $coverage2cytosine_version = 'v0.24.2';

my ($output_dir,$genome_folder,$zero,$CpG_only,$CX_context,$split_by_chromosome,$parent_dir,$coverage_infile,$cytosine_out,$merge_CpGs,$gc_context,$gzip,$tetra,$nome,$disco,$threshold,$drach) = process_commandline();

Expand Down Expand Up @@ -708,6 +708,12 @@ sub generate_GC_context_report {
warn "="x82,"\n";
warn "Methylation information for GC context will now be written to a GpC-context report\n";
warn "="x82,"\n\n";

warn "For the GC report, positions need to have coverage of at least 1 call. The threshold currently is: $threshold\n";
if ($threshold == 0){
warn "Setting threshold to 1\n";
$threshold = 1;
}
sleep (2);

my $number_processed = 0;
Expand Down Expand Up @@ -981,6 +987,8 @@ sub generate_GC_context_report {

# if Cs were not covered at all they are not written out
if ($meth_bottom + $nonmeth_bottom >= $threshold){

# warn ("methylation bottom strand: $meth_bottom\nnon-methylation bottom strand: $nonmeth_bottom\nThreshold: $threshold\n\n"); sleep (1);
my $percentage = sprintf ("%.6f",$meth_bottom/ ($meth_bottom + $nonmeth_bottom) *100);
if ($nome and ($context_bottom eq 'CG') ){
# not reporting these (see point 2. above)
Expand Down Expand Up @@ -1965,12 +1973,12 @@ sub process_commandline{
print << "VERSION";
Bismark Methylation Extractor Module -
coverage2cytosine
Bismark Methylation Extractor Module -
coverage2cytosine
Bismark coverage2cytosine Version: $coverage2cytosine_version
Copyright 2010-22 Felix Krueger, Altos Bioinformatics
https://github.com/FelixKrueger/Bismark
Version: $coverage2cytosine_version
Copyright 2010-23 Felix Krueger, Altos Bioinformatics
https://github.com/FelixKrueger/Bismark
VERSION
Expand Down Expand Up @@ -2231,7 +2239,7 @@ The genome-wide cytosine methylation output file is tab-delimited in the followi
<chromosome> <position> <strand> <count methylated> <count non-methylated> <C-context> <trinucleotide context>
Script last modified: 06 Oct 2020
Script last modified: 09 Sep 2023
EOF
;
Expand Down
4 changes: 2 additions & 2 deletions deduplicate_bismark
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@ use Cwd;
### This script is supposed to remove alignments to the same position in the genome which can arise by e.g. PCR amplification
### Paired-end alignments are considered a duplicate if both partner reads start and end at the exact same position

my $dedup_version = 'v0.24.1';
my $dedup_version = 'v0.24.2';
my @filenames;

my ($single,$paired,$global_single,$global_paired,$samtools_path,$bam,$rrbs,$multiple,$output_dir,$outfile,$parallel) = process_commandline();
Expand Down Expand Up @@ -154,7 +154,7 @@ foreach my $file (@filenames){
if ($multiple){
# we are assuming that all files are either in BAM format
if ($file =~ /\.bam$/){
open (IN,"$samtools_path cat -h $file @filenames | samtools view -h |") or die "Unable to read from BAM files in '@filenames': $!\n";
open (IN,"$samtools_path cat -h $file @filenames | $samtools_path view -h |") or die "Unable to read from BAM files in '@filenames': $!\n";
}
# or all in SAM format
else{ # if the reads are in SAM format we simply concatenate them
Expand Down
2 changes: 1 addition & 1 deletion filter_non_conversion
Original file line number Diff line number Diff line change
Expand Up @@ -22,7 +22,7 @@ $|++;
## along with this program. If not, see <http://www.gnu.org/licenses/>.

my $parent_dir = getcwd();
my $filter_version = 'v0.24.1';
my $filter_version = 'v0.24.2';

my ($global_single,$global_paired,$samtools_path,$threshold,$consecutive,$percentage_cutoff,$minimum_count) = process_commandline();

Expand Down
78 changes: 78 additions & 0 deletions merge_arbitrary_coverage_files.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,78 @@
#!/usr/bin/env python

import os
import glob
import time
import gzip
import argparse
import sys

def merge_coverage_files(basename):
print(f"Merging Bismark coverage files into a file called '{basename}.cov.gz'")
cov_files = glob.glob("*.cov.gz")

if not cov_files:
print("Error: No files ending in '.cov.gz' found in the current folder.")
sys.exit(1)

allcov = {} # overarching dictionary

for file in cov_files:
print(f"Reading methylation calls from file: {file}")

isGzip = False
if file.endswith("gz"):
infile = gzip.open(file, 'rt') # mode is 'rt' for text mode
isGzip = True
else:
infile = open(file)

for line in infile:

if isGzip:
line = line.rstrip() # no need to decode if using 'rt' mode
else:
line = line.rstrip()

chrom, pos, m, u = [line.split(sep="\t")[i] for i in (0, 1, 4, 5)] # list comprehension

if chrom in allcov.keys():
pass
else:
allcov[chrom] = {}

pos = int(pos)

if pos in allcov[chrom].keys():
pass
else:
allcov[chrom][pos] = {}
allcov[chrom][pos]["meth"] = 0
allcov[chrom][pos]["unmeth"] = 0

allcov[chrom][pos]["meth"] += int(m)
allcov[chrom][pos]["unmeth"] += int(u)

infile.close()

print("Now printing out a new, merged coverage file")

with gzip.open(f"{basename}.cov.gz", "wt") as out:
for chrom in sorted(allcov.keys()):
for pos in sorted(allcov[chrom].keys()):
perc = ''
if (allcov[chrom][pos]['meth'] + allcov[chrom][pos]['unmeth'] == 0):
print("Both methylated and unmethylated positions were 0. Exiting...")
sys.exit()
else:
perc = allcov[chrom][pos]['meth'] / (allcov[chrom][pos]['meth'] + allcov[chrom][pos]['unmeth']) * 100

out.write(f"{chrom}\t{pos}\t{pos}\t{perc:.2f}\t{allcov[chrom][pos]['meth']}\t{allcov[chrom][pos]['unmeth']}\n")

print(f"All done! The merged coverage file '{basename}.cov.gz' has been created.\n")

if __name__ == '__main__':
parser = argparse.ArgumentParser(description='Merge Bismark coverage files into a file called "basename.cov.gz".')
parser.add_argument('-b', '--basename', default='merged_coverage_file', help='The basename for the merged coverage file.')
args = parser.parse_args()
merge_coverage_files(args.basename)
Loading

0 comments on commit acf965c

Please sign in to comment.