Skip to content

Commit

Permalink
Merge pull request #2 from TRON-Bioinformatics/add-prefix
Browse files Browse the repository at this point in the history
Add --prefix + migrate to Nextflow DSL2
  • Loading branch information
priesgo authored Oct 13, 2021
2 parents 3fcbec5 + 9fce7a6 commit d06f72c
Show file tree
Hide file tree
Showing 11 changed files with 195 additions and 74 deletions.
7 changes: 5 additions & 2 deletions Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -14,13 +14,16 @@ test:
echo "sample_2\t"`pwd`"/test_data/test_single_sample.vcf\t"`pwd`"/test_data/TESTX_S1_L001.bam,"`pwd`"/test_data/TESTX_S1_L002.bam\t"`pwd`"/test_data/TESTX_S1_L001.bam,"`pwd`"/test_data/TESTX_S1_L002.bam" >> test_data/test_input.txt
nextflow main.nf -profile test,conda --output output/test1 --input_files test_data/test_input.txt
nextflow main.nf -profile test,conda --output output/test2 --input_files test_data/test_input.txt --skip_multiallelic_filter
nextflow main.nf -profile test,conda --output output/test3 --input_files test_data/test_input.txt --prefix RNA --skip_multiallelic_filter


check:
test -s output/test1/sample_1/test_tumor_normal.vaf.vcf || { echo "Missing test 1 sample 1 output file!"; exit 1; }
test -s output/test1/sample_1/test_tumor_normal.vaf.filtered_multiallelics.vcf || { echo "Missing test 1 sample 1 output file!"; exit 1; }
test -s output/test1/sample_2/test_single_sample.vaf.vcf || { echo "Missing test 1 sample 2 output file!"; exit 1; }
test -s output/test1/sample_2/test_single_sample.vaf.filtered_multiallelics.vcf || { echo "Missing test 1 sample 1 output file!"; exit 1; }
test -s output/test1/sample_1/test_tumor_normal.vaf.vcf || { echo "Missing test 2 sample 1 output file!"; exit 1; }
test -s output/test1/sample_2/test_single_sample.vaf.vcf || { echo "Missing test 2 sample 2 output file!"; exit 1; }
test -s output/test2/sample_1/test_tumor_normal.vaf.vcf || { echo "Missing test 2 sample 1 output file!"; exit 1; }
test -s output/test2/sample_2/test_single_sample.vaf.vcf || { echo "Missing test 2 sample 2 output file!"; exit 1; }
test -s output/test3/sample_1/test_tumor_normal.vaf.vcf || { echo "Missing test 3 sample 1 output file!"; exit 1; }
test -s output/test3/sample_2/test_single_sample.vaf.vcf || { echo "Missing test 3 sample 2 output file!"; exit 1; }

32 changes: 24 additions & 8 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,19 +1,33 @@
# VAFator

[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.5565744.svg)](https://doi.org/10.5281/zenodo.5565744)
[![PyPI version](https://badge.fury.io/py/vafator.svg)](https://badge.fury.io/py/vafator)
[![Run unit tests](https://github.com/TRON-Bioinformatics/vafator/actions/workflows/unit_tests.yml/badge.svg?branch=main)](https://github.com/TRON-Bioinformatics/vafator/actions/workflows/unit_tests.yml)
[![Anaconda-Server Badge](https://anaconda.org/bioconda/vafator/badges/version.svg)](https://anaconda.org/bioconda/vafator)
[![Run unit tests](https://github.com/TRON-Bioinformatics/vafator/actions/workflows/unit_tests.yml/badge.svg?branch=master)](https://github.com/TRON-Bioinformatics/vafator/actions/workflows/unit_tests.yml)
[![License](https://img.shields.io/badge/license-MIT-green)](https://opensource.org/licenses/MIT)

Annotate the variants in a VCF file with allele frequencies, allele counts and depth of coverage for alternate alleles
from one or more BAM files.
Annotate the variants in a VCF file with technical annotations from one or more BAM files.

Install from PyPI (`pip install vafator`) or from bioconda (`conda install bioconda::vafator`).

Annotations:

* **Allele frequency (AF)**: ratio of reads supporting the alternate allele. When more than one alternate allele is present then one value per alternate allele is provided.
* **Allele count (AC)**: count of reads supporting the alternate allele. When more than one alternate allele is present then one value per alternate allele is provided.
* **Depth of coverage (DP)**: number of reads covering the position of the variant

Outputs a VCF with the following annotations in the INFO field for tumor and normal:
```
chr1 12345 . A G . PASS tumor_af=0.0;tumor_ac=0;tumor_dp=89;normal_af=0.0196078431372549;normal_ac=1;normal_dp=51
chr1 12345 . A G . PASS tumor_af=0.0;tumor_ac=0;tumor_dp=89;normal_af=0.0196;normal_ac=1;normal_dp=51
chr2 12345 . A G,T . PASS tumor_af=0.2,0.2;tumor_ac=2,2;tumor_dp=10;normal_af=0.0,0.0;normal_ac=0,0;normal_dp=10
```

Both tumor and normal BAMs are optional, it can annotate only with the tumor BAM and viceversa.
If more than one BAM is provided for either the tumor or the normal the frequencies and counts are added up.

If more than one BAM is provided for either the tumor or the normal then the annotations are calculated across all BAMs
and for also each of them separately (eg: `tumor_af` provides the allele frequency across all tumor BAMs, `tumor_af_1`
and `tumor_af_2` provide the allele frequency on the first and second BAM respectively).


Run it as follows:
```
Expand All @@ -23,8 +37,7 @@ usage: vafator [-h] --input-vcf INPUT_VCF --output-vcf OUTPUT_VCF
[--tumor-bams TUMOR_BAMS [TUMOR_BAMS ...]]
[--mapping-quality MAPPING_QUALITY]
[--base-call-quality BASE_CALL_QUALITY]
vafator v0.1.0
[--prefix BASE_CALL_QUALITY]
optional arguments:
-h, --help show this help message and exit
Expand All @@ -44,6 +57,9 @@ optional arguments:
--base-call-quality BASE_CALL_QUALITY
All bases with a base call quality lower or equal than
this threshold will be filtered out (default: 29)
--prefix PREFIX
When provided the annotations are preceded by this prefix, otherwise the annotations
are named as tumor_af, normal_af, tumor_ac, normal_ac, tumor_dp and normal_dp
Copyright (c) 2015-2020 TRON gGmbH (See LICENSE for licensing details)
Copyright (c) 2019-2021 TRON gGmbH (See LICENSE for licensing details)
```
2 changes: 1 addition & 1 deletion environment.yml
Original file line number Diff line number Diff line change
Expand Up @@ -6,4 +6,4 @@ channels:
- bioconda
- defaults
dependencies:
- bioconda::vafator=0.3.3
- bioconda::vafator=0.4.0
67 changes: 22 additions & 45 deletions main.nf
Original file line number Diff line number Diff line change
@@ -1,16 +1,24 @@
#!/usr/bin/env nextflow


nextflow.enable.dsl = 2

include { VAFATOR; MULTIALLELIC_FILTER } from './nf_modules/vafator'


params.help= false
params.input_files = false
params.output = "output"
params.mapping_quality = false
params.base_call_quality = false
params.skip_multiallelic_filter = false
params.prefix = false

if (params.help) {
log.info params.help_message
exit 0
}

publish_dir = "output"
if (params.output) {
publish_dir = params.output
}

// checks required inputs
if (params.input_files) {
Channel
Expand All @@ -22,45 +30,14 @@ if (params.input_files) {
exit 1, "Input file not specified!"
}

process vafator {
cpus params.cpus
memory params.memory
tag "${name}"
publishDir "${publish_dir}/${name}", mode: "copy"

input:
set name, file(vcf), normal_bams, tumor_bams from input_files

output:
set val("${name}"), file("${vcf.baseName}.vaf.vcf") into annotated_vcf

script:
normal_bams_param = normal_bams?.trim() ? "--normal-bams " + normal_bams.split(",").join(" ") : ""
tumor_bams_param = tumor_bams?.trim() ? "--tumor-bams " + tumor_bams.split(",").join(" ") : ""
mapping_quality_param = params.mapping_quality ? "--mapping-quality " + params.mapping_quality : ""
base_call_quality_param = params.base_call_quality ? "--base-call-quality " + params.base_call_quality : ""
"""
vafator --input-vcf ${vcf} --output-vcf ${vcf.baseName}.vaf.vcf ${normal_bams_param} ${tumor_bams_param} ${mapping_quality_param} ${base_call_quality_param}
"""
}

if (!params.skip_multiallelic_filter) {
process multiallelic_filter {
cpus params.cpus
memory params.memory
tag "${name}"
publishDir "${publish_dir}/${name}", mode: "copy"

input:
set name, file(vcf) from annotated_vcf

output:
set val("${name}"), val("${publish_dir}/${name}/${vcf.baseName}.filtered_multiallelics.vcf") into output_files
file("${vcf.baseName}.filtered_multiallelics.vcf") into filtered_vcf

script:
"""
multiallelics-filter --input-vcf ${vcf} --output-vcf ${vcf.baseName}.filtered_multiallelics.vcf
"""
workflow {
if (params.skip_multiallelic_filter) {
VAFATOR(
input_files)
}
else {
MULTIALLELIC_FILTER(
VAFATOR(
input_files))
}
}
9 changes: 1 addition & 8 deletions nextflow.config
Original file line number Diff line number Diff line change
@@ -1,13 +1,6 @@
params.cpus = 1
params.memory = "4g"

params.help= false
params.input_files = false
params.output = false
params.mapping_quality = false
params.base_call_quality = false
params.skip_multiallelic_filter = false

profiles {
conda { process.conda = "$baseDir/environment.yml" }
debug { process.beforeScript = 'echo $HOSTNAME' }
Expand All @@ -32,7 +25,7 @@ process.shell = ['/bin/bash', '-euo', 'pipefail']
cleanup = true
conda.createTimeout = '1 h'

VERSION = '0.3.4'
VERSION = '0.4.0'

manifest {
name = 'TRON-Bioinformatics/vafator'
Expand Down
52 changes: 52 additions & 0 deletions nf_modules/vafator.nf
Original file line number Diff line number Diff line change
@@ -0,0 +1,52 @@
params.cpus = 1
params.memory = "4g"
params.output = ""
params.mapping_quality = false
params.base_call_quality = false
params.skip_multiallelic_filter = false
params.prefix = false


process VAFATOR {
cpus params.cpus
memory params.memory
tag "${name}"
publishDir "${params.output}/${name}", mode: "copy"

input:
tuple val(name), file(vcf), val(normal_bams), val(tumor_bams)

output:
tuple val("${name}"), file("${vcf.baseName}.vaf.vcf"), emit: annotated_vcf

script:
normal_bams_param = params.normal_bams?.trim() ? "--normal-bams " + params.normal_bams.split(",").join(" ") : ""
tumor_bams_param = params.tumor_bams?.trim() ? "--tumor-bams " + params.tumor_bams.split(",").join(" ") : ""
mq_param = params.mapping_quality ? "--mapping-quality " + params.mapping_quality : ""
bq_param = params.base_call_quality ? "--base-call-quality " + params.base_call_quality : ""
prefix_param = params.prefix ? "--prefix " + params.prefix : ""
"""
vafator \
--input-vcf ${vcf} \
--output-vcf ${vcf.baseName}.vaf.vcf ${normal_bams_param} ${tumor_bams_param} ${mq_param} ${bq_param} ${prefix_param}
"""
}


process MULTIALLELIC_FILTER {
cpus params.cpus
memory params.memory
tag "${name}"
publishDir "${params.output}/${name}", mode: "copy"

input:
tuple val(name), file(vcf)

output:
file("${vcf.baseName}.filtered_multiallelics.vcf") //, emit: filtered_vcf

script:
"""
multiallelics-filter --input-vcf ${vcf} --output-vcf ${vcf.baseName}.filtered_multiallelics.vcf
"""
}
2 changes: 1 addition & 1 deletion vafator/__init__.py
Original file line number Diff line number Diff line change
@@ -1 +1 @@
VERSION='0.3.4'
VERSION='0.4.0'
17 changes: 11 additions & 6 deletions vafator/annotator.py
Original file line number Diff line number Diff line change
Expand Up @@ -17,14 +17,15 @@ class Annotator(object):
}

def __init__(self, input_vcf, output_vcf, normal_bams=[], tumor_bams=[],
mapping_qual_thr=0, base_call_qual_thr=29):
mapping_qual_thr=0, base_call_qual_thr=29, prefix=None):
"""
:param input_vcf: the input VCF file
:param normal_bams: none or many normal BAM files
:param tumor_bams: none or many tumor BAM files
:param output_vcf: the file path of the output VCF
:param mapping_qual_thr: reads with a mapping quality lower or equal than this threshold will be filtered out
:param base_call_qual_thr: bases with a basecall quality lower than or equal this threshold will be filtered out
:params prefix: prefix for the annotations
"""
self.mapping_quality_threshold = mapping_qual_thr
self.base_call_quality_threshold = base_call_qual_thr
Expand All @@ -36,9 +37,13 @@ def __init__(self, input_vcf, output_vcf, normal_bams=[], tumor_bams=[],
self.vafator_header["tumor_bams"] = tumor_bams
self.vafator_header["mapping_quality_threshold"] = mapping_qual_thr
self.vafator_header["base_call_quality_threshold"] = base_call_qual_thr
self.vafator_header["prefix"] = prefix
self.vcf.add_to_header("##vafator_command_line={}".format(json.dumps(self.vafator_header)))
# adds to the header all the names of the annotations
for a in Annotator._get_headers("tumor", tumor_bams) + Annotator._get_headers("normal", normal_bams):
self.tumor_prefix = "{prefix}_tumor".format(prefix=prefix) if prefix is not None else "tumor"
self.normal_prefix = "{prefix}_normal".format(prefix=prefix) if prefix is not None else "normal"
for a in Annotator._get_headers(self.tumor_prefix, tumor_bams) + \
Annotator._get_headers(self.normal_prefix, normal_bams):
self.vcf.add_info_to_header(a)
self.vcf_writer = Writer(output_vcf, self.vcf)
self.tumor_bams = [pysam.AlignmentFile(b, "rb") for b in tumor_bams]
Expand All @@ -53,7 +58,7 @@ def _get_headers(prefix, bams):
'Description': "Allele frequency for the alternate alleles in the {} samples".format(prefix),
'Type': 'Float', 'Number': 'A'},
{'ID': "{}_dp".format(prefix),
'Description': "Total depth of coverage in the {} samples".format(prefix),
'Description': "Total depth of coverage in the {} samples (independent of alleles)".format(prefix),
'Type': 'Float', 'Number': 'A'},
{'ID': "{}_ac".format(prefix),
'Description': "Allele count for the alternate alleles in the {} samples".format(prefix),
Expand All @@ -67,7 +72,7 @@ def _get_headers(prefix, bams):
'Description': "Allele frequency for the alternate alleles in the {} sample {}".format(prefix, n),
'Type': 'Float', 'Number': 'A'},
{'ID': "{}_dp_{}".format(prefix, i),
'Description': "Depth of coverage in the {} sample {}".format(prefix, n),
'Description': "Depth of coverage in the {} sample {} (independent of alleles)".format(prefix, n),
'Type': 'Float', 'Number': 'A'},
{'ID': "{}_ac_{}".format(prefix, i),
'Description': "Allele count for the alternate alleles in the {} sample {}".format(prefix, n),
Expand Down Expand Up @@ -173,9 +178,9 @@ def run(self):
for variant in self.vcf:
# gets the counts of all bases across all BAMs
if self.tumor_bams:
self._add_stats(self.tumor_bams, variant, "tumor")
self._add_stats(self.tumor_bams, variant, self.tumor_prefix)
if self.normal_bams:
self._add_stats(self.normal_bams, variant, "normal")
self._add_stats(self.normal_bams, variant, self.normal_prefix)
batch.append(variant)
if len(batch) >= 1000:
self._write_batch(batch)
Expand Down
9 changes: 7 additions & 2 deletions vafator/command_line.py
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@
from vafator.multiallelic_filter import MultiallelicFilter


epilog = "Copyright (c) 2015-2020 TRON gGmbH (See LICENSE for licensing details)"
epilog = "Copyright (c) 2019-2021 TRON gGmbH (See LICENSE for licensing details)"


def annotator():
Expand All @@ -26,6 +26,9 @@ def annotator():
help="All reads with a mapping quality lower or equal than this threshold will be filtered out")
parser.add_argument("--base-call-quality", dest="base_call_quality", action="store", type=int, default=29,
help="All bases with a base call quality lower or equal than this threshold will be filtered out")
parser.add_argument("--prefix", dest="prefix", action="store", type=str, default=None,
help="When provided the annotations are preceded by this prefix, otherwise the annotations"
"are named as tumor_af, normal_af, tumor_ac, normal_ac, tumor_dp and normal_dp")

args = parser.parse_args()

Expand All @@ -37,7 +40,9 @@ def annotator():
normal_bams=args.normal_bams,
tumor_bams=args.tumor_bams,
mapping_qual_thr=args.mapping_quality,
base_call_qual_thr=args.base_call_quality)
base_call_qual_thr=args.base_call_quality,
prefix=args.prefix
)
annotator.run()
except Exception as e:
logging.error(str(e))
Expand Down
Loading

0 comments on commit d06f72c

Please sign in to comment.