Skip to content

Commit

Permalink
Merge pull request #15 from TRON-Bioinformatics/fix-input-issue
Browse files Browse the repository at this point in the history
Fix issue when --bam is not provided +  add automated integration tests
  • Loading branch information
priesgo authored Nov 19, 2021
2 parents feca54d + ae5c8ff commit ba34798
Show file tree
Hide file tree
Showing 14 changed files with 380 additions and 16 deletions.
33 changes: 33 additions & 0 deletions .github/workflows/integration_tests.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,33 @@
name: Integration tests

on: [push]

jobs:
test:
runs-on: ubuntu-20.04
strategy:
matrix:
python-version: [ 3.7, 3.8, 3.9 ]

steps:
- uses: actions/checkout@v2
- uses: actions/setup-python@v2
- uses: conda-incubator/setup-miniconda@v2
with:
auto-update-conda: true
channels: defaults,conda-forge,bioconda
- name: Install dependencies
run: |
sudo apt-get update
sudo apt-get --assume-yes install build-essential libcurl4-openssl-dev libz-dev liblzma-dev
python -m pip install --upgrade pip
pip install setuptools wheel
# this is needed by cyvcf2 and pysam
conda install htslib=1.14
- name: Install vafator
run: |
python setup.py bdist_wheel
pip install dist/*
- name: Run integration tests
run: |
make clean integration_tests
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
name: Automated tests
name: Workflow tests

on: [push]

Expand Down
7 changes: 7 additions & 0 deletions Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -27,3 +27,10 @@ check:
test -s output/test3/sample_1/test_tumor_normal.vaf.vcf || { echo "Missing test 3 sample 1 output file!"; exit 1; }
test -s output/test3/sample_2/test_single_sample.vaf.vcf || { echo "Missing test 3 sample 2 output file!"; exit 1; }

integration_tests:
bash tests/integration_test_00.sh
bash tests/integration_test_01.sh
bash tests/integration_test_02.sh
bash tests/integration_test_03.sh


28 changes: 18 additions & 10 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -16,19 +16,16 @@ Annotations:
* **Allele count (AC)**: count of reads supporting the alternate allele.
* **Depth of coverage (DP)**: number of reads covering the position of the variant

Outputs a VCF with the following annotations in the INFO field for tumor and normal:
```
chr1 12345 . A G . PASS tumor_af=0.0;tumor_ac=0;tumor_dp=89;normal_af=0.0196;normal_ac=1;normal_dp=51
chr2 12345 . A G,T . PASS tumor_af=0.2,0.2;tumor_ac=2,2;tumor_dp=10;normal_af=0.0,0.0;normal_ac=0,0;normal_dp=10
```

**NOTE**: notice that VAFator does not annotate samples in the FORMAT field


## How to install

Install from PyPI (`pip install vafator`) or from bioconda (`conda install bioconda::vafator`).
Install from PyPI (`pip install vafator`) or from bioconda (`conda install bioconda::vafator`).

When installaing from PyPI there are some system dependencies that will need to be met:
* libcurl
* libz
* liblzma
* htslib=1.14

## How to run

Expand All @@ -45,7 +42,7 @@ This will add annotations for each of the three samples `normal`, `primary` and
`normal_dp`, `normal_af`, `primary_ac`, `primary_dp`, `primary_af`,
`metastasis_ac`, `metastasis_dp` and `metastasis_af`.

If more than one BAM is provided for any sample then the annotations are calculated across all BAMs
If more than one BAM for the same sample is provided then the annotations are calculated across all BAMs
and for also each of them separately (eg: `primary_af` provides the allele frequency across all primary tumor BAMs,
`primary_af_1` and `primary_af_2` provide the allele frequency on the first and second BAM respectively).

Expand All @@ -71,6 +68,17 @@ All reads with quality values velow these thresholds will be filtered out.

Overlapping reads from read pairs are not double counted. The read with the highest base call quality is chosen.

## Understanding the output

The output is a VCF with the some new annotations in the INFO field for the provided sample names.
The example below contains vafator annotations for two samples named `normal` and `tumor`.
```
chr1 12345 . A G . PASS tumor_af=0.0;tumor_ac=0;tumor_dp=89;normal_af=0.0196;normal_ac=1;normal_dp=51
chr2 12345 . A G,T . PASS tumor_af=0.2,0.2;tumor_ac=2,2;tumor_dp=10;normal_af=0.0,0.0;normal_ac=0,0;normal_dp=10
```

**NOTE**: notice that VAFator does not annotate samples in the FORMAT field, but in the INFO field

## Filter for multi-allelic variants

Multi-allelic variants are those that have more than one alternative allele (e.g.: A>C,G).
Expand Down
2 changes: 1 addition & 1 deletion nf_modules/vafator.nf
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,7 @@ process VAFATOR {
tag "${name}"
publishDir "${params.output}/${name}", mode: "copy"

conda (params.enable_conda ? "bioconda::vafator=1.1.0" : null)
conda (params.enable_conda ? "bioconda::vafator=1.1.1" : null)

input:
tuple val(name), file(vcf), val(normal_bams), val(tumor_bams)
Expand Down
248 changes: 248 additions & 0 deletions tests/assert.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,248 @@
#!/usr/bin/env bash

#####################################################################
##
## title: Assert Extension
##
## description:
## Assert extension of shell (bash, ...)
## with the common assert functions
## Function list based on:
## http://junit.sourceforge.net/javadoc/org/junit/Assert.html
## Log methods : inspired by
## - https://natelandau.com/bash-scripting-utilities/
## author: Mark Torok
##
## date: 07. Dec. 2016
##
## license: MIT
##
#####################################################################

if command -v tput &>/dev/null && tty -s; then
RED=$(tput setaf 1)
GREEN=$(tput setaf 2)
MAGENTA=$(tput setaf 5)
NORMAL=$(tput sgr0)
BOLD=$(tput bold)
else
RED=$(echo -en "\e[31m")
GREEN=$(echo -en "\e[32m")
MAGENTA=$(echo -en "\e[35m")
NORMAL=$(echo -en "\e[00m")
BOLD=$(echo -en "\e[01m")
fi

log_header() {
printf "\n${BOLD}${MAGENTA}========== %s ==========${NORMAL}\n" "$@" >&2
}

log_success() {
printf "${GREEN}✔ %s${NORMAL}\n" "$@" >&2
}

log_failure() {
printf "${RED}✖ %s${NORMAL}\n" "$@" >&2
}


assert_eq() {
local expected="$1"
local actual="$2"
local msg="${3-}"

if [ "$expected" == "$actual" ]; then
return 0
else
[ "${#msg}" -gt 0 ] && log_failure "$expected == $actual :: $msg" || true
return 1
fi
}

assert_not_eq() {
local expected="$1"
local actual="$2"
local msg="${3-}"

if [ ! "$expected" == "$actual" ]; then
return 0
else
[ "${#msg}" -gt 0 ] && log_failure "$expected != $actual :: $msg" || true
return 1
fi
}

assert_true() {
local actual="$1"
local msg="${2-}"

assert_eq true "$actual" "$msg"
return "$?"
}

assert_false() {
local actual="$1"
local msg="${2-}"

assert_eq false "$actual" "$msg"
return "$?"
}

assert_array_eq() {

declare -a expected=("${!1-}")
# echo "AAE ${expected[@]}"

declare -a actual=("${!2}")
# echo "AAE ${actual[@]}"

local msg="${3-}"

local return_code=0
if [ ! "${#expected[@]}" == "${#actual[@]}" ]; then
return_code=1
fi

local i
for (( i=1; i < ${#expected[@]} + 1; i+=1 )); do
if [ ! "${expected[$i-1]}" == "${actual[$i-1]}" ]; then
return_code=1
break
fi
done

if [ "$return_code" == 1 ]; then
[ "${#msg}" -gt 0 ] && log_failure "(${expected[*]}) != (${actual[*]}) :: $msg" || true
fi

return "$return_code"
}

assert_array_not_eq() {

declare -a expected=("${!1-}")
declare -a actual=("${!2}")

local msg="${3-}"

local return_code=1
if [ ! "${#expected[@]}" == "${#actual[@]}" ]; then
return_code=0
fi

local i
for (( i=1; i < ${#expected[@]} + 1; i+=1 )); do
if [ ! "${expected[$i-1]}" == "${actual[$i-1]}" ]; then
return_code=0
break
fi
done

if [ "$return_code" == 1 ]; then
[ "${#msg}" -gt 0 ] && log_failure "(${expected[*]}) == (${actual[*]}) :: $msg" || true
fi

return "$return_code"
}

assert_empty() {
local actual=$1
local msg="${2-}"

assert_eq "" "$actual" "$msg"
return "$?"
}

assert_not_empty() {
local actual=$1
local msg="${2-}"

assert_not_eq "" "$actual" "$msg"
return "$?"
}

assert_contain() {
local haystack="$1"
local needle="${2-}"
local msg="${3-}"

if [ -z "${needle:+x}" ]; then
return 0;
fi

if [ -z "${haystack##*$needle*}" ]; then
return 0
else
[ "${#msg}" -gt 0 ] && log_failure "$haystack doesn't contain $needle :: $msg" || true
return 1
fi
}

assert_not_contain() {
local haystack="$1"
local needle="${2-}"
local msg="${3-}"

if [ -z "${needle:+x}" ]; then
return 0;
fi

if [ "${haystack##*$needle*}" ]; then
return 0
else
[ "${#msg}" -gt 0 ] && log_failure "$haystack contains $needle :: $msg" || true
return 1
fi
}

assert_gt() {
local first="$1"
local second="$2"
local msg="${3-}"

if [[ "$first" -gt "$second" ]]; then
return 0
else
[ "${#msg}" -gt 0 ] && log_failure "$first > $second :: $msg" || true
return 1
fi
}

assert_ge() {
local first="$1"
local second="$2"
local msg="${3-}"

if [[ "$first" -ge "$second" ]]; then
return 0
else
[ "${#msg}" -gt 0 ] && log_failure "$first >= $second :: $msg" || true
return 1
fi
}

assert_lt() {
local first="$1"
local second="$2"
local msg="${3-}"

if [[ "$first" -lt "$second" ]]; then
return 0
else
[ "${#msg}" -gt 0 ] && log_failure "$first < $second :: $msg" || true
return 1
fi
}

assert_le() {
local first="$1"
local second="$2"
local msg="${3-}"

if [[ "$first" -le "$second" ]]; then
return 0
else
[ "${#msg}" -gt 0 ] && log_failure "$first <= $second :: $msg" || true
return 1
fi
}
4 changes: 4 additions & 0 deletions tests/integration_test_00.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
#!/bin/bash


vafator --help
21 changes: 21 additions & 0 deletions tests/integration_test_01.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
#!/bin/bash


source tests/assert.sh
test_data=vafator/tests/resources
output=output/integration_test_01
input_vcf=$test_data/project.NIST.hc.snps.indels.chr1_1000000_2000000.vcf
output_vcf=$output/vafator.vcf
mkdir -p $output
vafator --input-vcf $input_vcf \
--output-vcf $output_vcf \
--bam my_normal $test_data/project.NIST_NIST7035_H7AP8ADXX_TAAGGCGA_1_NA12878.bwa.markDuplicates.chr1_1000000_2000000.bam \
--bam my_tumor $test_data/project.NIST_NIST7035_H7AP8ADXX_TAAGGCGA_1_NA12878.bwa.markDuplicates.chr1_1000000_2000000.bam

test -s $output/vafator.vcf || { echo "Missing VCF output file!"; exit 1; }
assert_eq `cat $output_vcf | grep -v '#' | grep my_normal_af | wc -l` `cat $input_vcf | grep -v '#' | wc -l` "Wrong number of variants"
assert_eq `cat $output_vcf | grep -v '#' | grep my_tumor_af | wc -l` `cat $input_vcf | grep -v '#' | wc -l` "Wrong number of variants"
assert_eq `cat $output_vcf | grep -v '#' | grep my_normal_ac | wc -l` `cat $input_vcf | grep -v '#' | wc -l` "Wrong number of variants"
assert_eq `cat $output_vcf | grep -v '#' | grep my_tumor_ac | wc -l` `cat $input_vcf | grep -v '#' | wc -l` "Wrong number of variants"
assert_eq `cat $output_vcf | grep -v '#' | grep my_normal_dp | wc -l` `cat $input_vcf | grep -v '#' | wc -l` "Wrong number of variants"
assert_eq `cat $output_vcf | grep -v '#' | grep my_tumor_dp | wc -l` `cat $input_vcf | grep -v '#' | wc -l` "Wrong number of variants"
Loading

0 comments on commit ba34798

Please sign in to comment.