Here is a quick demo for the Illumina NGS tumor-only somatic variant calling using HCC1395 tumor sample chromosome 17 data. The data was acquired from SEQC2.
Platform: Illumina
Sample: HCC1395 tumor sample
Tumor coverage: ~50x
Reference: GRCh38
Aligner: BWA-MEM
Region: chr17:80000000-80100000
Instruments: NovaSeq 6000
# Parameters
INPUT_DIR="${HOME}/illumina_quick_demo"
OUTPUT_DIR="${INPUT_DIR}/output"
mkdir -p ${INPUT_DIR}
mkdir -p ${OUTPUT_DIR}
# Download quick demo data
# GRCh38 reference
wget -P ${INPUT_DIR} -nc http://www.bio8.cs.hku.hk/clairs/quick_demo/ilmn/GRCh38_chr17.fa
wget -P ${INPUT_DIR} -nc http://www.bio8.cs.hku.hk/clairs/quick_demo/ilmn/GRCh38_chr17.fa.fai
# Tumor BAM
wget -P ${INPUT_DIR} -nc http://www.bio8.cs.hku.hk/clairs/quick_demo/ilmn/HCC1395_tumor_chr17_demo.bam
wget -P ${INPUT_DIR} -nc http://www.bio8.cs.hku.hk/clairs/quick_demo/ilmn/HCC1395_tumor_chr17_demo.bam.bai
# SEQC2 Truth VCF and BED
wget -P ${INPUT_DIR} -nc http://www.bio8.cs.hku.hk/clairs/quick_demo/ilmn/SEQC2_high-confidence_sSNV_in_HC_regions_v1.2_chr17.vcf.gz
wget -P ${INPUT_DIR} -nc http://www.bio8.cs.hku.hk/clairs/quick_demo/ilmn/SEQC2_high-confidence_sSNV_in_HC_regions_v1.2_chr17.vcf.gz.tbi
wget -P ${INPUT_DIR} -nc http://www.bio8.cs.hku.hk/clairs/quick_demo/ilmn/SEQC2_High-Confidence_Regions_v1.2_chr17.bed
REF="GRCh38_chr17.fa"
TUMOR_BAM="HCC1395_tumor_chr17_demo.bam"
BASELINE_VCF_FILE_PATH="SEQC2_high-confidence_sSNV_in_HC_regions_v1.2_chr17.vcf.gz"
BASELINE_BED_FILE_PATH="SEQC2_High-Confidence_Regions_v1.2_chr17.bed"
OUTPUT_SNV_VCF_FILE_PATH="snv.vcf.gz"
docker run -it \
-v ${INPUT_DIR}:${INPUT_DIR} \
-v ${OUTPUT_DIR}:${OUTPUT_DIR} \
hkubal/clairs-to:latest \
/opt/bin/run_clairs_to \
--tumor_bam_fn ${INPUT_DIR}/${TUMOR_BAM} \
--ref_fn ${INPUT_DIR}/${REF} \
--threads 4 \
--platform ilmn \
--output_dir ${OUTPUT_DIR} \
--region chr17:80000000-80100000
Run compare_vcf.py for benchmarking (optional)
docker run -it \
-v ${INPUT_DIR}:${INPUT_DIR} \
-v ${OUTPUT_DIR}:${OUTPUT_DIR} \
hkubal/clairs-to:latest \
python3 /opt/bin/clairs_to.py compare_vcf \
--truth_vcf_fn ${INPUT_DIR}/${BASELINE_VCF_FILE_PATH} \
--input_vcf_fn ${OUTPUT_DIR}/${OUTPUT_SNV_VCF_FILE_PATH} \
--bed_fn ${INPUT_DIR}/${BASELINE_BED_FILE_PATH} \
--output_dir ${OUTPUT_DIR}/benchmark \
--input_filter_tag 'PASS' \
--ctg_name chr17 \
--ctg_start 80000000 \
--ctg_end 80100000
Expected output:
Type | Precision | Recall | F1-score | TP | FP | FN |
---|---|---|---|---|---|---|
SNV | 0.9667 | 1.0 | 0.9831 | 29 | 1 | 0 |
Or run som.py for benchmarking (optional)
# Need to restrict target BED regions for benchmarking
echo -e "chr17\t80000000\t80100000" > ${INPUT_DIR}/quick_demo.bed
docker run \
-v "${INPUT_DIR}":"${INPUT_DIR}" \
-v "${OUTPUT_DIR}":"${OUTPUT_DIR}" \
jmcdani20/hap.py:v0.3.12 /opt/hap.py/bin/som.py \
${INPUT_DIR}/${BASELINE_VCF_FILE_PATH} \
${OUTPUT_DIR}/${OUTPUT_SNV_VCF_FILE_PATH} \
-T ${INPUT_DIR}/quick_demo.bed \
-f ${INPUT_DIR}/${BASELINE_BED_FILE_PATH} \
-r ${INPUT_DIR}/${REF} \
-o "${OUTPUT_DIR}/som" \
-l chr17
Run all commands above:
cd ${HOME}
wget "https://raw.githubusercontent.com/HKU-BAL/clairs-to/main/demo/ilmn_quick_demo.sh"
chmod +x ilmn_quick_demo.sh
./ilmn_quick_demo.sh
Check the results using less ${HOME}/illumina_quick_demo/output/snv.vcf.gz
.