Merge pull request #107 from CCBR/test-data
Uploaded new test dataset for github workflow
samarth8392 authored Aug 16, 2024
2 parents 06debe7 + c9d6592 commit 315ef23
Showing 26 changed files with 65 additions and 49 deletions.
48 changes: 20 additions & 28 deletions .github/workflows/main.yaml
@@ -23,71 +23,63 @@ jobs:
run: |
docker run -v $PWD:/opt2 snakemake/snakemake:v7.32.4 \
/opt2/bin/xavier run --input \
/opt2/.tests/Sample10_ARK1_S37.R1.fastq.gz /opt2/.tests/Sample10_ARK1_S37.R2.fastq.gz \
/opt2/.tests/Sample11_ACI_158_S38.R1.fastq.gz /opt2/.tests/Sample11_ACI_158_S38.R2.fastq.gz \
/opt2/.tests/Sample4_CRL1622_S31.R1.fastq.gz /opt2/.tests/Sample4_CRL1622_S31.R2.fastq.gz \
/opt2/tests/data/WES_NC_N_1_sub.R1.fastq.gz /opt2/tests/data/WES_NC_N_1_sub.R2.fastq.gz \
/opt2/tests/data/WES_NC_T_1_sub.R1.fastq.gz /opt2/tests/data/WES_NC_T_1_sub.R2.fastq.gz \
--output /opt2/output_tn_fqs --targets /opt2/resources/Agilent_SSv7_allExons_hg38.bed \
--pairs /opt2/.tests/pairs.tsv --genome hg38 --mode local --ffpe --cnv --runmode init
--pairs /opt2/tests/data/pairs.tsv --genome hg38 --mode local --ffpe --cnv --runmode init
docker run -v $PWD:/opt2 snakemake/snakemake:v7.32.4 \
/opt2/bin/xavier run --input \
/opt2/.tests/Sample10_ARK1_S37.R1.fastq.gz /opt2/.tests/Sample10_ARK1_S37.R2.fastq.gz \
/opt2/.tests/Sample11_ACI_158_S38.R1.fastq.gz /opt2/.tests/Sample11_ACI_158_S38.R2.fastq.gz \
/opt2/.tests/Sample4_CRL1622_S31.R1.fastq.gz /opt2/.tests/Sample4_CRL1622_S31.R2.fastq.gz \
/opt2/tests/data/WES_NC_N_1_sub.R1.fastq.gz /opt2/tests/data/WES_NC_N_1_sub.R2.fastq.gz \
/opt2/tests/data/WES_NC_T_1_sub.R1.fastq.gz /opt2/tests/data/WES_NC_T_1_sub.R2.fastq.gz \
--output /opt2/output_tn_fqs --targets /opt2/resources/Agilent_SSv7_allExons_hg38.bed \
--pairs /opt2/.tests/pairs.tsv --genome hg38 --mode local --ffpe --cnv --runmode dryrun
--pairs /opt2/tests/data/pairs.tsv --genome hg38 --mode local --ffpe --cnv --runmode dryrun
- name: Tumor-only FastQ Dry Run
run: |
docker run -v $PWD:/opt2 snakemake/snakemake:v7.32.4 \
/opt2/bin/xavier run --input \
/opt2/.tests/Sample10_ARK1_S37.R1.fastq.gz /opt2/.tests/Sample10_ARK1_S37.R2.fastq.gz \
/opt2/.tests/Sample11_ACI_158_S38.R1.fastq.gz /opt2/.tests/Sample11_ACI_158_S38.R2.fastq.gz \
/opt2/.tests/Sample4_CRL1622_S31.R1.fastq.gz /opt2/.tests/Sample4_CRL1622_S31.R2.fastq.gz \
/opt2/tests/data/WES_NC_N_1_sub.R1.fastq.gz /opt2/tests/data/WES_NC_N_1_sub.R2.fastq.gz \
/opt2/tests/data/WES_NC_T_1_sub.R1.fastq.gz /opt2/tests/data/WES_NC_T_1_sub.R2.fastq.gz \
--output /opt2/output_tonly_fqs --targets /opt2/resources/Agilent_SSv7_allExons_hg38.bed \
--genome hg38 --mode local --ffpe --runmode init
docker run -v $PWD:/opt2 snakemake/snakemake:v7.32.4 \
/opt2/bin/xavier run --input \
/opt2/.tests/Sample10_ARK1_S37.R1.fastq.gz /opt2/.tests/Sample10_ARK1_S37.R2.fastq.gz \
/opt2/.tests/Sample11_ACI_158_S38.R1.fastq.gz /opt2/.tests/Sample11_ACI_158_S38.R2.fastq.gz \
/opt2/.tests/Sample4_CRL1622_S31.R1.fastq.gz /opt2/.tests/Sample4_CRL1622_S31.R2.fastq.gz \
/opt2/tests/data/WES_NC_N_1_sub.R1.fastq.gz /opt2/tests/data/WES_NC_N_1_sub.R2.fastq.gz \
/opt2/tests/data/WES_NC_T_1_sub.R1.fastq.gz /opt2/tests/data/WES_NC_T_1_sub.R2.fastq.gz \
--output /opt2/output_tonly_fqs --targets /opt2/resources/Agilent_SSv7_allExons_hg38.bed \
--genome hg38 --mode local --ffpe --runmode dryrun
- name: Tumor-normal BAM Dry Run
run: |
docker run -v $PWD:/opt2 snakemake/snakemake:v7.32.4 \
/opt2/bin/xavier run --input \
/opt2/.tests/Sample10_ARK1_S37.recal.bam \
/opt2/.tests/Sample11_ACI_158_S38.recal.bam \
/opt2/.tests/Sample4_CRL1622_S31.recal.bam \
/opt2/tests/data/WES_NC_N_1_sub.bam \
/opt2/tests/data/WES_NC_T_1_sub.bam \
--output /opt2/output_tn_bams --targets /opt2/resources/Agilent_SSv7_allExons_hg38.bed \
--pairs /opt2/.tests/pairs.tsv --genome hg38 --mode local --ffpe --cnv --runmode init
--pairs /opt2/tests/data/pairs.tsv --genome hg38 --mode local --ffpe --cnv --runmode init
docker run -v $PWD:/opt2 snakemake/snakemake:v7.32.4 \
/opt2/bin/xavier run --input \
/opt2/.tests/Sample10_ARK1_S37.recal.bam \
/opt2/.tests/Sample11_ACI_158_S38.recal.bam \
/opt2/.tests/Sample4_CRL1622_S31.recal.bam \
/opt2/tests/data/WES_NC_N_1_sub.bam \
/opt2/tests/data/WES_NC_T_1_sub.bam \
--output /opt2/output_tn_bams --targets /opt2/resources/Agilent_SSv7_allExons_hg38.bed \
--pairs /opt2/.tests/pairs.tsv --genome hg38 --mode local --ffpe --cnv --runmode dryrun
--pairs /opt2/tests/data/pairs.tsv --genome hg38 --mode local --ffpe --cnv --runmode dryrun
- name: Tumor-only BAM Dry Run
run: |
docker run -v $PWD:/opt2 snakemake/snakemake:v7.32.4 \
/opt2/bin/xavier run --input \
/opt2/.tests/Sample10_ARK1_S37.recal.bam \
/opt2/.tests/Sample11_ACI_158_S38.recal.bam \
/opt2/.tests/Sample4_CRL1622_S31.recal.bam \
/opt2/tests/data/WES_NC_N_1_sub.bam \
/opt2/tests/data/WES_NC_T_1_sub.bam \
--output /opt2/output_tonly_bams --targets /opt2/resources/Agilent_SSv7_allExons_hg38.bed \
--genome hg38 --mode local --ffpe --runmode init
docker run -v $PWD:/opt2 snakemake/snakemake:v7.32.4 \
/opt2/bin/xavier run --input \
/opt2/.tests/Sample10_ARK1_S37.recal.bam \
/opt2/.tests/Sample11_ACI_158_S38.recal.bam \
/opt2/.tests/Sample4_CRL1622_S31.recal.bam \
/opt2/tests/data/WES_NC_N_1_sub.bam \
/opt2/tests/data/WES_NC_T_1_sub.bam \
--output /opt2/output_tonly_bams --targets /opt2/resources/Agilent_SSv7_allExons_hg38.bed \
--genome hg38 --mode local --ffpe --runmode dryrun
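Each step above repeats the same `docker run` invocation twice, once with `--runmode init` and once with `--runmode dryrun`. As a rough sketch of how the shared parts fit together, the snippet below assembles the argument list in Python. The helper name `xavier_cmd` is hypothetical and not part of XAVIER or the workflow; the flags, image tag and paths are copied from the workflow lines in this diff.

```python
def xavier_cmd(inputs, output, runmode, pairs=None, cnv=False):
    """Assemble the docker command used by the dry-run steps (illustrative only)."""
    cmd = [
        "docker", "run", "-v", "$PWD:/opt2", "snakemake/snakemake:v7.32.4",
        "/opt2/bin/xavier", "run", "--input", *inputs,
        "--output", output,
        "--targets", "/opt2/resources/Agilent_SSv7_allExons_hg38.bed",
    ]
    if pairs:
        # Tumor-normal runs pass a pairs file; tumor-only runs omit it.
        cmd += ["--pairs", pairs]
    cmd += ["--genome", "hg38", "--mode", "local", "--ffpe"]
    if cnv:
        cmd.append("--cnv")
    cmd += ["--runmode", runmode]
    return cmd


if __name__ == "__main__":
    bams = ["/opt2/tests/data/WES_NC_N_1_sub.bam", "/opt2/tests/data/WES_NC_T_1_sub.bam"]
    for runmode in ("init", "dryrun"):
        # Joined with spaces for display; $PWD is left for the shell to expand.
        print(" ".join(xavier_cmd(bams, "/opt2/output_tn_bams", runmode,
                                  pairs="/opt2/tests/data/pairs.tsv", cnv=True)))
```

The only differences between the four steps are the input files, the output folder, and whether `--pairs` and `--cnv` are present.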
5 changes: 0 additions & 5 deletions .tests/README.md

This file was deleted.

Empty file removed .tests/Sample10_ARK1_S37.recal.bam
3 changes: 0 additions & 3 deletions .tests/pairs.tsv

This file was deleted.

1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -8,6 +8,7 @@
- Provide default exome targets for hg38 and mm10, which can be overridden by the optional `--targets` argument. (#102, @kelly-sovacool)
- Previously, the `--targets` argument was required with no defaults.
- Increased memory for rules: BWA mem, qualimap, kraken. gatk_contamination is not localrule. (#89, @samarth8392)
- Added new human test dataset for github workflow (#27, @samarth8392)

## XAVIER 3.0.3

17 changes: 13 additions & 4 deletions docs/usage/run.md
@@ -46,7 +46,9 @@ Each of the following arguments are required. Failure to provide a required argu
>
> One or more FastQ files can be provided. The pipeline does NOT support single-end WES data. Please provide either a set of FastQ files or a set of BAM files. The pipeline does NOT support processing a mixture of FastQ files and BAM files. From the command-line, each input file should be separated by a space. Globbing is supported! This makes selecting FastQ files easy. Input FastQ files should be gzipped.
>
> **_Example:_** `--input .tests/*.R?.fastq.gz`
> **_Example:_** `--input tests/data/*.R?.fastq.gz`
>
> **_Example:_** `--input /data/CCBR_Pipeliner/testdata/XAVIER/human_subset/*.R?.fastq.gz`
---

@@ -251,15 +253,15 @@ module purge
module load ccbrpipeliner

# Step 2A.) Initialize all resources to the output folder
xavier run --input .tests/*.R?.fastq.gz \
xavier run --input tests/data/*.R?.fastq.gz \
--output /data/$USER/xavier_hg38 \
--genome hg38 \
--targets Agilent_SSv7_allExons_hg38.bed \
--mode slurm \
--runmode init

# Step 2B.) Dry-run the pipeline
xavier run --input .tests/*.R?.fastq.gz \
xavier run --input tests/data/*.R?.fastq.gz \
--output /data/$USER/xavier_hg38 \
--genome hg38 \
--targets Agilent_SSv7_allExons_hg38.bed \
@@ -269,11 +271,18 @@ xavier run --input .tests/*.R?.fastq.gz \
# Step 2C.) Run the XAVIER pipeline
# The slurm mode will submit jobs to the cluster.
# Running xavier in this mode is recommended.
xavier run --input .tests/*.R?.fastq.gz \
xavier run --input tests/data/*.R?.fastq.gz \
--output /data/$USER/xavier_hg38 \
--genome hg38 \
--targets Agilent_SSv7_allExons_hg38.bed \
--mode slurm \
--runmode run

```

The example dataset in `tests/data` in this repository is a very small
subsampled dataset, and some steps of the pipeline fail due to the small size
(CNV calling, somalier, etc.).
We have a larger subsample (25% of a full human dataset) available on Biowulf if
you would like to test the full functionality of the pipeline:
`/data/CCBR_Pipeliner/testdata/XAVIER/human_subset/*.R?.fastq.gz`
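To check which files a pattern such as `*.R?.fastq.gz` selects, here is a minimal sketch using Python's standard `fnmatch` module. The file names are the ones added under `tests/data` in this commit; the helper `select_fastqs` is hypothetical and not part of XAVIER. On the command line the shell expands the pattern, and `fnmatch` only approximates that behaviour.

```python
from fnmatch import fnmatch

# File names added under tests/data in this commit.
FILES = [
    "README.md",
    "WES_NC_N_1_sub.R1.fastq.gz",
    "WES_NC_N_1_sub.R2.fastq.gz",
    "WES_NC_N_1_sub.bam",
    "WES_NC_T_1_sub.R1.fastq.gz",
    "WES_NC_T_1_sub.R2.fastq.gz",
    "WES_NC_T_1_sub.bam",
    "pairs.tsv",
]


def select_fastqs(names, pattern="*.R?.fastq.gz"):
    """Return the names matching the glob pattern, sorted for stable output."""
    return sorted(name for name in names if fnmatch(name, pattern))


if __name__ == "__main__":
    for name in select_fastqs(FILES):
        print(name)
```

The `?` matches exactly one character, so both `R1` and `R2` mates are picked up while the BAM files, `pairs.tsv` and the README are left out.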
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -64,7 +64,7 @@ Repository = "https://github.com/CCBR/XAVIER"
xavier = "."

[tool.setuptools.package-data]
"*" = ["CITATION.cff", "LICENSE", "VERSION", "docker/**", "resources/**", "bin/**", "config/**", "resources/**", "workflow/**", "tests/**", ".tests/**"]
"*" = ["CITATION.cff", "LICENSE", "VERSION", "docker/**", "resources/**", "bin/**", "config/**", "resources/**", "workflow/**", "tests/**"]

[tool.setuptools.dynamic]
version = {file = "VERSION"}
8 changes: 4 additions & 4 deletions src/xavier/__main__.py
@@ -220,7 +220,7 @@ def parsed_arguments():
FastQ files or a set of BAM files. The pipeline does
NOT support processing a mixture of FastQ files and
BAM files.
Example: --input .tests/*.R?.fastq.gz
Example: --input tests/data/*.R?.fastq.gz
--output OUTPUT
Path to an output directory. This location is where
the pipeline will create all of its output files, also
@@ -256,15 +256,15 @@ def parsed_arguments():
# Step 2A.) Initialize the pipeline
xavier run \\
--runmode init \\
--input .tests/*.R?.fastq.gz \\
--input tests/data/*.R?.fastq.gz \\
--output /data/$USER/xavier_hg38 \\
--genome hg38 \\
--targets resources/Agilent_SSv7_allExons_hg38.bed
# Step 2B.) Dry-run the pipeline
xavier run \\
--runmode dryrun \\
--input .tests/*.R?.fastq.gz \\
--input tests/data/*.R?.fastq.gz \\
--output /data/$USER/xavier_hg38 \\
--genome hg38 \\
--targets resources/Agilent_SSv7_allExons_hg38.bed \\
@@ -275,7 +275,7 @@ def parsed_arguments():
# Running xavier in this mode is recommended.
xavier run \\
--runmode run \\
--input .tests/*.R?.fastq.gz \\
--input tests/data/*.R?.fastq.gz \\
--output /data/$USER/xavier_hg38 \\
--genome hg38 \\
--targets resources/Agilent_SSv7_allExons_hg38.bed \\
20 changes: 20 additions & 0 deletions tests/data/README.md
@@ -0,0 +1,20 @@
# About

These input files are used for continuous integration purposes, specifically to dry run the pipeline whenever commits have been made to the main, master, or unified branches.

Human whole exome sequencing reads from the Sequencing Quality Control Phase 2 (SEQC2) Consortium have been subsampled and added.

The tumor-normal paired reads, sequenced by the NCI (WES_NC_T_1 vs. WES_NC_N_1), were downloaded from the [seqc2](https://sites.google.com/view/seqc2/home/sequencing) server. They correspond to NCBI SRA accession nos. [SRX4728524](https://www.ncbi.nlm.nih.gov/sra/SRX4728524) and [SRX4728523](https://www.ncbi.nlm.nih.gov/sra/SRX4728523), respectively.

Next, the reads were subsampled to 0.1% using `seqtk` and gzipped as follows:

```bash
seqtk sample -s100 {input}.R[1/2].fastq.gz 0.001 > {input}_sub.R[1/2].fastq
gzip *.fastq
```

Similarly, the BAM files were created by first mapping the reads to the hg38 genome and then subsampling with `samtools`:

```bash
samtools view -s 0.00125 -b WES_NC_[T/N]_1.bam -o WES_NC_[T/N]_1_sub.bam
```
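The `seqtk` command above uses the same seed (`-s100`) for R1 and R2, which is what keeps mates paired after subsampling. The sketch below illustrates that idea in plain Python. It is not `seqtk`'s actual algorithm, and the function name `subsample_records` and the toy read names are made up for illustration.

```python
import random


def subsample_records(records, fraction, seed):
    """Keep each record with probability `fraction`, using a seeded RNG.

    Two equally long lists sampled with the same seed keep the same positions.
    """
    rng = random.Random(seed)
    return [rec for rec in records if rng.random() < fraction]


if __name__ == "__main__":
    # Toy stand-ins for the records of an R1/R2 FastQ pair.
    r1 = [f"read{i}/1" for i in range(1000)]
    r2 = [f"read{i}/2" for i in range(1000)]

    kept_r1 = subsample_records(r1, 0.1, seed=100)
    kept_r2 = subsample_records(r2, 0.1, seed=100)

    # Same seed, same positions kept: the mates still line up.
    in_sync = [a.split("/")[0] for a in kept_r1] == [b.split("/")[0] for b in kept_r2]
    print(len(kept_r1), len(kept_r2), in_sync)
```

With different seeds for the two files, the kept positions would generally differ and the resulting FastQ files would no longer be valid pairs.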
Binary file added tests/data/WES_NC_N_1_sub.R1.fastq.gz
Binary file not shown.
Binary file added tests/data/WES_NC_N_1_sub.R2.fastq.gz
Binary file not shown.
Binary file added tests/data/WES_NC_N_1_sub.bam
Binary file not shown.
Binary file added tests/data/WES_NC_T_1_sub.R1.fastq.gz
Binary file not shown.
Binary file added tests/data/WES_NC_T_1_sub.R2.fastq.gz
Binary file not shown.
Binary file added tests/data/WES_NC_T_1_sub.bam
Binary file not shown.
2 changes: 2 additions & 0 deletions tests/data/pairs.tsv
@@ -0,0 +1,2 @@
Normal Tumor
WES_NC_N_1_sub WES_NC_T_1_sub
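As a quick sanity check that the pairs file lines up with the new test inputs, the sketch below reads the two columns and reports sample names that have no matching input file. It assumes the file is tab-delimited and that a sample name is the file name up to the first dot; both are assumptions made for illustration, and `read_pairs` and `missing_samples` are hypothetical helpers, not XAVIER functions.

```python
import csv
import io

# Contents of tests/data/pairs.tsv as shown in this diff (tab-delimited is assumed).
PAIRS_TSV = "Normal\tTumor\nWES_NC_N_1_sub\tWES_NC_T_1_sub\n"


def read_pairs(handle):
    """Return a list of (normal, tumor) sample-name tuples."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [(row["Normal"], row["Tumor"]) for row in reader]


def missing_samples(pairs, input_names):
    """Return sample names listed in the pairs that have no matching input file."""
    # Assumption: the sample name is everything before the first dot.
    prefixes = {name.split(".")[0] for name in input_names}
    wanted = {sample for pair in pairs for sample in pair}
    return sorted(wanted - prefixes)


if __name__ == "__main__":
    pairs = read_pairs(io.StringIO(PAIRS_TSV))
    inputs = ["WES_NC_N_1_sub.bam", "WES_NC_T_1_sub.bam"]
    print(pairs)
    print(missing_samples(pairs, inputs))
```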
4 changes: 2 additions & 2 deletions tests/test_cli.py
@@ -8,8 +8,8 @@

xavier_run = (
"xavier run "
"--input .tests/*.fastq.gz "
"--pairs .tests/pairs.tsv "
"--input tests/data/*.fastq.gz "
"--pairs tests/data/pairs.tsv "
"--mode local "
)

4 changes: 2 additions & 2 deletions tests/test_run.py
@@ -15,14 +15,14 @@ def test_dryrun():
with tempfile.TemporaryDirectory() as tmp_dir:
run_args = argparse.Namespace(
runmode="init",
input=list(glob.glob(f"{xavier_base('.tests')}/*.fastq.gz")),
input=list(glob.glob(f"{xavier_base('tests/data')}/*.fastq.gz")),
output=tmp_dir,
genome="hg38",
targets=xavier_base("resources/Agilent_SSv7_allExons_hg38.bed"),
mode="local",
job_name="pl:xavier",
callers=["mutect2", "mutect", "strelka", "vardict", "varscan"],
pairs=xavier_base(".tests/pairs.tsv"),
pairs=xavier_base("tests/data/pairs.tsv"),
ffpe=False,
cnv=False,
wait=False,