Snakemake-based pipeline to perform WGS joint calling, annotation, and GWAS for the genMARK project. The pipeline takes GVCFs as input; it does not perform mapping or per-sample variant calling (haplotyper).
Input: a TSV file listing each sample ID and the path to its GVCF file
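As an illustration of the expected input, here is a minimal sketch that reads such a table and checks it for empty fields and duplicate sample IDs. The column names `sampleid` and `gvcf`, the presence of a header row, and the helper name `read_samples` are assumptions made for this example; check config/samples.tsv in the repo for the actual layout.

```python
import csv
import io


def read_samples(handle):
    """Return {sample_id: gvcf_path} from a tab-separated table with a header row."""
    reader = csv.DictReader(handle, delimiter="\t")
    samples = {}
    for row in reader:
        # Column names are hypothetical; missing fields are treated as empty.
        sample_id = (row.get("sampleid") or "").strip()
        gvcf = (row.get("gvcf") or "").strip()
        if not sample_id or not gvcf:
            raise ValueError(f"incomplete row: {row}")
        if sample_id in samples:
            raise ValueError(f"duplicate sample ID: {sample_id}")
        samples[sample_id] = gvcf
    return samples


# Placeholder content standing in for a real samples file.
example = (
    "sampleid\tgvcf\n"
    "S1\t/path/to/S1.g.vcf.gz\n"
    "S2\t/path/to/S2.g.vcf.gz\n"
)
samples = read_samples(io.StringIO(example))
```

Running a check like this before a dry run can catch malformed rows early, before any jobs are scheduled.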
- Combine GVCFs (GenomicsDB)
- Joint-genotype (GenotypeGVCFs)
- Filter variants on quality (QUAL) and depth
- Annotate variants (VEP)
- Convert the joint-called VCF to PLINK BED format (PLINK v1.9 and v2)
- Quality control of samples and variants
- PCA and population stratification
- Add phenotype information
- Association testing and results (Manhattan plots and a report of significant variants)
- Validation of results and prioritization
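To make the "filter variants on quality and depth" step concrete, the following is a minimal sketch of that logic on plain VCF text lines. It is not the pipeline's actual rule: the thresholds (`min_qual=30.0`, `min_depth=10`), the use of the INFO `DP` field as the depth measure, and the function names are assumptions chosen for illustration.

```python
def passes_filter(vcf_line, min_qual=30.0, min_depth=10):
    """True if a VCF data line meets the QUAL and INFO/DP thresholds."""
    fields = vcf_line.rstrip("\n").split("\t")
    qual, info = fields[5], fields[7]  # VCF columns 6 (QUAL) and 8 (INFO)
    if qual == ".":
        return False  # missing QUAL is treated as failing
    depth = None
    for entry in info.split(";"):
        if entry.startswith("DP="):
            depth = int(entry[3:])
            break
    return float(qual) >= min_qual and depth is not None and depth >= min_depth


def filter_vcf(lines, **thresholds):
    """Yield header lines unchanged and only the data lines that pass."""
    for line in lines:
        if line.startswith("#") or passes_filter(line, **thresholds):
            yield line


# Made-up records for illustration only.
records = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t100\t.\tA\tG\t50.0\t.\tDP=25;AF=0.5\n",  # passes both thresholds
    "1\t200\t.\tC\tT\t12.3\t.\tDP=40\n",         # fails on QUAL
    "1\t300\t.\tG\tA\t99\t.\tDP=3\n",            # fails on depth
]
kept = list(filter_vcf(records))
```

In practice this kind of filtering is normally done with a dedicated VCF tool inside a workflow rule; the sketch only shows the decision applied to each record.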
- Download and install mamba
- mamba create -c conda-forge -c bioconda -n snakemake8.11 snakemake=8.11.3
- mamba activate snakemake8.11
- Install the plugins required for Slurm submission:
  - pip install snakemake-executor-plugin-slurm
  - pip install snakemake-storage-plugin-fs
- git clone this repo
- Make a directory with enough space to hold the tool-specific conda environments
- Edit the paths in genmark.sh according to your system; run the code listed as 1 and 2 for testing and setup
- Add the input sample IDs and GVCF paths to config/samples.tsv
- Edit config/config.yaml with project names and any other settings as required
- Use the "dry-run" line in workflow/genmark.sh to test
- Use the "run pipeline" line in workflow/genmark.sh to submit jobs
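The contents of workflow/genmark.sh are not reproduced here, so the sketch below only illustrates what a dry-run versus a full Slurm submission might look like with Snakemake 8. The helper name `snakemake_command`, the job count, and the `/path/to/conda_envs` placeholder are assumptions; the flags themselves are standard Snakemake 8 options, with `--executor slurm` relying on the Slurm executor plugin installed above.

```python
import shlex


def snakemake_command(conda_prefix, jobs=50, dry_run=True):
    """Build a Snakemake 8 command line as a list of arguments."""
    cmd = [
        "snakemake",
        "--snakefile", "workflow/Snakefile",
        "--configfile", "config/config.yaml",
        "--executor", "slurm",                    # needs snakemake-executor-plugin-slurm
        "--jobs", str(jobs),                      # hypothetical cap on concurrent jobs
        "--software-deployment-method", "conda",  # build per-tool conda environments
        "--conda-prefix", conda_prefix,           # directory with space for those environments
    ]
    if dry_run:
        cmd.append("--dry-run")  # list planned jobs without submitting anything
    return cmd


# Print the dry-run form; pass dry_run=False for the submission form.
print(shlex.join(snakemake_command("/path/to/conda_envs")))
```

Keeping the two forms identical except for `--dry-run` means the test run previews exactly the jobs the real run would submit.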
- config: Snakemake config files
- data: folder with test data
- workflow: main folder with rules, wrappers, Snakefile, envs, and dependent scripts.
- notebooks: Jupyter notebook used for REDCap files (input CSV files should not be uploaded to GitHub and should be listed in .gitignore)