This repository will make up the evidence for the Bioinformatics Software and Tools module of the ARU/Sanger BSc Bioinformatics Year 2 Course.
The software environment can be set up with:
conda env create --file environment.yml
However, this can take quite some time; personally, I found it faster to install everything separately.
Local Machine Usage
1 - Clone repo
2 - cd annotation-pipeline
3 - Download data into the raw_data folders (links below).
5 - snakemake --configfile config.yaml --cores 10
This pipeline requires the Java maximum heap size in the file to be set to 15 GB (-Xmx15g rather than -Xmx4g) in order to produce the .genome file.
If this setting can't be used, please use the built-in database for Human (hg38) instead.
snpEff is set to -Xmx4g in order to annotate the vcf.
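The memory settings above are ordinary Java -Xmx flags. As a minimal sketch, assuming the flag appears literally in a launch script, the hypothetical helper below (set_java_heap is not part of this repo) rewrites it in place. The file you pass in is whichever launcher holds the flag on your installation.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repo): rewrite the Java max-heap
# flag (-Xmx<number><unit>) inside a launch script.
# Usage: set_java_heap <file> <size>   e.g. set_java_heap ./launcher.sh 15g
set_java_heap() {
    file=$1
    size=$2
    tmp="${file}.tmp"
    # Replace every -Xmx value with the requested size, writing to a temp file.
    sed "s/-Xmx[0-9][0-9]*[kKmMgG]/-Xmx${size}/g" "$file" > "$tmp" || return 1
    # Copy the result back over the original so its permissions are preserved.
    cat "$tmp" > "$file" && rm "$tmp"
}
```

Running it with 4g instead of 15g would restore the lower setting.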
After completion:
6 - Run IGV
In my case this was bash {location of installation}/
7 - OPTION A - build .genome file
Genomes > Create .genome
The FASTA file is the reference genome
The gene file is the .gff file downloaded earlier
7 - OPTION B - use pre-installed Human (hg38)
8 - Load annotated file
File > Load from File
Navigate to folder s13, which should contain something akin to:
9 - Navigate to location chr10:94,760,000-94,860,000
which will centre on the gene CYP2C19.
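Steps 7 (option B), 8 and 9 can also be scripted with an IGV batch file instead of clicking through the menus. This is a rough sketch only: write_igv_batch is a hypothetical helper, the annotated file path is a placeholder you must supply, and the locus is the one from step 9 with the commas removed.

```shell
#!/bin/sh
# Hypothetical helper: write an IGV batch script that selects hg38, loads one
# file and jumps to the CYP2C19 region from step 9.
# Usage: write_igv_batch <absolute path to annotated file> <batch file to write>
write_igv_batch() {
    annotated_file=$1
    batch_file=$2
    cat > "$batch_file" <<EOF
new
genome hg38
load $annotated_file
goto chr10:94760000-94860000
EOF
}
```

The resulting file can then be passed to the IGV launcher via its batch option (-b in recent desktop releases; check your version's documentation).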
snakemake --configfile config.yaml --cores 10 --cluster-config ./cluster.yaml \
  --cluster "bsub -q {cluster.queue} -oo {cluster.output} -eo {cluster.error} -M {cluster.memory} -R {cluster.resources} -J {cluster.jobname}" \
  -j 10 --use-conda
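The {cluster.*} placeholders in the command above are filled from cluster.yaml. The sketch below writes an example file with the six keys that command expects. Every value is a placeholder assumption (queue name, memory units and resource string all depend on your LSF site), and write_cluster_config is a hypothetical helper, not something shipped in this repo.

```shell
#!/bin/sh
# Hypothetical helper: write an example cluster config to the given path.
# All values are assumptions; adjust them for your own LSF cluster.
write_cluster_config() {
    cat > "$1" <<'EOF'
__default__:
  queue: normal
  memory: 4000
  # Inner quotes keep the resource string as one bsub argument and stop
  # the shell treating ">" as a redirect.
  resources: "\"select[mem>4000] rusage[mem=4000]\""
  # %J is expanded by LSF to the job ID; the logs/ folder must already exist.
  output: "logs/%J.out"
  error: "logs/%J.err"
  jobname: annotation
EOF
}
```

For example, write_cluster_config ./cluster.yaml would create the file in the current folder. Per-rule overrides can be added as further top-level keys named after the rule.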
environment.yml - a list of packages used in this project
Files should be downloaded into {project dir}/raw_data/{ reference | sample_data }/{Downloaded file}
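The layout above can be created and checked before running snakemake. This is a small sketch with two hypothetical helpers (make_raw_data_dirs and check_raw_data are illustrative names); it only tests that each folder contains something, not that the right files were downloaded.

```shell
#!/bin/sh
# Hypothetical helper: create raw_data/reference and raw_data/sample_data
# under the given project directory.
make_raw_data_dirs() {
    project_dir=$1
    mkdir -p "$project_dir/raw_data/reference" "$project_dir/raw_data/sample_data"
}

# Hypothetical helper: report any of the two folders that is empty or absent.
# Returns non-zero if at least one folder has no files in it.
check_raw_data() {
    project_dir=$1
    status=0
    for sub in reference sample_data; do
        dir="$project_dir/raw_data/$sub"
        if [ -z "$(ls -A "$dir" 2>/dev/null)" ]; then
            echo "missing downloads in $dir"
            status=1
        fi
    done
    return $status
}
```

From the repo root this would be used as make_raw_data_dirs . before downloading, then check_raw_data . before starting the pipeline.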
Sample Data comes from the Utah family platinum read set.
Mapped against GRCh38.p15:
We also used:
This was used in conjunction with the reference genome to build a .genome file to better compare the results of this pipeline when visualising the end product with IGV.