Crg2 hg38 #163

Open
wants to merge 156 commits into
base: master
39d40e4
changed the decoy labels to match new hg38 ones
NourHanafi Jan 14, 2022
fefa3cb
fixed a typo
NourHanafi Jan 14, 2022
3ab919b
doubled number of threads for bwa
NourHanafi Jan 14, 2022
a1c5997
switched vcfanno files/annotations to hg38
NourHanafi Jan 19, 2022
832227b
updated dbnsfp columns
NourHanafi Feb 28, 2022
5233f77
changed vcfanno operations from first to self
NourHanafi Mar 10, 2022
1757595
fixed fastqc bug
Jan 19, 2022
1fc31ca
cleared commented lines
Jan 19, 2022
ca42385
add multiqc and md5sum to exome output
Madelinehazel Jan 21, 2022
3effad3
add --DisableSanityCheck to verifybamID2
Madelinehazel Jan 24, 2022
163b1c8
add parser for exome processing
Madelinehazel Jan 24, 2022
0b8ba3e
search upload and result dirs with improperly formatted names
Madelinehazel Jan 25, 2022
28ea92e
incorporate bcftools statistics into multiqc report
Jan 26, 2022
693c787
add peddy wrapper, still in progress
Jan 26, 2022
93985c5
deleted unwanted lines in bcftools/view wrapper
Jan 27, 2022
1637788
add peddy to multiqc report
Jan 31, 2022
01af3a6
fixed peddy bug
Jan 31, 2022
8f93452
GATK MQ filter changed to <30 for genomes
anjalijain22 Jan 31, 2022
38ee8d2
fix qualimap
Feb 1, 2022
a020f90
fixed qualimap
Feb 2, 2022
5f8a63f
changed memory usage for peddy
Feb 3, 2022
1edba7d
fix md5sum
Madelinehazel Feb 3, 2022
cf7fddf
search ccmbio_ephemeral for inputs
Madelinehazel Feb 4, 2022
466ad02
fix bcftools view bug
pamelaxu213 Feb 7, 2022
f3ec3ee
added a sort functionality so that fastqs will be concatenated in order
NourHanafi Feb 7, 2022
e4287bb
made fastq.gz wildcard more flexible to accept alternatively named fa…
NourHanafi Feb 8, 2022
2256003
add multiqc config file and fix scripts
pamelaxu213 Feb 11, 2022
4cdd304
switch to original exon bed for svreport
pamelaxu213 Feb 11, 2022
3dbaf27
made the fastq globbing more flexible so that it also recognizes fast…
NourHanafi Feb 16, 2022
6513b4e
made fq1=/=fq2 error message easier to understand
NourHanafi Feb 16, 2022
17d484c
fixed an issue with the fastq globbing
NourHanafi Feb 16, 2022
fa8f111
if family analysis directory exists, do not overwrite
Madelinehazel Feb 24, 2022
5839b6f
uncomment submit
Madelinehazel Feb 24, 2022
8e3230d
useful scripts for uploading/moving data and making bam slices
Madelinehazel Feb 25, 2022
d80bb00
add sorting step to script
pamelaxu213 Mar 8, 2022
5bdd1bf
added annot as a module, so that pipeline can run starting from input…
NourHanafi Mar 8, 2022
02a1fa4
added annot as a pipeline option, so that pipeline can run starting f…
NourHanafi Mar 8, 2022
0f6bf05
bug fix for str reports created in 2022
Madelinehazel Mar 9, 2022
7eba8b5
small fix in multiqc
pamelaxu213 Mar 11, 2022
98cf3b0
flag to delete temp files
pamelaxu213 Mar 16, 2022
2713df2
fix_ped
pamelaxu213 Mar 21, 2022
61f2a82
document R package requirements for str reports
Madelinehazel Mar 22, 2022
dfc8570
changed the paths of some annotation databases to the hg38 versions
NourHanafi Mar 22, 2022
11ce057
changed vest3 to vest4 (most updated version)
NourHanafi Mar 30, 2022
2011449
upgraded gatk to v.4.2
NourHanafi May 25, 2022
f659d81
added dragen-gatk as an option to run vs. gatk or gatk3
NourHanafi May 25, 2022
75f0406
commented out dragstr for testing with dragen-mode only
NourHanafi May 28, 2022
6c43afb
added in hg38 C4R genome counts
NourHanafi May 28, 2022
961daa3
added dragen calling to exome pipeline
NourHanafi May 28, 2022
39ac9f5
added hg38 snpeff data directory path
NourHanafi Jun 10, 2022
7f974dd
set gene names in Biomart file to string explicitly
NourHanafi Jun 17, 2022
d5b6bc9
added hg38 svscore files
NourHanafi Jun 17, 2022
854ac90
added print statements for troubleshooting + specified hg38 for AnnotSV
NourHanafi Jul 6, 2022
732644b
SV report fixes
Madelinehazel Jul 20, 2022
90dd99c
added hg38 SV annotation files and changed MSSNG file names
NourHanafi Aug 3, 2022
b58f941
changed hg38 reference from Illumina to GIAB one
NourHanafi Aug 4, 2022
c9abe4d
removed dragen calling from the exome workflow
NourHanafi Aug 4, 2022
ecfea03
added c4r exome counts to vcfanno
NourHanafi Aug 4, 2022
647877a
Merge pull request #153 from ccmbioinfo/mcouse/crg2-hg38-SV
NourHanafi Aug 4, 2022
4c538f9
Merge branch 'crg2-hg38' into master_copy
NourHanafi Aug 4, 2022
953ed41
Merge pull request #156 from ccmbioinfo/master_copy
NourHanafi Aug 4, 2022
4200c96
changed qualimap exon bed file to hg38 one with placeholders
NourHanafi Aug 5, 2022
c1cdee1
corrected MSSNG file paths
NourHanafi Aug 7, 2022
5314a51
made corrections to dbnsfp annotation specifications
NourHanafi Aug 7, 2022
f575d48
removed config.yaml from main directory
NourHanafi Aug 9, 2022
cf718db
created reference specific directories to put the different configs in
NourHanafi Aug 9, 2022
c22c3b0
changed cre database directory path
NourHanafi Aug 10, 2022
a8df958
changed cre database directory path
NourHanafi Aug 10, 2022
7619de8
created and linked new vcfanno configs for GRCh37 and hg38 separately
NourHanafi Aug 17, 2022
86cf62f
remove mistakenly named crg vcfanno config
NourHanafi Aug 17, 2022
47aae60
changed MSSNG file names back so that the SV workflow is compatible w…
NourHanafi Aug 17, 2022
1f4b115
modified decoy_rm for GRCh37 compatibility
NourHanafi Aug 17, 2022
5d50e72
enable toggling between GRCh37 and GRCh38 for annotSV
NourHanafi Aug 24, 2022
3c5d678
restored chr names to uppercase to enable grouping
NourHanafi Aug 24, 2022
dbb87d1
changed the decoy labels to match new hg38 ones
NourHanafi Jan 14, 2022
06361d8
fixed a typo
NourHanafi Jan 14, 2022
53e8ac8
doubled number of threads for bwa
NourHanafi Jan 14, 2022
0305259
switched vcfanno files/annotations to hg38
NourHanafi Jan 19, 2022
ddcb493
updated dbnsfp columns
NourHanafi Feb 28, 2022
9ae6ff6
changed vcfanno operations from first to self
NourHanafi Mar 10, 2022
30e074c
add parser for exome processing
Madelinehazel Jan 24, 2022
3a6c71b
search upload and result dirs with improperly formatted names
Madelinehazel Jan 25, 2022
2a5e057
add peddy wrapper, still in progress
Jan 26, 2022
25e14ab
fixed peddy bug
Jan 31, 2022
4d76d4c
changed memory usage for peddy
Feb 3, 2022
3b3b46f
search ccmbio_ephemeral for inputs
Madelinehazel Feb 4, 2022
21f193c
added a sort functionality so that fastqs will be concatenated in order
NourHanafi Feb 7, 2022
5d784c7
made fastq.gz wildcard more flexible to accept alternatively named fa…
NourHanafi Feb 8, 2022
7fba876
add multiqc config file and fix scripts
pamelaxu213 Feb 11, 2022
234f02b
switch to original exon bed for svreport
pamelaxu213 Feb 11, 2022
f16207a
made the fastq globbing more flexible so that it also recognizes fast…
NourHanafi Feb 16, 2022
daf3c0f
made fq1=/=fq2 error message easier to understand
NourHanafi Feb 16, 2022
a02a549
fixed an issue with the fastq globbing
NourHanafi Feb 16, 2022
272cbec
if family analysis directory exists, do not overwrite
Madelinehazel Feb 24, 2022
4c7671b
uncomment submit
Madelinehazel Feb 24, 2022
5492e6b
useful scripts for uploading/moving data and making bam slices
Madelinehazel Feb 25, 2022
3cb9c17
changed the paths of some annotation databases to the hg38 versions
NourHanafi Mar 22, 2022
93a2a44
changed vest3 to vest4 (most updated version)
NourHanafi Mar 30, 2022
6906304
upgraded gatk to v.4.2
NourHanafi May 25, 2022
52a874e
added dragen-gatk as an option to run vs. gatk or gatk3
NourHanafi May 25, 2022
91b6327
commented out dragstr for testing with dragen-mode only
NourHanafi May 28, 2022
d3f72ef
added in hg38 C4R genome counts
NourHanafi May 28, 2022
de2cddd
added dragen calling to exome pipeline
NourHanafi May 28, 2022
1576661
added hg38 snpeff data directory path
NourHanafi Jun 10, 2022
b8b552b
set gene names in Biomart file to string explicitly
NourHanafi Jun 17, 2022
bc927f6
added hg38 svscore files
NourHanafi Jun 17, 2022
a387151
added print statements for troubleshooting + specified hg38 for AnnotSV
NourHanafi Jul 6, 2022
66eb799
SV report fixes
Madelinehazel Jul 20, 2022
6d18abf
added hg38 SV annotation files and changed MSSNG file names
NourHanafi Aug 3, 2022
d199376
changed hg38 reference from Illumina to GIAB one
NourHanafi Aug 4, 2022
f337f3b
removed dragen calling from the exome workflow
NourHanafi Aug 4, 2022
c6ca5e0
added c4r exome counts to vcfanno
NourHanafi Aug 4, 2022
77dcd64
changed qualimap exon bed file to hg38 one with placeholders
NourHanafi Aug 5, 2022
3a1fdf4
corrected MSSNG file paths
NourHanafi Aug 7, 2022
f29e06a
made corrections to dbnsfp annotation specifications
NourHanafi Aug 7, 2022
3bea102
created reference specific directories to put the different configs in
NourHanafi Aug 9, 2022
03b5c92
changed cre database directory path
NourHanafi Aug 10, 2022
de4460d
changed cre database directory path
NourHanafi Aug 10, 2022
2ff6c0f
created and linked new vcfanno configs for GRCh37 and hg38 separately
NourHanafi Aug 17, 2022
9d60a89
remove mistakenly named crg vcfanno config
NourHanafi Aug 17, 2022
f5c8bf3
changed MSSNG file names back so that the SV workflow is compatible w…
NourHanafi Aug 17, 2022
5de9cfc
modified decoy_rm for GRCh37 compatibility
NourHanafi Aug 17, 2022
35bf5eb
enable toggling between GRCh37 and GRCh38 for annotSV
NourHanafi Aug 24, 2022
8734690
restored chr names to uppercase to enable grouping
NourHanafi Aug 24, 2022
7331e7f
updates for compatibility with slurm
Madelinehazel Jan 4, 2023
c1f6b8c
fix merge conflicts
Madelinehazel Jan 4, 2023
f18c1bd
add mity paths but remove report from Snakefile as there are still er…
Madelinehazel Jan 17, 2023
293d530
rename config
Madelinehazel Jan 17, 2023
19977f5
hg38 hpo panel report minor change
pamelaxu213 Dec 1, 2023
da56a3d
update README
Madelinehazel Feb 13, 2024
28241d3
update benchmark filename
Madelinehazel Feb 13, 2024
c8b74e8
update filepath
Madelinehazel Feb 26, 2024
ac68e55
sort platypus VCF
Madelinehazel Feb 27, 2024
ff7240a
update verifybamid2 version to avoid this error:
Madelinehazel Mar 7, 2024
4318d31
Added LinSight Score to crg.vcfanno_hg38
Mar 10, 2024
692eb52
Added a commented out target to dnaseq_slurm_hpf.sh
Mar 10, 2024
0bf9677
added some comments with dry run
Mar 17, 2024
de035bb
Revert dnaseq_slurm_hpf.sh changes
Mar 25, 2024
ea27f27
Adding FATHMM-XF annotations to vcfanno config
Mar 26, 2024
fa90e59
Merge pull request #211 from ccmbioinfo/rvaran/FATHMM-XF
Madelinehazel Apr 10, 2024
fcfe219
Merge pull request #209 from ccmbioinfo/add_linsight_score
Madelinehazel Apr 10, 2024
2b076aa
added ncER score annotation to vcfanno
Apr 18, 2024
fec9c8c
Update crg.vcfanno_hg38.conf
Madelinehazel Jun 25, 2024
d8c4756
Merge pull request #213 from ccmbioinfo/iabbasi/ncer
Madelinehazel Jun 25, 2024
49f1904
Update LINSIGHT in crg.vcfanno_hg38.conf
Madelinehazel Jun 25, 2024
ad0549b
Update crg.vcfanno_hg38.conf: ReMM
Madelinehazel Jun 25, 2024
b15bf9f
changed to CADD1.7
pamelaxu213 Jul 5, 2024
691bc89
Add Alphamissense annotation
anjalijain22 Jun 25, 2024
e0d104c
Update alphamissense filename
anjalijain22 Jul 5, 2024
ba6a78e
Update Alphamissense filename for cre vcfanno config file
anjalijain22 Jul 5, 2024
4d62a47
add alpha missense
pamelaxu213 Jul 8, 2024
635782f
get cram from iRods archive
Madelinehazel Jul 17, 2024
92b6f2c
add GreenDB annotation
pamelaxu213 Jul 18, 2024
68226f4
irods for BAM
anjalijain22 Jul 19, 2024
1305b2c
Merge pull request #227 from ccmbioinfo/mcouse/irods-hg38
Madelinehazel Jul 19, 2024
c899de4
Merge pull request #228 from ccmbioinfo/greenDB
pamelaxu213 Jul 19, 2024
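
Commits cf718db and 7619de8 split the configs by reference, and later commits refer to `crg.vcfanno_hg38.conf` by name. As a rough illustration only, the naming pattern could be captured in a small helper like the one below. The function name `vcfanno_conf`, the `SUPPORTED_REFERENCES` tuple, and the existence of a `cre.vcfanno_hg38.conf` file are assumptions made for this sketch; they are not taken from the repository's Snakefile.

```python
from pathlib import PurePosixPath

# Reference labels as they appear in the commit messages and config file names.
SUPPORTED_REFERENCES = ("GRCh37", "hg38")


def vcfanno_conf(prefix: str, reference: str, conf_dir: str = "~/crg2/vcfanno") -> str:
    """Build the path of a reference-specific vcfanno config (hypothetical helper).

    prefix is the config family, e.g. "cre" or "crg".
    reference must be one of SUPPORTED_REFERENCES.
    """
    if reference not in SUPPORTED_REFERENCES:
        raise ValueError(
            f"unknown reference {reference!r}; expected one of {SUPPORTED_REFERENCES}"
        )
    # File names follow the pattern <prefix>.vcfanno_<reference>.conf.
    return str(PurePosixPath(conf_dir) / f"{prefix}.vcfanno_{reference}.conf")


if __name__ == "__main__":
    print(vcfanno_conf("crg", "hg38"))
    print(vcfanno_conf("cre", "GRCh37"))
```

Rejecting unknown labels up front means a misspelled reference fails with a clear message rather than as a missing-file error later in the run.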
22 changes: 8 additions & 14 deletions config_hpf.yaml → GRCh37/config_hpf.yaml
@@ -7,7 +7,7 @@ run:
hpo: "" # five-column TSV with HPO terms; leave this string empty if there are no hpo terms
flank: 100000
gatk: "gatk"
pipeline: "wes" #either wes (exomes) or wgs (genomes) or annot (to annotate and produce reports for an input vcf) or mity (to generate mitochondrial reports)
pipeline: "wes" #either wes (exomes) or wgs (genomes) or annot (to annotate and produce reports for an input vcf)
minio: ""
PT_credentials: ""

@@ -82,11 +82,11 @@ qc:

annotation:
cre.vcfanno:
conf: "~/crg2/vcfanno/cre.vcfanno.conf"
conf: "~/crg2/vcfanno/cre.vcfanno_GRCh37.conf"
lua_script: "~/crg2/vcfanno/cre.vcfanno.lua"
base_path: "/hpf/largeprojects/ccmbio/naumenko/tools/bcbio/genomes/Hsapiens/GRCh37/variation/"
base_path: "/hpf/largeprojects/ccmbio/naumenko/tools/bcbio/genomes/Hsapiens/GRCh37"
vcfanno:
conf: "~/crg2/vcfanno/crg.vcfanno.conf"
conf: "~/crg2/vcfanno/crg.vcfanno_GRCh37.conf"
lua_script: "~/crg2/vcfanno/crg.vcfanno.lua"
base_path: "/hpf/largeprojects/ccmbio/naumenko/tools/bcbio/genomes/Hsapiens/GRCh37/variation/"
mt.vcfanno:
@@ -102,6 +102,7 @@ annotation:
intron_bed: "/hpf/largeprojects/ccm_dccforge/dccdipg/Common/annotation/SVScore/refGene.introns.bed.gz"
cadd: "/hpf/largeprojects/ccm_dccforge/dccdipg/Common/annotation/SVScore/whole_genome_SNVs.tsv.gz"
svreport:
reference: GRCh37
hgmd: "/hpf/largeprojects/ccm_dccforge/dccdipg/Common/annotation/HGMD_2018/hgmd_pro.db"
protein_coding_genes: "/hpf/largeprojects/ccm_dccforge/dccdipg/Common/annotation/grch37.p13.ensembl.sorted.protein.coding.genes.bed"
exon_bed: "/hpf/largeprojects/ccm_dccforge/dccdipg/Common/annotation/exons/hg19_UCSC_exons_canonical.bed"
@@ -121,7 +122,7 @@ annotation:
database_path: "/hpf/largeprojects/ccmbio/naumenko/tools/bcbio/genomes/Hsapiens/GRCh37/variation/"

validation:
benchmark: "benchmark_hpf.tsv"
benchmark: "benchmark.tsv"

params:
bwa:
@@ -135,9 +136,6 @@ params:
#BaseRecalibrator: "--interval-set-rule INTERSECTION -U LENIENT-VCF-PROCESSING --read-filter BadCigar --read-filter NotzPrimaryAlignment"
GenotypeGVCFs: ""
VariantRecalibrator: ""
Mutect2:
gnomad_germline: "/hpf/largeprojects/ccm_dccforge/dccdipg/Common/annotation/Mutect2/af-only-gnomad.raw.sites.vcf"
FilterMutectCalls: ""
gatk3:
java_opts: "-Xms500m -Xmx9555m"
HaplotypeCaller: "-drf DuplicateRead --interval_set_rule INTERSECTION --pair_hmm_implementation VECTOR_LOGLESS_CACHING -ploidy 2 -U LENIENT_VCF_PROCESSING --read_filter BadCigar --read_filter NotPrimaryAlignment"
@@ -173,14 +171,10 @@ params:
bcftools:
mpileup: "-a DP -a AD "
call: " -m -v "
# -F x sets the output filter to PASS if any of the variant filters is PASS in sample VCFs to be merged
merge: "-F x"
freebayes:
call: " --genotype-qualities --strict-vcf --ploidy 2 --no-partial-observations --min-repeat-entropy 1 "
freebayes: " --genotype-qualities --strict-vcf --ploidy 2 --no-partial-observations --min-repeat-entropy 1 "
platypus: "--filterDuplicates=0"
rtg-tools:
java_opts: "-Xmx20g"
vcfeval:
sdf: "/hpf/largeprojects/ccm_dccforge/dccdipg/Common/rtg-tools/GRch37_SDF/"
vcfsubset:
java_opts: "-Xmx2048m"
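
The last hunk shows the `freebayes` options in two shapes: a nested mapping with a `call` key, and a flat string. A minimal sketch of a lookup that tolerates either shape follows, assuming the `params` section has already been loaded into a plain Python dict. The helper name `freebayes_call_opts` is hypothetical and does not come from the pipeline's rules.

```python
from collections.abc import Mapping
from typing import Any


def freebayes_call_opts(params: Mapping[str, Any]) -> str:
    """Return the freebayes option string from either config layout.

    Accepts both
        freebayes: {"call": "<options>"}   (nested layout)
        freebayes: "<options>"             (flat layout)
    and returns the options with surrounding whitespace removed.
    """
    value = params.get("freebayes", "")
    if isinstance(value, Mapping):
        # Nested layout: the options live under the "call" key.
        value = value.get("call", "")
    if not isinstance(value, str):
        raise TypeError(
            f"freebayes options must be a string, got {type(value).__name__}"
        )
    return value.strip()


if __name__ == "__main__":
    opts = " --genotype-qualities --strict-vcf --ploidy 2 "
    nested = {"freebayes": {"call": opts}}
    flat = {"freebayes": opts}
    # Both layouts yield the same option string.
    print(freebayes_call_opts(nested) == freebayes_call_opts(flat))
```

A lookup like this would let a rule read the same value from configs written in either layout, which may matter while GRCh37 and hg38 configs are maintained side by side.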
