Merge pull request #50 from eastgenomics/DI-623_3.0.0
DI-623 3.0.0 (#50)
jethror1 authored Dec 18, 2023
2 parents 3afc0a1 + 7bc23d7 commit 2959e1a
Showing 2 changed files with 224 additions and 14 deletions.
46 changes: 32 additions & 14 deletions README.md
@@ -1,21 +1,35 @@
# egg5_dias_CEN_config
> [!CAUTION]
> This repo contains a JSON file (compatible with the new eggd_dias_batch, planned to go live in January 2024) and a Python script (compatible with the old dias_batch, currently in use).
>
> This README describes the JSON file, not the Python script.
>
> When the new eggd_dias_batch goes live, the Python script will be deleted.
This repo contains a Python config file which is used with dias_batch_running to specify inputs for running the Dias pipeline for CEN data.
# dias_CEN_config_GRCh37_v3.0.0.json

This repo contains a JSON config file which is used with eggd_dias_batch to specify inputs for running the Dias pipeline for CEN data.

## What does the config do?
dias_batch_running ([https://github.com/eastgenomics/dias_batch_running](https://github.com/eastgenomics/dias_batch_running)) is a Python module that runs the Dias pipeline for germline sequence data analysis on DNAnexus. The egg5_dias_CEN_config specifies the executables and their input files to be used in the Dias pipeline for analysing CEN data.
eggd_dias_batch ([https://github.com/eastgenomics/dias_batch_running](https://github.com/eastgenomics/dias_batch_running)) is a DNAnexus app that runs the Dias pipeline for germline sequence data analysis. The egg5_dias_CEN_config repo contains the dias_CEN_config file that specifies the executables and their input files to be used in the Dias pipeline for analysing CEN data on build GRCh37.

New versions of apps and app inputs for use in the Dias pipeline can be updated in the config without needing to update the pipeline itself.

## Parts of the config
* GATKgCNV_call
* specifies the app ID and inputs for CNV calling.
* dias_reports
* specifies the workflow ID, stage IDs (matching those in the workflow), and dynamic files for dias_reports.
* dias_cnvreports
* specifies the workflow ID, stage IDs (matching those in the workflow), and dynamic files for dias_cnvreports.

## Versions of workflows and dynamic files in the config
The config specifies app IDs and workflow IDs at the top, followed by a `reference_files` dict for inputs common to multiple running modes.
The `modes` section specifies inputs specific to a running mode:
* cnv_call
* specifies inputs for CNV calling.
* snv_reports
* specifies inputs for dias_reports.
* cnv_reports
* specifies inputs for dias_cnvreports.
* mosaic_reports
* specifies inputs for dias_reports for mosaic reports.
* artemis
* specifies inputs for [artemis](https://github.com/eastgenomics/eggd_artemis)
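
The `INPUT-<name>` strings used in the `modes` inputs appear to refer back to keys in `reference_files`. The following is a minimal sketch of how such placeholders could be resolved, assuming a plain key lookup; the function name `resolve_placeholders` is hypothetical and the real eggd_dias_batch logic may differ (for example, it may convert the value into a `$dnanexus_link` mapping).

```python
def resolve_placeholders(inputs, reference_files):
    """Return a copy of `inputs` in which every "INPUT-<key>" string is
    replaced by reference_files[<key>]. Other values pass through unchanged.

    Hypothetical helper: illustrates the placeholder convention only.
    """
    prefix = "INPUT-"
    resolved = {}
    for field, value in inputs.items():
        if isinstance(value, str) and value.startswith(prefix):
            key = value[len(prefix):]
            if key not in reference_files:
                raise KeyError(f"{field}: no reference file named {key!r}")
            resolved[field] = reference_files[key]
        else:
            resolved[field] = value
    return resolved


# Trimmed excerpt of the config, values copied from the JSON file
reference_files = {
    "genepanels": "project-Fkb6Gkj433GVVvj73J7x8KbV:file-GVx0vkQ433Gvq63k1Kj4Y562",
    "exonsfile": "project-Fkb6Gkj433GVVvj73J7x8KbV:file-GF611Z8433Gf99pBPbJkV7bq",
}
inputs = {
    "stage-rpt_generate_bed_vep.gene_panels": "INPUT-genepanels",
    "stage-rpt_generate_bed_vep.flank": 495,
    "stage-rpt_athena.exons_file": "INPUT-exonsfile",
}

resolved = resolve_placeholders(inputs, reference_files)
for field, value in resolved.items():
    print(field, "->", value)
```

Raising on an unknown key, rather than passing the placeholder through, is a design choice in this sketch: a mistyped `INPUT-` name would otherwise reach the workflow as a literal string.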


## Versions of workflows, apps, and dynamic files in the config
Workflows:
* Dias reports: **dias_reports_v2.1.0**
* DNAnexus workflow ID: `workflow-GXzkfYj4QPQp9z4Jz4BF09y6`
@@ -26,17 +40,21 @@ Apps:
* CNV calling app: **eggd_GATKgCNV_call**
* v1.0.2
* DNAnexus app ID: `app-GZ4pXxj4xG062Bj5zjgP1Bb0`
* Artemis app: **eggd_artemis**
* v1.3.0
* DNAnexus app ID: `app-GZ1X5zj4K5ZxyfYPPq4YgGv3`

Dynamic files:
| File | File name | DNAnexus file ID |
| --------- | --------- | ---------------- |
| genepanels | **230602_genepanels.tsv** | `file-GVx0vkQ433Gvq63k1Kj4Y562` |
| genes2transcripts | **230421_g2t.tsv** | `file-GV4P970433Gj6812zGVBZvB4` |
| exons_nirvana | **GCF_000001405.25_GRCh37.p13_genomic.exon_5bp_v2.0.0.tsv** | `file-GF611Z8433Gk7gZ47gypK7ZZ` |
| genes2transcripts | **230421_g2t.tsv** | `file-GV4P970433Gj6812zGVBZvB4` |
| exons_file for eggd_athena | **GCF_000001405.25_GRCh37.p13_genomic.symbols.exon_5bp_v2.0.0.tsv** | `file-GF611Z8433Gf99pBPbJkV7bq` |
| cen_vep_config for SNV reports | **cen_vep_config_v1.1.9.json** | `file-GbPYpkQ4z6jBQxkqYBF32821` |
| cen_vep_config for SNV/mosaic reports | **cen_vep_config_v1.1.9.json** | `file-GbPYpkQ4z6jBQxkqYBF32821` |
| cen_vep_config for CNV reports | **cen-cnv_config_v1.1.0.json** | `file-GQGJ3Z84xyx0jp1q65K1Q1jY` |
| additional_regions for CNVs | **CEN_CNV_additional_regions_b37_v1.0.1.tsv** | `file-GJZQvg0433GkyFZg13K6VV6p` |
| gatk_docker | **GATK_v4.2.5.0.tar.gz** | `file-GBBP9JQ433GxV97xBpQkzYZx` |
| interval_list for CNV calling | **CEN_CNV_targets_v1.1.0_sorted.interval_list** | `file-GFPxzKj4V50pJX3F4vV58yyg` |
| annotation of interval_list for CNV calling | **CEN_CNV_targets_v1.1.0_sorted_annotation.tsv**| `file-GFPxzPQ4V50z4pv230p82G0q` |
| annotation of interval_list for CNV calling | **CEN_CNV_targets_v1.1.0_sorted_annotation.tsv**| `file-GFPxzPQ4V50z4pv230p82G0q` |
| capture_bed for artemis | **CEN_CNV_targets_b37_v1.1.0.bed** | `file-GFPxpJj4GVV0Pfzv4VGYf1pq` |
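
One way to keep the table above in step with the JSON config is to collect every file ID from the config and compare the set against the IDs listed in the README. The sketch below is illustrative only: `collect_file_ids` is a hypothetical helper, and the pattern assumes file IDs are `file-` followed by 24 alphanumeric characters, as in the IDs shown here.

```python
import re

# Assumed ID shape: "file-" plus 24 alphanumeric characters
FILE_ID = re.compile(r"file-[A-Za-z0-9]{24}")


def collect_file_ids(node):
    """Recursively gather file IDs from strings anywhere in a parsed config.

    Covers both "project-...:file-..." strings and "$dnanexus_link" mappings,
    because both end up as plain string values in the parsed JSON.
    """
    found = set()
    if isinstance(node, dict):
        for value in node.values():
            found |= collect_file_ids(value)
    elif isinstance(node, list):
        for item in node:
            found |= collect_file_ids(item)
    elif isinstance(node, str):
        found.update(FILE_ID.findall(node))
    return found


# Trimmed excerpt of the config, values copied from the JSON file
config_excerpt = {
    "reference_files": {
        "genepanels": "project-Fkb6Gkj433GVVvj73J7x8KbV:file-GVx0vkQ433Gvq63k1Kj4Y562",
    },
    "modes": {
        "artemis": {
            "inputs": {
                "capture_bed": {
                    "$dnanexus_link": {
                        "project": "project-Fkb6Gkj433GVVvj73J7x8KbV",
                        "id": "file-GFPxpJj4GVV0Pfzv4VGYf1pq",
                    }
                }
            }
        }
    },
}

# IDs as they appear in the README table for the same two files
readme_ids = {"file-GVx0vkQ433Gvq63k1Kj4Y562", "file-GFPxpJj4GVV0Pfzv4VGYf1pq"}

config_ids = collect_file_ids(config_excerpt)
print("in config but not in README:", sorted(config_ids - readme_ids))
print("in README but not in config:", sorted(readme_ids - config_ids))
```

To run this against the full config, `config_excerpt` could be replaced with the result of `json.load` on the config file.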
192 changes: 192 additions & 0 deletions dias_CEN_config_GRCh37_v3.0.0.json
@@ -0,0 +1,192 @@
{
"assay": "CEN",
"version": "3.0.0",
"cnv_call_app_id": "app-GZ4pXxj4xG062Bj5zjgP1Bb0",
"artemis_app_id": "app-GZ1X5zj4K5ZxyfYPPq4YgGv3",
"snv_report_workflow_id": "workflow-GXzkfYj4QPQp9z4Jz4BF09y6",
"cnv_report_workflow_id": "workflow-GXzvJq84XZB1fJk9fBfG88XJ",
"reference_files": {
"genepanels": "project-Fkb6Gkj433GVVvj73J7x8KbV:file-GVx0vkQ433Gvq63k1Kj4Y562",
"exons_nirvana": "project-Fkb6Gkj433GVVvj73J7x8KbV:file-GF611Z8433Gk7gZ47gypK7ZZ",
"genes2transcripts": "project-Fkb6Gkj433GVVvj73J7x8KbV:file-GV4P970433Gj6812zGVBZvB4",
"exonsfile": "project-Fkb6Gkj433GVVvj73J7x8KbV:file-GF611Z8433Gf99pBPbJkV7bq"
},
"name_patterns": {
"Epic": "^[\\d\\w]+-[\\d\\w]+",
"Gemini": "^X[\\d]+"
},
"modes": {
"cnv_call": {
"instance_type": "mem2_ssd1_v2_x16",
"inputs": {
"bambais": {
"folder": "/sentieon-dnaseq/",
"name": ".bam$|.bam.bai$"
},
"GATK_docker": {
"$dnanexus_link": {
"project": "project-Fkb6Gkj433GVVvj73J7x8KbV",
"id": "file-GBBP9JQ433GxV97xBpQkzYZx"
}
},
"annotation_tsv": {
"$dnanexus_link": {
"project": "project-Fkb6Gkj433GVVvj73J7x8KbV",
"id": "file-GFPxzPQ4V50z4pv230p82G0q"
}
},
"interval_list": {
"$dnanexus_link": {
"project": "project-Fkb6Gkj433GVVvj73J7x8KbV",
"id": "file-GFPxzKj4V50pJX3F4vV58yyg"
}
}
}
},
"cnv_reports": {
"stage_instance_types": {
},
"inputs": {
"stage-cnv_generate_bed_vep.exons_nirvana": "INPUT-exons_nirvana",
"stage-cnv_generate_bed_vep.nirvana_genes2transcripts": "INPUT-genes2transcripts",
"stage-cnv_generate_bed_vep.gene_panels": "INPUT-genepanels",
"stage-cnv_generate_bed_vep.flank": 495,
"stage-cnv_generate_bed_vep.additional_regions": {
"$dnanexus_link": {
"project": "project-Fkb6Gkj433GVVvj73J7x8KbV",
"id": "file-GJZQvg0433GkyFZg13K6VV6p"
}
},
"stage-cnv_vep.config_file": {
"$dnanexus_link": {
"project": "project-Fkb6Gkj433GVVvj73J7x8KbV",
"id": "file-GQGJ3Z84xyx0jp1q65K1Q1jY"
}
},
"stage-cnv_vep.vcf": {
"folder": "CNV_vcfs",
"name": "_segments.vcf$"
},
"stage-cnv_generate_bed_excluded.exons_nirvana": "INPUT-exons_nirvana",
"stage-cnv_generate_bed_excluded.nirvana_genes2transcripts": "INPUT-genes2transcripts",
"stage-cnv_generate_bed_excluded.gene_panels": "INPUT-genepanels",
"stage-cnv_generate_bed_excluded.additional_regions": {
"$dnanexus_link": {
"project": "project-Fkb6Gkj433GVVvj73J7x8KbV",
"id": "file-GJZQvg0433GkyFZg13K6VV6p"
}
},
"stage-cnv_generate_bed_excluded.flank": 0,
"stage-cnv_annotate_excluded_regions.cds_hgnc": "INPUT-exons_nirvana",
"stage-cnv_annotate_excluded_regions.cds_gene": "INPUT-exonsfile",
"stage-cnv_annotate_excluded_regions.additional_regions": {
"$dnanexus_link": {
"project": "project-Fkb6Gkj433GVVvj73J7x8KbV",
"id": "file-GJZQvg0433GkyFZg13K6VV6p"
}
},
"stage-cnv_annotate_excluded_regions.excluded_regions": {
"folder": "CNV_summary",
"name": "_excluded_intervals.bed$"
},
"stage-cnv_generate_workbook.additional_sheet_names": "ExcludedRegions",
"stage-cnv_generate_workbook.exclude_columns": "REF FILTER CSQ_Allele CSQ_Consequence CSQ_IMPACT",
"stage-cnv_generate_workbook.acmg": true,
"stage-cnv_generate_workbook.reorder_columns": "CHROM POS END CNVLEN ID ALT QUAL CSQ_SYMBOL CSQ_Feature CSQ_VARIANT_CLASS CSQ_EXON CSQ_INTRON CSQ_STRAND GT CN NP QA QS QSE QSS",
"stage-cnv_generate_workbook.add_comment_column": true,
"stage-cnv_generate_workbook.summary": "dias"
}
},
"snv_reports": {
"stage_instance_types": {
},
"inputs": {
"stage-rpt_generate_bed_athena.exons_nirvana": "INPUT-exons_nirvana",
"stage-rpt_generate_bed_athena.nirvana_genes2transcripts": "INPUT-genes2transcripts",
"stage-rpt_generate_bed_athena.gene_panels": "INPUT-genepanels",
"stage-rpt_generate_bed_vep.exons_nirvana": "INPUT-exons_nirvana",
"stage-rpt_generate_bed_vep.nirvana_genes2transcripts": "INPUT-genes2transcripts",
"stage-rpt_generate_bed_vep.gene_panels": "INPUT-genepanels",
"stage-rpt_generate_bed_vep.flank": 495,
"stage-rpt_vep.config_file": {
"$dnanexus_link": {
"project": "project-Fkb6Gkj433GVVvj73J7x8KbV",
"id": "file-GbPYpkQ4z6jBQxkqYBF32821"
}
},
"stage-rpt_vep.vcf": {
"folder": "sentieon-dnaseq",
"name": "^[^\\.]*(?!\\.g)\\.vcf(\\.gz)?$"
},
"stage-rpt_generate_workbook.exclude_columns": "BaseQRankSum ClippingRankSum DB ExcessHet FS MLEAC MLEAF MQ MQRankSum QD ReadPosRankSum SOR PL QUAL ID FILTER CSQ_ClinVar_CLNSIGCONF CSQ_Allele CSQ_HGNC_ID DP AC AF AN CSQ_SpliceAI_pred_DP_AL CSQ_SpliceAI_pred_DP_AG CSQ_SpliceAI_pred_DP_DG CSQ_SpliceAI_pred_DP_DL CSQ_gnomADe_AC_popmax CSQ_gnomADe_AF_popmax CSQ_gnomADe_AN_popmax CSQ_gnomADe_nhomalt_popmax CSQ_gnomADe_non_cancer_AC CSQ_gnomADe_non_cancer_AC_popmax CSQ_gnomADe_non_cancer_AF CSQ_gnomADe_non_cancer_AF_popmax CSQ_gnomADe_non_cancer_AN CSQ_gnomADe_non_cancer_AN_popmax CSQ_gnomADe_non_cancer_nhomalt CSQ_gnomADe_non_cancer_nhomalt_popmax CSQ_gnomADe_non_cancer_popmax CSQ_gnomADe_popmax CSQ_gnomADg_AC_popmax CSQ_gnomADg_AF_popmax CSQ_gnomADg_AN_popmax CSQ_gnomADg_nhomalt_popmax CSQ_gnomADg_popmax",
"stage-rpt_generate_workbook.acmg": true,
"stage-rpt_generate_workbook.rename_columns": "CSQ_Feature=Transcript DP_FMT=DP",
"stage-rpt_generate_workbook.add_comment_column": true,
"stage-rpt_generate_workbook.keep_tmp": true,
"stage-rpt_generate_workbook.summary": "dias",
"stage-rpt_generate_workbook.filter": "bcftools filter -e '(CSQ_Consequence==\"synonymous_variant\" | CSQ_Consequence==\"intron_variant\" | CSQ_Consequence==\"upstream_gene_variant\" | CSQ_Consequence==\"downstream_gene_variant\" | CSQ_Consequence==\"intergenic_variant\" | CSQ_Consequence==\"5_prime_UTR_variant\" | CSQ_Consequence==\"3_prime_UTR_variant\" | CSQ_gnomADe_AF>0.01 | CSQ_gnomADg_AF>0.01 | CSQ_TWE_AF>0.05) & CSQ_HGMD_CLASS!~ \"DM\" & CSQ_ClinVar_CLNSIG!~ \"pathogenic\\/i\" & CSQ_ClinVar_CLNSIGCONF!~ \"pathogenic\\/i\"'",
"stage-rpt_generate_workbook.human_filter": "excluded gnomAD exomes / genomes > 1%, TWE > 5%, synonymous / intronic / intergenic / upstream / downstream / UTRs EXCEPT pathogenic status in ClinVar OR DM in HGMD Class",
"stage-rpt_generate_workbook.reorder_columns": "CHROM POS REF ALT GT GQ DP_FMT AD CSQ_SYMBOL CSQ_EXON CSQ_INTRON CSQ_HGVSc CSQ_HGVSp CSQ_Consequence CSQ_IMPACT CSQ_VARIANT_CLASS CSQ_gnomADe_AF CSQ_gnomADe_Hom CSQ_gnomADe_AC CSQ_gnomADe_AN CSQ_gnomADe_nhomalt CSQ_gnomADg_AF CSQ_gnomADg_AC CSQ_gnomADg_AN CSQ_gnomADg_nhomalt CSQ_TWE_AF CSQ_TWE_AC_Hom CSQ_TWE_AC_Het CSQ_TWE_AN CSQ_HGMD CSQ_HGMD_CLASS CSQ_HGMD_RANKSCORE CSQ_HGMD_PHEN CSQ_Existing_variation CSQ_ClinVar CSQ_ClinVar_CLNDN CSQ_ClinVar_CLNSIG CSQ_Mastermind_MMID3 CSQ_CADD_PHRED CSQ_REVEL CSQ_SpliceAI_pred_DS_AG CSQ_SpliceAI_pred_DS_AL CSQ_SpliceAI_pred_DS_DG CSQ_SpliceAI_pred_DS_DL CSQ_HGVS_OFFSET CSQ_STRAND CSQ_Feature",
"stage-rpt_generate_workbook.freeze_column": "N2",
"stage-rpt_athena.exons_file": "INPUT-exonsfile",
"stage-rpt_athena.limit": 260,
"stage-rpt_athena.summary": true,
"stage-rpt_athena.mosdepth_files": {
"folder": "eggd_mosdepth",
"name": "per-base.bed.gz$|reference_build.txt$"
}
}
},
"mosaic_reports": {
"stage_instance_types": {
},
"inputs": {
"stage-rpt_generate_bed_athena.exons_nirvana": "INPUT-exons_nirvana",
"stage-rpt_generate_bed_athena.nirvana_genes2transcripts": "INPUT-genes2transcripts",
"stage-rpt_generate_bed_athena.gene_panels": "INPUT-genepanels",
"stage-rpt_generate_bed_vep.exons_nirvana": "INPUT-exons_nirvana",
"stage-rpt_generate_bed_vep.nirvana_genes2transcripts": "INPUT-genes2transcripts",
"stage-rpt_generate_bed_vep.gene_panels": "INPUT-genepanels",
"stage-rpt_vep.config_file": {
"$dnanexus_link": {
"project": "project-Fkb6Gkj433GVVvj73J7x8KbV",
"id": "file-GbPYpkQ4z6jBQxkqYBF32821"
}
},
"stage-rpt_vep.vcf": {
"folder": "tnhaplotyper",
"name": "^[^\\.]*(?!\\.g)\\.vcf(\\.gz)?$"
},
"stage-rpt_generate_workbook.exclude_columns": "BaseQRankSum ClippingRankSum DB ExcessHet FS MLEAC MLEAF MQ MQRankSum QD ReadPosRankSum SOR PL QUAL ID FILTER CSQ_ClinVar_CLNSIGCONF CSQ_Allele CSQ_HGNC_ID DP AC AF AN CSQ_SpliceAI_pred_DP_AL CSQ_SpliceAI_pred_DP_AG CSQ_SpliceAI_pred_DP_DG CSQ_SpliceAI_pred_DP_DL AS_FilterStatus AS_SB_TABLE ECNT GERMQ MBQ MFRL MMQ MPOS POPAF ROQ TLOD F1R2 F2R1 SB CSQ_gnomADe_AC_popmax CSQ_gnomADe_AF_popmax CSQ_gnomADe_AN_popmax CSQ_gnomADe_nhomalt CSQ_gnomADe_nhomalt_popmax CSQ_gnomADe_non_cancer_AC CSQ_gnomADe_non_cancer_AC_popmax CSQ_gnomADe_non_cancer_AF CSQ_gnomADe_non_cancer_AF_popmax CSQ_gnomADe_non_cancer_AN CSQ_gnomADe_non_cancer_AN_popmax CSQ_gnomADe_non_cancer_nhomalt CSQ_gnomADe_non_cancer_nhomalt_popmax CSQ_gnomADe_non_cancer_popmax CSQ_gnomADe_popmax CSQ_gnomADg_AC_popmax CSQ_gnomADg_AF_popmax CSQ_gnomADg_AN_popmax CSQ_gnomADg_nhomalt CSQ_gnomADg_nhomalt_popmax CSQ_gnomADg_popmax",
"stage-rpt_generate_workbook.acmg": true,
"stage-rpt_generate_workbook.rename_columns": "CSQ_Feature=Transcript DP_FMT=DP",
"stage-rpt_generate_workbook.add_comment_column": true,
"stage-rpt_generate_workbook.keep_tmp": true,
"stage-rpt_generate_workbook.summary": "dias",
"stage-rpt_generate_workbook.filter": "bcftools filter -e '(CSQ_Consequence==\"synonymous_variant\" | CSQ_Consequence==\"intron_variant\" | CSQ_Consequence==\"upstream_gene_variant\" | CSQ_Consequence==\"downstream_gene_variant\" | CSQ_Consequence==\"intergenic_variant\" | CSQ_Consequence==\"5_prime_UTR_variant\" | CSQ_Consequence==\"3_prime_UTR_variant\" | CSQ_gnomADe_AF>0.01 | CSQ_gnomADg_AF>0.01 | CSQ_TWE_AF>0.05) & CSQ_HGMD_CLASS!~ \"DM\" & CSQ_ClinVar_CLNSIG!~ \"pathogenic\\/i\" & CSQ_ClinVar_CLNSIGCONF!~ \"pathogenic\\/i\"'",
"stage-rpt_generate_workbook.human_filter": "excluded gnomAD exomes / genomes > 1%, TWE > 5%, synonymous / intronic / intergenic / upstream / downstream / UTRs EXCEPT pathogenic status in ClinVar OR DM in HGMD Class",
"stage-rpt_generate_workbook.reorder_columns": "CHROM POS REF ALT GT GQ DP_FMT AD CSQ_SYMBOL CSQ_EXON CSQ_INTRON CSQ_HGVSc CSQ_HGVSp CSQ_Consequence CSQ_IMPACT CSQ_VARIANT_CLASS CSQ_gnomADe_AF CSQ_gnomADe_Hom CSQ_gnomADe_AC CSQ_gnomADe_AN CSQ_gnomADe_nhomalt CSQ_gnomADg_AF CSQ_gnomADg_AC CSQ_gnomADg_AN CSQ_gnomADg_nhomalt CSQ_TWE_AF CSQ_TWE_AC_Hom CSQ_TWE_AC_Het CSQ_TWE_AN CSQ_HGMD CSQ_HGMD_CLASS CSQ_HGMD_RANKSCORE CSQ_HGMD_PHEN CSQ_Existing_variation CSQ_ClinVar CSQ_ClinVar_CLNDN CSQ_ClinVar_CLNSIG CSQ_Mastermind_MMID3 CSQ_CADD_PHRED CSQ_REVEL CSQ_SpliceAI_pred_DS_AG CSQ_SpliceAI_pred_DS_AL CSQ_SpliceAI_pred_DS_DG CSQ_SpliceAI_pred_DS_DL CSQ_HGVS_OFFSET CSQ_STRAND CSQ_Feature",
"stage-rpt_generate_workbook.freeze_column": "N2",
"stage-rpt_athena.exons_file": "INPUT-exonsfile",
"stage-rpt_athena.limit": 260,
"stage-rpt_athena.summary": true,
"stage-rpt_athena.thresholds": "100, 250, 500, 1000, 1500",
"stage-rpt_athena.cutoff_threshold": 250,
"stage-rpt_athena.mosdepth_files": {
"folder": "eggd_mosdepth",
"name": "per-base.bed.gz$|reference_build.txt$"
}
}
},
"artemis": {
"inputs": {
"capture_bed": {
"$dnanexus_link": {
"project": "project-Fkb6Gkj433GVVvj73J7x8KbV",
"id": "file-GFPxpJj4GVV0Pfzv4VGYf1pq"
}
}
}
}
}
}
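
The regular expressions in `name_patterns` and in the `name` fields of the file inputs can be checked locally before a config change is merged. This is a rough sketch using Python's `re` module as a stand-in; it assumes the patterns are applied as regular expressions to sample and file names, and the dialect used on the platform may differ. The sample and file names below are invented for illustration, and the helper names are hypothetical.

```python
import re

# Patterns copied from the config, with JSON escaping removed
NAME_PATTERNS = {
    "Epic": r"^[\d\w]+-[\d\w]+",
    "Gemini": r"^X[\d]+",
}
SNV_VCF_PATTERN = r"^[^\.]*(?!\.g)\.vcf(\.gz)?$"


def matching_labels(sample_name):
    """Return the name_patterns labels whose regex matches the sample name."""
    return [
        label
        for label, pattern in NAME_PATTERNS.items()
        if re.search(pattern, sample_name)
    ]


def is_report_vcf(file_name):
    """True if the file name matches the stage-rpt_vep.vcf name pattern."""
    return re.search(SNV_VCF_PATTERN, file_name) is not None


# Invented example names, not taken from real runs
for sample in ["124801362-23230R0033", "X225111"]:
    print(sample, matching_labels(sample))

for file_name in ["sample1_markdup.vcf.gz", "sample1_markdup.g.vcf.gz"]:
    print(file_name, is_report_vcf(file_name))
```

Note that the two sample patterns are not mutually exclusive: a name that starts with `X` and digits and also contains a hyphen would match both, so code that relies on them may need to handle more than one match.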
