A case and its individuals (or samples, in the cancer track) can be uploaded into Scout using a specially formatted YAML config file. The config file contains information about the analysis, the gene panels used, the paths to the VCF files and to any alignment files, and more.
Example configuration files are found here: <scout root dir>/scout/demo/643594.config.yaml
or on GitHub.
bam_file, bam_path and alignment_path are redundant in internal usage. Future versions of Scout will only support alignment_path.
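As a hedged illustration of the preferred key, a sample entry might point to its alignment file as in the fragment below. The sample ID and the file path are placeholders chosen for this sketch, not files shipped with Scout.

```yaml
samples:
  - sample_id: NA12878                     # placeholder sample ID
    phenotype: affected
    sex: male
    # alignment_path is the key to prefer over the older bam_file and bam_path
    alignment_path: /path/to/NA12878.bam   # hypothetical path
```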
Below are the available configuration parameters for a Scout case. Names marked with an asterisk (*) are mandatory.
- analysis_date(*) Datetime Time for analysis in datetime format. Defaults to time of uploading. Example: 2016-10-12 14:00:46.
- cnv_report String Path to the CNV report file.
- coverage_qc_report String Path to static coverage and QC report file.
- cohorts List of strings Cohort labels used to organise study participants or cases.
- collaborators List of strings List of collaborators.
- coverage_qc_report String Path to HTML file with coverage and QC report.
- default_gene_panels List of strings List of default gene panels. Variants from the genes in the gene panels specified will be shown when opening the case in scout.
- delivery_report String Path to HTML delivery report.
- family(*) String Unique ID of the case.
- family_name String Optional name of the case.
- gene_fusion_report String Path to a static gene fusion report produced by Arriba containing only clinical fusions (a subset of all detected fusions).
- gene_fusion_report_research String Path to a static gene fusion report produced by Arriba containing all detected fusions.
- gene_panels List of strings List of gene panels. Specifies what panels the case has been run with.
- human_genome_build String Version of the reference genome used, 37 or 38. Defaults to 37.
- lims_id String Case ID in LIMS.
- madeline String Path to a madeline pedigree file in XML format.
- multiqc String Path to a multiqc report with arbitrary information.
- multiqc_rna String Path to a nf-core/rnafusion multiqc report with arbitrary information.
- omics_files List
- owner(*) String Institute who owns current case. Must refer to existing institute.
- peddy_check String Path to a peddy ped check file.
- peddy_ped String Path to a peddy ped file with an analysis of the pedigree based on variant information.
- peddy_sex String Path to a peddy ped sex check file.
- phenotype_terms List of strings List of phenotype terms.
- rank_model_version String Version of the rank model that was used when scoring the variants.
- rank_score_threshold Float Only include variants with a rank score above this threshold.
- RNAfusion_inspector String Path to HTML nf-core/rnafusion inspector report containing only clinical fusions (a subset of all detected fusions).
- RNAfusion_inspector_research String Path to HTML nf-core/rnafusion inspector report containing all detected fusions.
- RNAfusion_report String Path to HTML nf-core/rnafusion report containing only clinical fusions (a subset of all detected fusions).
- RNAfusion_report_research String Path to HTML nf-core/rnafusion report containing all detected fusions.
- rna_human_genome_build String Version of reference genome used for RNA components of build. "37" or "38", default "38".
- rna_delivery_report String Path to HTML RNA delivery report.
- samples List List of samples included in the case:
  - alignment_path String Path to BAM/CRAM file to view alignments.
  - analysis_type String Specifies the analysis type for the sample. Options: {wgs, wes, panel, unknown, external}.
  - bam_file String Path to BAM/CRAM file to view alignments. WARNING: Soon to be deprecated, use alignment_path.
  - bam_path String Path to BAM/CRAM file to view alignments. WARNING: Soon to be deprecated, use alignment_path.
  - capture_kit String Specifies the capture kit (exome analyses).
  - chromograph_images List
    - autozygous String Path to file.
    - coverage String Path to file.
    - upd_regions String Path to file.
    - upd_sites String Path to file.
  - confirmed_parent Bool True if parent confirmed.
  - d4_path String Path to .d4 file. Required for Chanjo2 integration.
  - expected_coverage Int The level of expected coverage.
  - father String/Int Sample ID for father or 0.
  - is_sma Bool/None True/False if SMA status determined, None if not done.
  - is_sma_carrier Bool/None True/False if SMA carriership determined, None if not done.
  - mitodel String Path to mitodel file.
  - mother String/Int Sample ID for mother or 0.
  - msi Int Microsatellite instability [0-60].
  - mt_bam String Path to the reduced mitochondrial BAM/CRAM alignment file.
  - phenotype(*) String Specifies the affection status {affected, unaffected, unknown}.
  - reviewer List Reference
    - alignment String Path to BAM/CRAM file to view STR alignments.
    - alignment_index String Path to BAM/CRAM index file to view STR alignments.
    - vcf String Path to STR VCF file to view STR alignments.
    - catalog String Path or URL to REViewer catalog JSON file to view STR alignments.
    - reference String Path or URL for REViewer to reference sequence for the individual STR alignment.
  - rna_alignment_path String Path to RNA alignment file (BAM/CRAM).
  - rna_coverage_bigwig String Path to coverage islands file generated
  - omics_sample_id String Sample ID for RNA, as in outliers files.
  - rhocall_bed String Path to BED file to view alignments Reference.
  - rhocall_wig String Path to WIG file to view alignments Reference.
  - sample_id(*) String Identifier for a sample.
  - sample_name String Name of sample.
  - sex(*) String One of: {male, female, unknown}. Sex of the sample in human readable format.
  - smn1_cn Int Copy number.
  - smn2_cn Int Copy number.
  - smn2delta78_cn Int Copy number.
  - splice_junctions_bed String Path to indexed junctions .bed.gz file.
  - subject_id String Individual identifier. Multiple samples can belong to the same individual.
  - tiddit_coverage_wig String Path to WIG file to view alignments Reference.
  - tissue_type String Sample tissue origin, e.g. blood, muscle.
  - tmb Int Tumor mutational burden [0, 1000] (tumor case only).
  - tumor_purity Float Purity of tumor sample [0.1, 1.0] (tumor case only).
  - tumor_type String Type of tumor (tumor case only).
  - upd_regions_bed String Path to BED file to view alignments Reference.
  - upd_sites_bed String Path to BED file to view alignments Reference.
  - vcf2cytosure String Path to CGH file to allow download per individual. Such SV files can be visualized using standard arrayCGH analysis tools. See vcf2cytosure.
- smn_tsv String Path to an SMN TSV file.
- synopsis String Synopsis of case.
- sv_rank_model_version String Rank model that was used when scoring the variants.
- track String Type of track: {"rare", "cancer"}. Default: "rare".
- sv_rank_model_version String SV rank model version used when scoring SV variants.
- vcf_cancer String Path to cancer VCF file (tumor case only).
- vcf_cancer_research String Path to VCF file with all variants (tumor case only).
- vcf_snv String Path to SNV VCF file containing only clinical variants (a subset of all variants).
- vcf_snv_research String Path to VCF file with all variants.
- vcf_sv String Path to SV VCF file containing only clinical variants (a subset of all variants).
- vcf_sv_research String Path to VCF file with all SV variants.
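The sub-keys listed under chromograph_images and reviewer belong to an individual sample, not to the case. The fragment below is a rough sketch of how that nesting might look in a config. It assumes the sub-keys are written as a mapping beneath their parent key, and every path in it is a hypothetical placeholder; the demo config in scout/demo is the place to confirm the exact layout.

```yaml
samples:
  - sample_id: NA12878                  # placeholder sample ID
    phenotype: affected
    sex: male
    chromograph_images:                 # assumed to be a mapping of image paths
      autozygous: /path/to/autozygous   # hypothetical paths throughout
      coverage: /path/to/coverage
      upd_regions: /path/to/upd_regions
      upd_sites: /path/to/upd_sites
    reviewer:                           # files used to view STR alignments
      alignment: /path/to/str.bam
      alignment_index: /path/to/str.bam.bai
      vcf: /path/to/str.vcf
      catalog: /path/to/catalog.json
      reference: /path/to/reference.fasta
```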
Here is an example of a minimal load config:
---
owner: cust004
family: '1'
samples:
  - analysis_type: wes
    sample_id: NA12878
    capture_kit: Agilent_SureSelectCRE.V1
    father: 0
    mother: 0
    sample_name: NA12878
    phenotype: affected
    sex: male
    expected_coverage: 30
vcf_snv: scout/demo/643594.clinical.vcf.gz
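
For comparison, a cancer-track config might look roughly like the sketch below. It uses only keys from the parameter list above, but the case ID, sample IDs, tumor type, numeric values and file path are all invented for illustration and do not correspond to any demo data.

```yaml
---
owner: cust004
family: 'tumor_case_1'            # hypothetical case ID
track: cancer                     # the default is "rare"
samples:
  - analysis_type: panel
    sample_id: tumor_sample       # hypothetical sample ID
    sample_name: tumor_sample
    phenotype: affected
    sex: female
    tumor_type: melanoma          # illustrative value
    tumor_purity: 0.7             # float in [0.1, 1.0]
    tmb: 12                       # int in [0, 1000]
    msi: 3                        # int in [0, 60]
  - analysis_type: panel
    sample_id: normal_sample      # hypothetical sample ID
    sample_name: normal_sample
    phenotype: unaffected
    sex: female
vcf_cancer: /path/to/cancer.clinical.vcf.gz   # hypothetical path
```

The tumor-only keys (tumor_type, tumor_purity, tmb, msi) are set on the tumor sample alone, while vcf_cancer sits at the case level in place of vcf_snv.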