ORANGE summarizes the key outputs from all algorithms in the Hartwig suite into a single PDF and JSON file:
- The algo depends exclusively on config and data produced by the Hartwig platinum pipeline and hence can always be run as the final step without requiring any additional local data or config.
- The algo intends to combine RNA and DNA data to present an integrated DNA/RNA analysis of a tumor sample.
- Everything that is labeled as a driver by any of the Hartwig algorithms is displayed in the PDF along with the driver likelihood. This effectively means that everything reported by patient-reporter is present in the ORANGE PDF.
- An additional exhaustive WGS scan is performed for anything interesting that may be potentially relevant but was not picked up as a driver. Details of what is considered interesting are described below.
- A comprehensive range of QC measures and plots is displayed which provides in-depth details about the data quality of the tumor sample.
An example report based on the publicly available melanoma cell line COLO829 can be found here.
Note that neither this readme nor the report itself contains any documentation about the Hartwig algorithms and output. For questions in this area please refer to the specific algorithm documentation present on https://github.com/hartwigmedical/hmftools
The front page of the ORANGE report lists all high-level stats about the sample along with genome-wide visualisations of all mutations and SNV/Indel clonality. In addition to this front page, the following chapters are generated in the ORANGE report:
- Somatic Findings: What potentially relevant mutations have been found in the tumor specifically?
- Germline Findings: What potentially relevant mutations have been found in the germline DNA?
- Immunology: What can we tell about the immunogenicity of the tumor sample?
- Cohort Comparison: How do the various properties of this tumor compare to existing cancer cohorts?
- Clinical Evidence: What genomic evidence has been found in favor of, or against, specific treatments?
- Quality Control: Various stats and graphs regarding the quality of the data and interpretation thereof.
Argument | Description |
---|---|
disable_germline | If set, disables the germline findings chapter and transforms germline variants to somatic variants. |
max_evidence_level | If set, filters evidence down to this level. For example, if "B" is passed as a parameter, only treatments with at least A or B level evidence are displayed in the clinical evidence chapter of the report. Do note that the front page always lists the count of all evidence present, regardless of this filter setting. |
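
The effect of `max_evidence_level` can be sketched as follows. This is a minimal illustration, not the ORANGE implementation: the level ordering `A` to `D`, the function name `filter_by_max_level` and the dictionary keys are assumptions made for the example.

```python
# Assumed ordering of evidence levels, strongest first (hypothetical; not taken from ORANGE).
EVIDENCE_LEVELS = ["A", "B", "C", "D"]


def filter_by_max_level(evidence, max_evidence_level=None):
    """Keep entries whose level is at least as strong as max_evidence_level.

    If no level is passed, nothing is filtered.
    """
    if max_evidence_level is None:
        return list(evidence)
    cutoff = EVIDENCE_LEVELS.index(max_evidence_level)
    return [e for e in evidence if EVIDENCE_LEVELS.index(e["level"]) <= cutoff]


# Made-up evidence entries, for illustration only.
evidence = [
    {"treatment": "Drug 1", "level": "A"},
    {"treatment": "Drug 2", "level": "B"},
    {"treatment": "Drug 3", "level": "C"},
]

# Passing "B" keeps only A and B level evidence for the clinical evidence chapter.
displayed = filter_by_max_level(evidence, "B")

# The front page count is taken over all evidence, regardless of the filter.
front_page_count = len(evidence)
```

Here `displayed` holds the entries for "Drug 1" and "Drug 2", while `front_page_count` still counts all three.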
In addition to all somatic drivers (SNVs/Indels, copy numbers, structural variants and fusions) the following is considered potentially interesting and added to the report:
- Other potentially relevant variants:
  - Variants that are hotspots but not part of the reporting gene panel.
  - Variants which have clinical evidence but are not part of the reporting gene panel.
  - Coding variants that are not reported but are phased with variants that are reported.
  - Variants that are considered relevant for tumor type classification according to Cuppa.
- Other regions with amps or autosomal losses:
  - Any chromosomal band location with at least one gene lost or fully amplified is considered potentially interesting.
  - For a band with more than one gene amplified, the gene with the highest minimum copy number is picked.
  - For a band with a loss that has no losses reported in this band already, a random gene is picked.
  - A maximum of 10 additional gains (sorted by minimum copy number) and 10 additional losses are reported as potentially interesting.
- Other potentially relevant fusions:
  - Any fusion that is not reported and has a reported type other than NONE is picked.
  - Any fusion with clinical evidence is picked.
  - A maximum of 10 additional fusions (randomly picked) are reported as potentially interesting.
- Other viral presence:
  - Any viral presence that is not otherwise reported is reported as potentially interesting.
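
The selection of additional gains described in the list above can be sketched like this. It is a rough illustration under stated assumptions, not the ORANGE code: the function name, the keys `gene`, `band` and `min_copy_number`, and the choice to sort from highest to lowest minimum copy number are all hypothetical.

```python
MAX_ADDITIONAL_GAINS = 10  # cap on additional gains stated in the text


def select_additional_gains(amplified_genes, max_gains=MAX_ADDITIONAL_GAINS):
    """Pick one gene per chromosomal band, then cap the number of gains.

    amplified_genes: dicts with hypothetical keys 'gene', 'band', 'min_copy_number'.
    """
    best_per_band = {}
    for gene in amplified_genes:
        current = best_per_band.get(gene["band"])
        # Within a band, keep the gene with the highest minimum copy number.
        if current is None or gene["min_copy_number"] > current["min_copy_number"]:
            best_per_band[gene["band"]] = gene
    # Assumption: "sorted by minimum copy number" means highest first.
    ranked = sorted(
        best_per_band.values(),
        key=lambda gene: gene["min_copy_number"],
        reverse=True,
    )
    return ranked[:max_gains]


# Made-up input, for illustration only.
genes = [
    {"gene": "GENE_A", "band": "1p36", "min_copy_number": 8.0},
    {"gene": "GENE_B", "band": "1p36", "min_copy_number": 12.0},
    {"gene": "GENE_C", "band": "7q31", "min_copy_number": 20.0},
]
selected = [g["gene"] for g in select_additional_gains(genes)]
```

With this input, `GENE_B` represents band 1p36 because it has the higher minimum copy number, and `GENE_C` is ranked first overall. Losses and fusions are picked randomly according to the text, so they are left out of the sketch.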
In addition to all germline SNV/Indel tumor drivers determined by PURPLE, the following is added to the report:
- Other potentially relevant variants:
  - Any hotspots that are not configured to be reported.
  - Any hotspots that are filtered based on quality.
The germline CN aberrations are determined by PURPLE and include aberrations such as Klinefelter syndrome or trisomy X.
The immunology chapter is work-in-progress and will report on various immunology properties of the tumor sample.
The cohort comparison reports all the properties of a tumor sample that Cuppa considers for determining tumor type. The cohort comparison displays the prevalence of the tumor's properties with respect to the cohorts that Cuppa could potentially assign the sample to:
- Genomic position distribution of SNVs and their tri-nucleotide signature
- Sample traits of the tumor (for example, number of LINE insertions)
- (Driver) features of the tumor.
Do note that RNA features and cohort comparison thereof are only included if platinum was run in combined DNA/RNA mode.
The following algo is used to render clinical evidence in the ORANGE report based on PROTECT output:
- Evidence is split into applicable and "potentially interesting" evidence, depending on whether PROTECT has marked it as reported (yes/no).
- Evidence is split between trials and non-trials which are further split up based on on/off label.
- Evidence is grouped by treatment and split up between responsive and resistance evidence.
- Evidence is filtered based on the optional `max_evidence_level` configuration.
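
The splitting and grouping steps above can be sketched as a nested structure. This is only an illustration of the described layout: the function name `group_evidence` and the keys `reported`, `is_trial`, `on_label`, `direction`, `treatment` and `event` are assumptions for the example and do not reflect the actual PROTECT output format.

```python
def group_evidence(evidence):
    """Nest evidence as section -> treatment -> direction -> list of events.

    A section is the tuple (applicable or potentially interesting,
    trial or non-trial, on-label or off-label).
    """
    grouped = {}
    for e in evidence:
        section = (
            "applicable" if e["reported"] else "potentially interesting",
            "trial" if e["is_trial"] else "non-trial",
            "on-label" if e["on_label"] else "off-label",
        )
        by_treatment = grouped.setdefault(section, {})
        # Each treatment keeps responsive and resistance evidence apart.
        directions = by_treatment.setdefault(
            e["treatment"], {"responsive": [], "resistant": []}
        )
        directions[e["direction"]].append(e["event"])
    return grouped


# Made-up evidence entries, for illustration only.
evidence = [
    {"treatment": "Drug 1", "event": "EVENT_1", "reported": True,
     "is_trial": False, "on_label": True, "direction": "responsive"},
    {"treatment": "Drug 1", "event": "EVENT_2", "reported": True,
     "is_trial": False, "on_label": True, "direction": "resistant"},
    {"treatment": "Trial X", "event": "EVENT_1", "reported": False,
     "is_trial": True, "on_label": False, "direction": "responsive"},
]
grouped = group_evidence(evidence)
```

In this example "Drug 1" ends up in the applicable, non-trial, on-label section with one responsive and one resistant event, while "Trial X" lands in the potentially interesting, trial, off-label section.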
The quality control chapter provides extensive details that can help with interpreting the overall PURPLE QC status or with investigating potential causes of a QC failure:
- The high-level QC from PURPLE
- Various details from the flagstats and coverage stats of the tumor and reference samples
- Various plots from PURPLE
- BQR plots from both reference and tumor sample from SAGE
- 1.6
  - Transform germline variants to somatic in case germline is switched off (somatic findings + drivers on front page)
  - Add "upstream" to variant details in case variant is upstream without annotation
  - Add driver likelihood for viruses
  - Generify trial sources to include any trial source that is labeled as trial by SERVE
- 1.5
  - Support for PAVE
  - Handle multiple drivers per gene where non-canonical transcripts are included. Current behaviour is to ignore non-canonical transcript drivers.
  - Support for proper fusion table rendering in case fusions are wrapped over multiple pages.
- 1.4
  - Fix a formatting problem in clinical evidence in case of very long genomic events
  - Support display of variants with coding impact relative to the 3' UTR region of a gene.
  - Fix another bug with percentile determination in the absence of a cancer type
- 1.3
  - Fix bug for generating reports for samples without a known cancer type
  - Fix bug when passing a Cuppa feature plot image that does not exist.
- 1.2
  - Support for Virus Interpreter v1.1, including addition of % covered, mean coverage and expected clonal coverage
  - The Cuppa best prediction is always displayed on the front page regardless of reliability of prediction.
  - More details about HR deficiency are displayed on front page in case sample is HR deficient
  - Pan-cancer and cancer-type specific percentiles for SV TMB are displayed on the front page
  - Other autosomal regions with deletions are no longer filtered for germline events, so all autosomal deletions are now displayed regardless of whether they occurred in germline or not.
  - Many technical and param changes described in linked release notes
- 1.1
  - Add JSON output of comprehensive platinum output
- 1.0
  - Initial release