Second generation of the ICGC DCC ETL, built on Spark. For the first-generation ETL project, please see the dcc-etl repository.
To build the application, execute the following from the command line:
mvn clean package
For a high-level overview of the application, please see PROCESS.md.
Sub-system modules:
- Stage Job
- Mask Job
- ID Job
- Image Job
- Annotate Job
- Join Job
- Import Job
- FATHMM Job
- Functional Impact Job
- Summarize Job
- Document Job
- Index Job
- Export Job
For information on how to build a custom version of Spark, please see SPARK.md.
For general instructions on how to run data processing with the dcc-release application, please see RELEASE.md.
For DCC-specific instructions, please see the internal documentation.
For information on FATHMM, please see FATHMM.md.