chaimeleon-etl-chup-prostate

Deploy notes

  1. Clone the repository
git clone https://github.com/chaimeleon-eu/chaimeleon-etl-chup-prostate.git
  2. Initialize the submodule
git submodule init
git submodule update
  3. Download or copy the xlsx and csv files into the data folder (./data).
  4. Deploy the datalake database by running the command below:
make deploy_datalake
  5. Now you can run the ETL in one of two ways:

    5.1 Running the two dataflows at once:

    make etl_chup_prostate

    5.2 Or running the dataflows as separate processes:

    make etl_chup_prostate_datalake
    make etl_chup_prostate_indexa
  6. Check that everything is correct by querying the indexa database and/or reviewing the output of the commands above.

  7. Stop and remove the datalake container:

make down
  8. Retrieve the XML files from the outputs folder.
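
The command-line steps above can be chained in a small wrapper script. This is only a sketch, not part of the repository: it assumes the repository is already cloned, that it is run from the repository root, and that the data files are already in ./data. The `run` helper, the `deploy_steps` function, and the `DRY_RUN` variable are hypothetical names introduced here for illustration. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: run deploy steps 2, 4, 5.1 and 7 in order, stopping at the first failure.
# Assumes steps 1 (clone) and 3 (copy data into ./data) are already done.
set -eu

# DRY_RUN=1 (the default) prints each command without executing it.
# This switch is an assumption of this sketch, not a feature of the repository.
DRY_RUN="${DRY_RUN:-1}"

# Print a command, then execute it unless in dry-run mode.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" != "1" ]; then
        "$@"
    fi
}

deploy_steps() {
    # Step 2: initialize the submodule.
    run git submodule init
    run git submodule update
    # Step 4: deploy the datalake database.
    run make deploy_datalake
    # Step 5.1: run the two dataflows at once.
    run make etl_chup_prostate
    # Step 6 (checking the results) is manual; do it before the next command
    # in a real run, since step 7 removes the datalake container.
    run make down
}

deploy_steps
```

To execute the commands for real, set `DRY_RUN=0` when invoking the script. Because of `set -e`, a failing ETL run stops the script before `make down`, so the datalake container remains available for inspection.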

Software dependencies

  • Docker (tested version: 20.10.17, build 100c701).
  • docker-compose (tested version: 1.29.2, build 5becea4c).
  • make (tested version: GNU Make 4.2.1).
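
Before deploying, it can help to confirm that the tools listed above are on the PATH. The following is a minimal sketch; the `check_deps` function is a hypothetical helper, not part of the repository, and it only checks for presence, not for the tested versions.

```shell
#!/bin/sh
# Sketch: report which of the given commands are available on PATH.
# Returns 0 if all are found, 1 if at least one is missing.
check_deps() {
    missing=0
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "found: $cmd"
        else
            echo "missing: $cmd"
            missing=1
        fi
    done
    return "$missing"
}

check_deps docker docker-compose make || echo "Install the missing tools before deploying."
```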