- Clone the repository
git clone https://github.com/chaimeleon-eu/chaimeleon-etl-chup-prostate.git
- Initialize the submodule
git submodule init
git submodule update
- Download or copy the xlsx and csv files into the data folder (./data).
- Deploy the datalake database by running the command below:
make deploy_datalake
- Now you can run the ETL in two ways:
5.1 Run the two dataflows at once:
make etl_chup_prostate
5.2 Or run the dataflows as separate processes:
make etl_chup_prostate_datalake
make etl_chup_prostate_indexa
- Check that everything is okay by querying the data in the indexa database and/or reviewing the output of the commands above.
- Stop and remove the datalake container:
make down
- Retrieve the xml files from the outputs folder.
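
The deploy, ETL, and teardown steps above could be chained in a small wrapper so that the datalake container is removed even when the ETL fails. The following is only a sketch under a few assumptions: it is run from the repository root after the submodule and data files are in place, and the `run` and `run_etl` functions and the `DRY_RUN` variable are hypothetical helpers that do not exist in the project. It also skips the manual check against the indexa database, so drop the `make down` line if you want to query the data before stopping the container.

```shell
#!/bin/sh
# Run a command, or only print it when DRY_RUN=1.
# DRY_RUN is a hypothetical variable introduced for this sketch;
# it is not part of the project's Makefile.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Deploy the datalake, run both dataflows at once, then tear down.
# The teardown runs even if the ETL step fails; the ETL exit status
# is what the function returns.
run_etl() {
    run make deploy_datalake || return 1
    status=0
    run make etl_chup_prostate || status=$?
    run make down
    return "$status"
}
```

After sourcing the file, `DRY_RUN=1; run_etl` prints the three make commands without executing them, which is a cheap way to review the sequence before running it for real.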
- Docker (tested version: 20.10.17, build 100c701).
- docker-compose (tested version: 1.29.2, build 5becea4c).
- make (tested version: GNU Make 4.2.1).
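
Before starting, it can help to confirm that the tools listed above are available on the PATH. The snippet below is a minimal sketch: `check_tools` is a hypothetical helper, not something shipped with the repository, and it only checks that each command exists, not that its version matches the tested one.

```shell
#!/bin/sh
# Report which of the given command-line tools are missing from PATH.
# Prints one "missing: <tool>" line per absent tool and returns
# non-zero if at least one tool was not found.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_tools docker docker-compose make && echo "all prerequisites found"` prints the confirmation only when all three commands resolve.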