This repository contains a quickstart guide for connecting Apache Airflow to Starburst Galaxy using Docker. It walks through cloning the repository, configuring the Galaxy connection variables, building the Docker image, and starting the services.
Before getting started, ensure you have the following:
- Docker (including Docker Compose) installed on your machine
- Access to a Galaxy instance
- Cluster connection information (username, password, host name) from the Galaxy UI
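Before cloning, a quick preflight check can confirm that Docker and Compose are usable. The sketch below is a hypothetical helper, not part of the repository, and it assumes the Compose v2 plugin (`docker compose`). Installs that use the standalone `docker-compose` binary would need that command instead.

```bash
#!/usr/bin/env bash
# Hypothetical preflight helper (not part of the repository).
# Returns non-zero if Docker, its daemon, or the Compose v2 plugin is unavailable.
check_prereqs() {
  if ! command -v docker >/dev/null 2>&1; then
    echo "docker not found on PATH" >&2
    return 1
  fi
  # `docker info` fails when the Docker daemon is not running.
  if ! docker info >/dev/null 2>&1; then
    echo "Docker daemon is not reachable" >&2
    return 1
  fi
  # Assumes the Compose v2 plugin; adjust for a standalone docker-compose.
  if ! docker compose version >/dev/null 2>&1; then
    echo "docker compose plugin not found" >&2
    return 1
  fi
  echo "Docker and Docker Compose are available"
}
```

Source the snippet and run `check_prereqs`; a non-zero exit status names the missing piece on stderr.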
- Clone the repository and change into its directory:
```bash
git clone https://github.com/YCat33/galaxy_airflow_quickstart.git
cd galaxy_airflow_quickstart
```
- Sign in to your Galaxy domain.
- Use the Clusters page in the Galaxy UI to locate your cluster's connection variables (host, username, and password).
- Make the setup script executable:
```bash
chmod +x setup.sh
```
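- Run the setup script, passing the host, username, and password from the Galaxy Clusters page. Keep the single quotes so the shell does not expand special characters (such as `$`) in the values:
```bash
./setup.sh '<host>' '<user>' '<password>'
```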
This script performs the following steps:
1. Runs the `encode_special_chars` script, which converts the connection parameters into the format required in a connection URI (e.g., replacing "@" with "%40").
2. Builds the Docker image from the Dockerfile, which installs the `apache-airflow-providers-trino` package and sets up the Galaxy connection variables. The docker-compose.yaml file uses these variables to create a connection to Starburst Galaxy (see line 75 [here](https://github.com/YCat33/galaxy_airflow_quickstart/blob/31b28bbf9237b26cddbab380f416e80384e65cd3/docker-compose.yaml#L75)).
3. Deploys the required Docker containers defined in the docker-compose file.
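The encoding in step 1 is standard URI percent-encoding: characters that carry meaning inside a URI, such as `@` and `/`, are replaced by `%` followed by their hexadecimal byte value. The bash sketch below illustrates the idea. `urlencode` and `galaxy_conn_uri` are hypothetical helpers, not the repository's `encode_special_chars` script, and the URI layout (a `trino://` scheme, port 443, and a `protocol=https` extra) is an assumption about how an Airflow Trino connection to Galaxy can be expressed, not a copy of the repository's docker-compose.yaml.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; NOT the repository's encode_special_chars script.

# Percent-encode every byte outside the RFC 3986 "unreserved" set
# (A-Z a-z 0-9 - . _ ~), e.g. "@" -> "%40" and "/" -> "%2F".
urlencode() {
  local LC_ALL=C          # iterate over bytes, not multibyte characters
  local input="$1" out="" c i
  for (( i = 0; i < ${#input}; i++ )); do
    c="${input:i:1}"
    case "$c" in
      [a-zA-Z0-9.~_-]) out+="$c" ;;
      # A leading quote ("'$c") makes printf emit the character's numeric code.
      *) printf -v c '%%%02X' "'$c"; out+="$c" ;;
    esac
  done
  printf '%s\n' "$out"
}

# Assemble an Airflow-style connection URI from raw Galaxy credentials.
# Assumption: Galaxy serves HTTPS on port 443, and the Trino provider reads
# the HTTP scheme from the "protocol" extra (passed as a query parameter).
galaxy_conn_uri() {
  local host="$1" user password
  user=$(urlencode "$2")
  password=$(urlencode "$3")
  printf 'trino://%s:%s@%s:443/?protocol=https\n' "$user" "$password" "$host"
}
```

For example, with the placeholder host `galaxy.example.com`, `galaxy_conn_uri galaxy.example.com 'me@example.com/accountadmin' 'p@ss'` prints `trino://me%40example.com%2Faccountadmin:p%40ss@galaxy.example.com:443/?protocol=https`. Airflow can read a connection in this URI form from an environment variable named `AIRFLOW_CONN_<CONN_ID>`.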
- Navigate to `http://localhost:8080` in your browser and log in with `airflow` as both the username and password.
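The webserver can take a little while to respond after the containers start. A small polling loop avoids refreshing the browser by hand. This sketch assumes `curl` is installed and that the Airflow version in the image exposes the webserver's `/health` endpoint (Airflow 2.x does); `wait_for_airflow` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll the Airflow webserver until it answers over HTTP.
# Usage: wait_for_airflow [url] [attempts] [delay_seconds]
wait_for_airflow() {
  local url="${1:-http://localhost:8080/health}"
  local attempts="${2:-30}" delay="${3:-5}" i
  for (( i = 1; i <= attempts; i++ )); do
    # --fail makes curl exit non-zero on HTTP errors such as 502 or 503.
    if curl --silent --fail "$url" >/dev/null; then
      echo "Airflow is up after $i attempt(s)"
      return 0
    fi
    # Wait between attempts, but not after the last one.
    if (( i < attempts )); then
      sleep "$delay"
    fi
  done
  echo "Airflow did not respond at $url" >&2
  return 1
}
```

Calling `wait_for_airflow` with no arguments polls `http://localhost:8080/health` every 5 seconds for up to 30 attempts.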
The example DAG defines three tasks:
- Task 1 (select_star) uses the TrinoOperator to run a SQL SELECT statement that counts the records in the `tpch.tiny.customer` table; the result becomes the task's return value.
- Task 2 (print_number) is a PythonOperator that calls `print_command`, which retrieves Task 1's return value and prints it to the task log.
- Task 3 (data_validation_check) is an SQLColumnCheckOperator that performs a data quality check: it verifies that the number of distinct values in the `custkey` column of `tpch.tiny.customer` equals 1500.
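To see the values these tasks should report, the same queries can be run directly against Galaxy. This sketch assumes the Trino CLI (`trino`) is installed locally and that `TRINO_PASSWORD` holds the Galaxy password, which the CLI uses instead of prompting when `--password` is passed. `run_sql` and `check_customer_table` are hypothetical helpers, and the `count(DISTINCT custkey)` query is a hand-written equivalent of the column check, not the SQL that SQLColumnCheckOperator generates.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: re-run the DAG's checks by hand with the Trino CLI.
# Assumes `trino` is on PATH and TRINO_PASSWORD holds the Galaxy password.

# Run one statement and print the bare result (TSV output is unquoted).
run_sql() {
  local host="$1" user="$2" sql="$3"
  trino --server "https://$host" --user "$user" --password \
    --output-format TSV --execute "$sql"
}

check_customer_table() {
  local host="$1" user="$2" rows distinct_keys
  # Row count, as described for Task 1 (select_star).
  rows=$(run_sql "$host" "$user" 'SELECT count(*) FROM tpch.tiny.customer')
  # Hand-written equivalent of Task 3's distinct-value check.
  distinct_keys=$(run_sql "$host" "$user" \
    'SELECT count(DISTINCT custkey) FROM tpch.tiny.customer')
  echo "rows=$rows distinct_custkey=$distinct_keys"
  # Succeed only when the distinct count matches the expected 1500.
  [ "$distinct_keys" -eq 1500 ]
}
```

Running `check_customer_table '<host>' '<user>'` prints both counts and exits non-zero if the distinct-key count differs from 1500, mirroring what data_validation_check enforces.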
- From the Airflow UI home screen, you should see a single DAG titled "starburst-galaxy-example".
- Trigger the DAG by clicking the "play" button on the right-hand side of the screen.
- View the logs for each task to confirm the row count printed by print_number and the result of data_validation_check.
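The DAG can also be triggered from the command line with the Airflow CLI inside a running container. This sketch assumes the compose file defines a service named `airflow-webserver`, as the official Airflow docker-compose.yaml does (check the repository's file), and lets `AIRFLOW_SERVICE` override that name; `airflow_cli` and `trigger_example_dag` are hypothetical wrappers.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: run an Airflow CLI command inside a compose service.
# Assumption: the service is named airflow-webserver unless AIRFLOW_SERVICE is set.
# -T disables pseudo-TTY allocation so the wrapper also works in scripts.
airflow_cli() {
  docker compose exec -T "${AIRFLOW_SERVICE:-airflow-webserver}" airflow "$@"
}

DAG_ID="starburst-galaxy-example"

# Unpause the DAG (new DAGs may start paused), then queue a run.
trigger_example_dag() {
  airflow_cli dags unpause "$DAG_ID"
  airflow_cli dags trigger "$DAG_ID"
}
```

After sourcing the snippet from the repository directory, `trigger_example_dag` queues a run, and its task logs appear in the Airflow UI.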