This repo contains the following examples of using Cloud Composer, Google Cloud Platform's managed Apache Airflow service:
- Simple Load DAG: shows a common pattern in which a Google Cloud Function automatically triggers a Dataflow job when a file arrives in Google Cloud Storage; the job processes the data and loads it into BigQuery.
- Ephemeral Dataproc Spark DAG: shows how to trigger a DAG via an HTTP POST to the Airflow API. The DAG creates a Dataproc cluster, submits a Spark job, and imports the newly enhanced GCS files into BigQuery.
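
Both examples rely on something outside Airflow starting a DAG run over HTTP. As a rough illustration only, and not the code used in this repo, the sketch below builds such a POST request from a Cloud Storage event. It assumes the Airflow 2 stable REST API path (`/api/v1/dags/{dag_id}/dagRuns`); older Airflow versions expose a different endpoint. The web server URL, DAG id, bearer token, and the helper names `conf_from_gcs_event` and `build_trigger_request` are all hypothetical placeholders, and the authentication your Composer environment actually needs depends on its version and configuration.

```python
import json
import urllib.request


def conf_from_gcs_event(event):
    """Pick the fields of a GCS event that the DAG run needs as its conf."""
    # Assumption: the event dict carries "bucket" and "name" keys.
    return {"bucket": event["bucket"], "name": event["name"]}


def build_trigger_request(base_url, dag_id, conf, token):
    """Build (but do not send) an HTTP POST asking Airflow to start a DAG run."""
    # Assumption: Airflow 2 stable REST API; adjust the path for other versions.
    url = f"{base_url.rstrip('/')}/api/v1/dags/{dag_id}/dagRuns"
    body = json.dumps({"conf": conf}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Placeholder auth scheme; Composer may require different credentials.
            "Authorization": f"Bearer {token}",
        },
    )


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    event = {"bucket": "example-bucket", "name": "incoming/data.csv"}
    request = build_trigger_request(
        "https://airflow.example.com", "example_dag", conf_from_gcs_event(event), "TOKEN"
    )
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

Passing the returned object to `urllib.request.urlopen` would send the request. Keeping construction separate from sending makes the payload easy to inspect and test without a running Airflow instance.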
Run this script to automate the spin-up and tear-down of a lightweight Airflow environment for running your tests:
./run_tests.sh