Library to convert DBT manifest metadata to Airflow tasks
Read the full documentation at https://dbt-airflow-factory.readthedocs.io/
Use the package manager pip to install the library:
pip install dbt-airflow-factory
The library is expected to be used inside an Airflow environment with a Kubernetes image referencing dbt.
dbt-airflow-factory's main task is to parse manifest.json and create an Airflow DAG out of it. It also reads config files from the config directory and is therefore highly customizable (e.g., the user can set the path to manifest.json).
To start, create a directory with the following structure, where manifest.json is a file generated by dbt:
.
├── config
│   ├── base
│   │   ├── airflow.yml
│   │   ├── dbt.yml
│   │   └── k8s.yml
│   └── dev
│       └── dbt.yml
├── dag.py
└── manifest.json
Then, put the following code into dag.py:
from dbt_airflow_factory.airflow_dag_factory import AirflowDagFactory
from os import path
dag = AirflowDagFactory(path.dirname(path.abspath(__file__)), "dev").create()
When uploaded to the Airflow DAGs directory, it will get picked up by Airflow, which will parse manifest.json and prepare a DAG to run.
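If you prefer not to hard-code the "dev" environment name in dag.py, one option is to resolve it at parse time from an Airflow Variable. This is only a sketch: the Variable name dbt_airflow_factory_env is hypothetical, while AirflowDagFactory is used exactly as in the snippet above.

from os import path

from airflow.models import Variable

from dbt_airflow_factory.airflow_dag_factory import AirflowDagFactory

# "dbt_airflow_factory_env" is a hypothetical Variable name; falls back to "dev" if unset.
# Note: Variable.get at module level is evaluated every time Airflow parses the file.
env = Variable.get("dbt_airflow_factory_env", default_var="dev")
dag = AirflowDagFactory(path.dirname(path.abspath(__file__)), env).create()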
It is best to look at the example configuration files in the tests directory to get a glimpse of correct configs.
You can use Airflow template variables in your dbt.yml and k8s.yml files, as long as they are inside quotation marks:
target: "{{ var.value.env }}"
some_other_field: "{{ ds_nodash }}"
Analogously, you can use "{{ var.value.VARIABLE_NAME }}" in airflow.yml, but only the Airflow variable getter. Any other Airflow template variables will not work in airflow.yml.
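For illustration, the contrast in airflow.yml might look like the lines below. The key names here are purely hypothetical and not part of any required schema; check the example configs in the tests directory for the actual keys.

# illustrative only — key names are made up for this example
owner: "{{ var.value.dag_owner }}"      # works: Airflow variable getter
run_date: "{{ ds_nodash }}"             # would NOT be rendered in airflow.yml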
DBT Airflow Factory works best in tandem with the data-pipelines-cli tool. dp not only prepares the directory for the library to digest, but also automates Docker image building and pushes the generated directory to the cloud storage of your choice.
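As a rough sketch of that workflow (command names are from data-pipelines-cli; the exact invocation and flags may differ, so consult its documentation):

# compile the dbt project and render the config directory that dbt-airflow-factory reads
dp compile
# build and push the Docker image, then upload the generated directory to your cloud storage
dp deploy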