publish to v0.0.1-alpha #7
Changes from 10 commits
fc9038e
1781330
dedfa0a
ef00a81
94abf9b
d7ec794
15b1d30
11f9a3e
ad52ed9
22ec890
3c5c848
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,147 @@ | ||
# Getting started guide | ||
|
||
The Prefect-dbt-flow library allows you to seamlessly integrate dbt workflows into Prefect. This usage guide will walk you through the steps required to create and manage a Prefect flow for your dbt project. | ||
|
||
## Example guide | ||
This guide will walk you through setting up and running a sample Prefect-dbt-flow using Docker Compose. Follow these steps to get started: | ||
|
||
### 1. Clone this repository | ||
Clone the Prefect-dbt-flow repository and navigate to the example directory. | ||
```bash | ||
git clone git@github.com:datarootsio/prefect-dbt-flow.git | ||
cd prefect-dbt-flow/example/jaffle_shop | ||
``` | ||
|
||
### 2. Install Docker Compose | ||
Ensure that you have Docker Compose installed on your system. If you haven't already installed it, refer to the [Docker Compose Installation Guide](https://docs.docker.com/compose/install/) for instructions. | ||
|
||
### 3. Start the Docker containers | ||
Start the Docker containers by running the following command, which launches the three services defined in the docker-compose file: | ||
- A PostgreSQL database, | ||
- A Prefect server accessible at: `http://0.0.0.0:4200/`, | ||
- A CLI environment with all the required components installed. | ||
```bash | ||
docker compose up -d | ||
``` | ||
|
||
### 4. Access the CLI service | ||
To access the CLI service, use the following command: | ||
```bash | ||
docker compose run cli | ||
``` | ||
|
||
### 5. Run the Prefect flow | ||
Inside the CLI environment, | ||
|
||
run the following command to seed the CSV files: | ||
```bash | ||
dbt seed | ||
``` | ||
|
||
run the Prefect-dbt-flow using the following command: | ||
```bash | ||
python my_prefect_dbt_flow.py | ||
``` | ||
This command will execute the Prefect flow and print its status to the terminal. | ||
|
||
### 6. View the results | ||
To view the results and monitor the flow, follow these steps: | ||
|
||
- Open a web browser and go to `http://0.0.0.0:4200/`. | ||
- In the Prefect Server interface, click on the flow run. Its name should have the form `adjective-animal`. | ||
- From there, you can explore the dbt job DAG and its associated logs. | ||
|
||
With these steps, you can set up and run a Prefect-dbt-flow and monitor its progress through the Prefect Server interface. | ||
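Before opening the browser, it can help to confirm that the Prefect server from step 3 is actually reachable. The following is a minimal sketch, not part of the example project: the function name `prefect_server_is_up` is hypothetical, and it assumes the server exposes a `GET /api/health` endpoint at the address used in this guide.

```python
import urllib.error
import urllib.request


def prefect_server_is_up(base_url: str = "http://0.0.0.0:4200", timeout: float = 3.0) -> bool:
    """Return True if the server answers its health endpoint with HTTP 200."""
    # Assumption: the Prefect server exposes GET /api/health.
    health_url = f"{base_url.rstrip('/')}/api/health"
    try:
        with urllib.request.urlopen(health_url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False


if __name__ == "__main__":
    print("Prefect server reachable:", prefect_server_is_up())
```

If this prints `False`, check that `docker compose up -d` completed and that port 4200 is not blocked.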
|
||
# How does it work? | ||
|
||
## Installation | ||
Before using Prefect-dbt-flow, you need to install the library. You can do this using pip: | ||
```shell | ||
pip install prefect-dbt-flow | ||
``` | ||
You can install a specific version of **Prefect** if you need to: | ||
```shell | ||
pip install prefect==2.13.5 | ||
``` | ||
|
||
## Creating a Prefect Flow | ||
To get started, you'll need to create a Prefect flow that incorporates your dbt project. Here's a step-by-step guide: | ||
1. **Import the Required Modules:** | ||
Start by importing the necessary modules from prefect_dbt_flow: | ||
```python | ||
from prefect_dbt_flow import dbt_flow | ||
``` | ||
2. **Define the Prefect Flow:** | ||
Create a Prefect flow by initializing a `dbtFlow.dbt_flow` object. You can configure it with your dbt project, profile, and additional options: | ||
Review comment: just dbt_flow |
||
* **project**: A DbtProject object representing the dbt project configuration. | ||
* **profile**: A DbtProfile object representing the dbt profile configuration. | ||
* **dag_options**: A DbtDagOptions object to specify dbt DAG configurations. | ||
* **flow_kwargs**: A dictionary of Prefect flow arguments. | ||
Here's a basic example of how to use dbt_flow(): | ||
```python | ||
my_flow = dbtFlow.dbt_flow( | ||
Review comment: this code is not correct |
||
project=dbtFlow.DbtProject( | ||
name="my_flow", | ||
project_dir="path_to/dbt_project", | ||
profiles_dir="path_to/dbt_profiles", | ||
), | ||
profile=dbtFlow.DbtProfile( | ||
target="dev", | ||
), | ||
dag_options=dbtFlow.DbtDagOptions( | ||
run_test_after_model=True, | ||
), | ||
) | ||
``` | ||
With this basic setup, you have created a Prefect flow that manages your dbt project. When you run the script, Prefect will execute the dbt tasks defined in your project. | ||
3. **Run the Flow:** | ||
To execute the Prefect flow, add the following code block: | ||
```python | ||
if __name__ == "__main__": | ||
my_flow() | ||
``` | ||
4. **Start the Prefect server** | ||
You need to start the Prefect server before running the flow: | ||
```shell | ||
prefect server start | ||
``` | ||
You can check the dashboard at `http://0.0.0.0:4200`. | ||
5. **Running the Prefect Flow:** | ||
To run the Prefect flow, simply execute your Python script: | ||
```shell | ||
python my_prefect_dbt_flow.py | ||
``` | ||
Make sure you are in the correct directory or provide the full path to your script. Prefect will execute the dbt tasks defined in your flow, providing orchestration and monitoring capabilities. | ||
6. **See the run** | ||
You can see the results of the run on the Prefect dashboard at `http://0.0.0.0:4200`. | ||
|
||
## Advanced Configuration | ||
In the previous section, you configured your dbt project within the Prefect flow. Here's how you can customize the configuration further: | ||
|
||
### Dbt Project Configuration: | ||
You specified the name, project directory, and profiles directory when creating the DbtProject object. Adjust these values to match your dbt project's setup. | ||
- `DbtProject`: Represents your dbt project configuration. | ||
- `name`: Name of the dbt project. | ||
- `project_dir`: Path to the directory containing the dbt_project.yml configuration file. | ||
- `profiles_dir`: Path to the directory containing the profiles.yml file. | ||
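A wrong `project_dir` or `profiles_dir` is an easy mistake to make, so a quick check before building the flow may save a failed run. This is a minimal sketch under stated assumptions: `check_dbt_paths` is a hypothetical helper, not part of prefect-dbt-flow, and it only verifies that the two files dbt expects (`dbt_project.yml` and `profiles.yml`) exist.

```python
from pathlib import Path


def check_dbt_paths(project_dir: str, profiles_dir: str) -> list[str]:
    """Return a list of problems; an empty list means both files were found."""
    problems = []
    # dbt expects dbt_project.yml at the root of the project directory.
    if not (Path(project_dir) / "dbt_project.yml").is_file():
        problems.append(f"no dbt_project.yml in {project_dir}")
    # dbt expects profiles.yml inside the profiles directory.
    if not (Path(profiles_dir) / "profiles.yml").is_file():
        problems.append(f"no profiles.yml in {profiles_dir}")
    return problems


if __name__ == "__main__":
    # Placeholder paths taken from the example in this guide.
    for problem in check_dbt_paths("path_to/dbt_project", "path_to/dbt_profiles"):
        print("Configuration problem:", problem)
```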
|
||
### Dbt Profile Configuration: | ||
The DbtProfile object allows you to set the target profile for your dbt project. This profile should match the configuration in your dbt profiles.yml file. | ||
- `DbtProfile`: Represents the dbt profile configuration. | ||
- `target`: Specify the dbt target (e.g., "dev" or "prod"). | ||
|
||
### Dag Options: | ||
The DbtDagOptions object lets you define various options for your dbt workflow. In the provided example, we set run_test_after_model to True, indicating that dbt tests should run after each dbt model. | ||
- `DbtDagOptions`: Allows you to specify dbt DAG configurations. | ||
- `select`: Specify a dbt module to include in the run. | ||
- `exclude`: Specify a dbt module to exclude from the run. | ||
- `run_test_after_model`: Set this to True to run tests after running models. | ||
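To make the effect of `run_test_after_model=True` concrete, the toy sketch below lists steps so that each model is followed directly by its tests. It is only an illustration of the ordering described above, not the library's scheduling code: `steps_with_tests` and the model names are hypothetical, and the dependency graph is written by hand rather than read from dbt.

```python
from graphlib import TopologicalSorter


def steps_with_tests(model_deps: dict[str, set[str]]) -> list[str]:
    """List steps so that every model run is followed by that model's tests."""
    steps = []
    # static_order() yields a model only after all models it depends on.
    for model in TopologicalSorter(model_deps).static_order():
        steps.append(f"run {model}")
        steps.append(f"test {model}")
    return steps


if __name__ == "__main__":
    # Hypothetical graph: each model maps to the models it depends on.
    deps = {
        "stg_customers": set(),
        "stg_orders": set(),
        "customers": {"stg_customers", "stg_orders"},
    }
    for step in steps_with_tests(deps):
        print(step)
```

In this sketch `customers` always comes last, because it depends on both staging models.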
|
||
### Prefect flow configuration | ||
Prefect-dbt-flow integrates with Prefect's monitoring and error handling capabilities. You can use Prefect features like scheduling, notifications, and task retries to monitor and manage your dbt flows effectively. You can pass these additional Prefect flow configuration options as a dictionary via `flow_kwargs`. | ||
|
||
For more information on these features, consult the [Prefect documentation](https://docs.prefect.io/2.10.12/api-ref/prefect/flows/#prefect.flows.flow). | ||
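As an illustration, a `flow_kwargs` dictionary might look like the sketch below. The keys are keyword arguments of Prefect 2's `flow` decorator; the values and the flow name are made-up placeholders, and how prefect-dbt-flow forwards them is assumed from the parameter description above rather than verified.

```python
# Keyword arguments accepted by Prefect 2's `flow` decorator.
# All values here are illustrative placeholders.
flow_kwargs = {
    "name": "jaffle-shop-dbt",          # hypothetical flow name shown in the UI
    "description": "Runs the jaffle_shop dbt project",
    "retries": 2,                       # retry the flow run up to two times
    "retry_delay_seconds": 30,          # wait between retries
    "timeout_seconds": 3600,            # fail the run after one hour
    "log_prints": True,                 # send print() output to Prefect logs
}

# Assumed usage, following the parameter list in this guide:
# my_flow = dbt_flow(project=..., profile=..., flow_kwargs=flow_kwargs)

if __name__ == "__main__":
    for key, value in flow_kwargs.items():
        print(f"{key} = {value!r}")
```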
|
||
## Conclusion | ||
Prefect-dbt-flow simplifies the orchestration and management of dbt workflows within a Prefect flow. By following the steps in this guide, you can easily create and execute data pipelines that incorporate dbt projects. Be aware of breaking changes as this library is actively developed, and consult the changelog for updates. Happy data engineering! :rocket: |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,28 +1,81 @@ | ||
![dataroots.png](https://dataroots.io/assets/logo/logo-rainbow.png) | ||
[![maintained by dataroots](https://img.shields.io/badge/maintained%20by-dataroots-%2300b189)](https://dataroots.io) | ||
<p align="center"> | ||
<a href="https://datarootsio.github.io/prefect-dbt-flow"><img alt="logo" src="https://dataroots.io/assets/logo/logo-rainbow.png"></a> | ||
</p> | ||
<p align="center"> | ||
<a href="https://dataroots.io"><img alt="Maintained by dataroots" src="https://dataroots.io/maintained-rnd.svg" /></a> | ||
<a href="https://pypi.org/project/prefect-dbt-flow/"><img alt="Python versions" src="https://img.shields.io/pypi/pyversions/prefect-dbt-flow" /></a> | ||
<a href="https://pypi.org/project/prefect-dbt-flow/"><img alt="PyPI" src="https://img.shields.io/pypi/v/prefect-dbt-flow" /></a> | ||
<a href="https://pepy.tech/project/prefect-dbt-flow"><img alt="Downloads" src="https://pepy.tech/badge/prefect-dbt-flow" /></a> | ||
<a href="https://github.com/psf/black"><img alt="Code style: black" src="https://img.shields.io/badge/code%20style-black-000000.svg" /></a> | ||
<a href="http://mypy-lang.org/"><img alt="Mypy checked" src="https://img.shields.io/badge/mypy-checked-1f5082.svg" /></a> | ||
<!-- <a href="https://pepy.tech/project/prefect-dbt-flow"><img alt="Codecov" src="https://codecov.io/github/datarootsio/databooks/main/graph/badge.svg" /></a> | ||
Review comment: remove this |
||
<a href="https://github.com/datarootsio/databooks/actions"><img alt="test" src="https://github.com/datarootsio/databooks/actions/workflows/test.yml/badge.svg" /></a> --> | ||
</p> | ||
|
||
# prefect-dbt-flow | ||
Welcome to the prefect-dbt-flow integration repository! This project aims to provide a seamless integration for simplifying the execution of dbt workflows using Prefect. | ||
Prefect-dbt-flow is a Python library that enables Prefect to convert dbt workflows into independent tasks within a Prefect flow. This integration simplifies the orchestration and execution of dbt models and tests using Prefect, allowing you to build robust data pipelines and monitor your dbt projects efficiently. | ||
|
||
## Requirements | ||
Before you get started, make sure you have the following prerequisites installed on your system: | ||
**Active Development Notice:** Prefect-dbt-flow is actively under development and may not be ready for production use. We advise users to be aware of potential breaking changes as the library evolves. Please check the changelog for updates. | ||
|
||
- python | ||
- prefect | ||
- dbt | ||
## Table of Contents | ||
- [Introduction](#introduction) | ||
- [Why Use Prefect-dbt-flow?](#why-use-prefect-dbt-flow) | ||
- [How to Install](#how-to-install) | ||
- [Basic Usage](#basic-usage) | ||
- [Inspiration](#inspiration) | ||
- [License](#license) | ||
|
||
## Installation | ||
``` bash | ||
pip install prefect-dbt-flow | ||
``` | ||
## Introduction | ||
Prefect-dbt-flow is a tool designed to streamline the integration of dbt workflows into Prefect. dbt is an immensely popular tool for building and testing data transformation models, and Prefect is a versatile workflow management system. This integration brings together the best of both worlds, empowering data engineers and analysts to create robust data pipelines. | ||
|
||
## Why Use Prefect-dbt-flow? | ||
### Simplified Orchestration | ||
With Prefect-dbt-flow, you can orchestrate your dbt workflows with ease. Define and manage your dbt projects and models as Prefect tasks, creating a seamless pipeline for data transformation. | ||
|
||
[Simplified Orchestration]() | ||
|
||
### Monitoring and Error Handling | ||
Prefect provides extensive monitoring capabilities and error handling. Now, you can gain deep insights into the execution of your dbt workflows and take immediate action in case of issues. | ||
|
||
## Usage | ||
[Monitoring and Error Handling]() | ||
|
||
### Create a flow | ||
### Workflow Consistency | ||
Ensure your dbt workflows run consistently by managing them through Prefect. This consistency is crucial for maintaining data quality and reliability. | ||
|
||
``` python | ||
TODO flow description | ||
[Workflow Consistency]() | ||
|
||
## How to Install | ||
You can install Prefect-dbt-flow via pip: | ||
```shell | ||
pip install prefect-dbt-flow | ||
``` | ||
## Basic Usage | ||
Here's an example of how to use Prefect-dbt-flow to create a Prefect flow for your dbt project: | ||
```python | ||
import prefect_dbt_flow as dbtFlow | ||
my_flow = dbtFlow.dbt_flow( | ||
project=dbtFlow.DbtProject( | ||
name="my_flow", | ||
project_dir="path_to/dbt_project", | ||
profiles_dir="path_to/dbt_profiles", | ||
), | ||
profile=dbtFlow.DbtProfile( | ||
target="dev", | ||
), | ||
dag_options=dbtFlow.DbtDagOptions( | ||
run_test_after_model=True, | ||
), | ||
) | ||
if __name__ == "__main__": | ||
my_flow() | ||
``` | ||
For more information consult the [Getting started guide](GETTING_STARTED.md) | ||
|
||
## Inspiration | ||
Prefect-dbt-flow draws inspiration from various projects in the data engineering and workflow orchestration space, including: | ||
- cosmos by astronomer | ||
||
- anna-geller => prefect-dataplatform | ||
- dbt + Dagster | ||
|
||
## License | ||
This project is licensed under the MIT License. | ||
# License | ||
This project is licensed under the MIT License. You are free to use, modify, and distribute this software as per the terms of the license. If you find this project helpful, please consider giving it a star on GitHub. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
FROM prefecthq/prefect:2.10.17-python3.11 | ||
|
||
COPY ./requirements.txt ./requirements.txt | ||
RUN pip install -r requirements.txt |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,26 @@ | ||
name: 'example_jaffle_shop' | ||
|
||
config-version: 2 | ||
version: '0.1' | ||
|
||
profile: 'example_jaffle_shop' | ||
|
||
model-paths: ["models"] | ||
seed-paths: ["seeds"] | ||
test-paths: ["tests"] | ||
analysis-paths: ["analysis"] | ||
macro-paths: ["macros"] | ||
|
||
target-path: "target" | ||
clean-targets: | ||
- "target" | ||
- "dbt_modules" | ||
- "logs" | ||
|
||
require-dbt-version: [">=1.0.0", "<2.0.0"] | ||
|
||
models: | ||
example_jaffle_shop: | ||
materialized: table | ||
staging: | ||
materialized: view |
Review comment: after we implement the seed command we need to remember to change this.