publish to v0.0.1-alpha #7

Merged 11 commits on Oct 13, 2023
Changes from 10 commits
28 changes: 14 additions & 14 deletions .github/workflows/publish.yml
@@ -36,20 +36,20 @@ jobs:
run: |
pip install poetry==${{ env.POETRY_VERSION }}
poetry install
# - name: Bump package version
# run: |
# poetry version ${{ needs.get-new-tag.outputs.tag-id }}
# echo '__version__ = "${{ needs.get-new-tag.outputs.tag-id }}"' > databooks/version.py
# - name: Configure PiPy, version and build
# run: |
# poetry config pypi-token.pypi ${{ secrets.PIPY_TOKEN }}
# poetry config repositories.test-pypi https://test.pypi.org/legacy/
# poetry config pypi-token.test-pypi ${{ secrets.TEST_PIPY_TOKEN }}
# poetry build
# - name: Publish packages
# run: |
# poetry publish -r test-pypi
# poetry publish
- name: Bump package version
run: |
poetry version ${{ needs.get-new-tag.outputs.tag-id }}
echo '__version__ = "${{ needs.get-new-tag.outputs.tag-id }}"' > prefect_dbt_flow/version.py
- name: Configure PiPy, version and build
run: |
poetry config pypi-token.pypi ${{ secrets.PIPY_TOKEN }}
poetry config repositories.test-pypi https://test.pypi.org/legacy/
poetry config pypi-token.test-pypi ${{ secrets.TEST_PIPY_TOKEN }}
poetry build
- name: Publish packages
run: |
poetry publish -r test-pypi
poetry publish
- name: Tag
uses: actions/github-script@v5
with:
28 changes: 23 additions & 5 deletions .github/workflows/tests.yaml
@@ -2,25 +2,43 @@ name: 'tests'
on: [push, pull_request]

jobs:
  tests:
  linting:
    env:
      POETRY_VERSION: 1.5.0 # set your poetry version here
      POETRY_VERSION: 1.5.1
    runs-on: ubuntu-latest
    strategy: # drop this if you only want to test for a specific version
    strategy:
      matrix:
        python-version: ["3.10", "3.11"]
    steps:
      - uses: actions/checkout@v2
      - name: Set up Python ${{ matrix.python-version }} # or your python specific version
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v2
        with:
          python-version: ${{ matrix.python-version }} # or your python specific version
          python-version: ${{ matrix.python-version }}
      - name: setup poetry
        run: |
          pip install poetry==${{ env.POETRY_VERSION }}
          poetry config virtualenvs.create false
          poetry install --no-interaction --no-ansi
      - name: run precommit hooks
        run: pre-commit run --all-files
  tests:
    env:
      POETRY_VERSION: 1.5.1
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.10", "3.11"]
    steps:
      - uses: actions/checkout@v2
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v2
        with:
          python-version: ${{ matrix.python-version }}
      - name: setup poetry
        run: |
          pip install poetry==${{ env.POETRY_VERSION }}
          poetry config virtualenvs.create false
          poetry install --no-interaction --no-ansi
      - name: run pytest
        run: poetry run pytest
11 changes: 1 addition & 10 deletions .pre-commit-config.yaml
@@ -25,13 +25,4 @@ repos:
language: system
pass_filenames: false
always_run: true
fail_fast: true
# - id: pytest
# name: pytest
# stages: [commit]
# types: [python]
# entry: poetry run pytest
# language: system
# pass_filenames: false
# always_run: true
# fail_fast: true
fail_fast: true
147 changes: 147 additions & 0 deletions GETTING_STARTED.md
@@ -0,0 +1,147 @@
# Getting started guide

The Prefect-dbt-flow library allows you to seamlessly integrate dbt workflows into Prefect. This usage guide will walk you through the steps required to create and manage a Prefect flow for your dbt project.

## Example guide
This guide will walk you through setting up and running a sample Prefect-dbt-flow using Docker Compose. Follow these steps to get started:

### 1. Clone this repository
Clone the Prefect-dbt-flow repository and navigate to the example directory.
```bash
git clone git@github.com:datarootsio/prefect-dbt-flow.git
cd prefect-dbt-flow/examples/jaffle_shop
```

### 2. Install Docker Compose
Ensure that you have Docker Compose installed on your system. If you haven't already installed it, refer to the [Docker Compose Installation Guide](https://docs.docker.com/compose/install/) for instructions.

### 3. Start the Docker Container
Start the Docker container by running the following command. This command will launch three services defined in the docker-compose file:
- A PostgreSQL database,
- A Prefect server accessible at: `http://0.0.0.0:4200/`,
- A CLI environment with all the required components installed.
```bash
docker compose up -d
```

### 4. Access the CLI service
To access the CLI service, use the following command:
```bash
docker compose run cli
```

### 5. Run the Prefect flow
Inside the CLI environment, run the following command to seed the CSV files:
```bash
dbt seed
# Review comment (Contributor): after we implement the seed command we need to remember to change this.
```

Then run the Prefect-dbt-flow example using the following command:
```bash
python my_prefect_dbt_flow.py
```
This command will execute the Prefect flow and print its status to the terminal.

### 6. View the results
To view the results and monitor the flow, follow these steps:

- Open a web browser and go to `http://0.0.0.0:4200/`.
- In the Prefect Server interface, click on the flow run. It should have a similar name to `adjective-animal`.
- From there, you can explore the dbt job DAG and its associated logs.

With these steps, you can set up and run a Prefect-dbt-flow and monitor its progress through the Prefect Server interface.

# How does it work?

## Installation
Before using Prefect-dbt-flow, you need to install the library. You can do this using pip:
```shell
pip install prefect-dbt-flow
```
You can install a specific version of **Prefect** if you need to:
```shell
pip install prefect==2.13.5
```

## Creating a Prefect Flow
To get started, you'll need to create a Prefect flow that incorporates your dbt project. Here's a step-by-step guide:
1. **Import the Required Modules:**
Start by importing the necessary modules from prefect_dbt_flow:
```python
from prefect_dbt_flow import dbt_flow
```
2. **Define the Prefect Flow:**
Create a Prefect flow by initializing a `dbtFlow.dbt_flow` object. You can configure it with your dbt project, profile, and additional options:
Review comment (Contributor): just dbt_flow

* **project**: A DbtProject object representing the dbt project configuration.
* **profile**: A DbtProfile object representing the dbt profile configuration.
* **dag_options**: A DbtDagOptions object to specify dbt DAG configurations.
* **flow_kwargs**: A dictionary of Prefect flow arguments.
Here's a basic example of how to use dbt_flow():
```python
my_flow = dbtFlow.dbt_flow(
    # Review comment (Contributor): this code is not correct
    project=dbtFlow.DbtProject(
        name="my_flow",
        project_dir="path_to/dbt_project",
        profiles_dir="path_to/dbt_profiles",
    ),
    profile=dbtFlow.DbtProfile(
        target="dev",
    ),
    dag_options=dbtFlow.DbtDagOptions(
        run_test_after_model=True,
    ),
)
```
With this basic setup, you have created a Prefect flow that manages your dbt project. When you run the script, Prefect will execute the dbt tasks defined in your project.
3. **Run the Flow:**
To execute the Prefect flow, add the following code block:
```python
if __name__ == "__main__":
    my_flow()
```
4. **Start the Prefect server:**
Start the Prefect server before running the flow:
```shell
prefect server start
```
You can check the dashboard at `http://0.0.0.0:4200`.
5. **Running the Prefect Flow:**
To run the Prefect flow, simply execute your Python script:
```shell
python my_prefect_dbt_flow.py
```
Make sure you are in the correct directory or provide the full path to your script. Prefect will execute the dbt tasks defined in your flow, providing orchestration and monitoring capabilities.
6. **See the run:**
You can see the results of the run on the Prefect dashboard at `http://0.0.0.0:4200`.
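
Since the flow run depends on a reachable Prefect server, a quick check before running the script can save some confusion. The sketch below is not part of Prefect-dbt-flow; it assumes the server exposes a health endpoint at `/api/health` (the path commonly used for Prefect 2.x server health checks), and the helper names `health_url` and `server_is_up` are made up for illustration.

```python
from urllib.error import URLError
from urllib.request import urlopen


def health_url(base_url: str) -> str:
    """Build the health-check URL for a Prefect server base URL (assumed path)."""
    return base_url.rstrip("/") + "/api/health"


def server_is_up(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if the health endpoint answers with HTTP 200, False otherwise."""
    try:
        with urlopen(health_url(base_url), timeout=timeout) as response:
            return response.status == 200
    except (URLError, OSError):
        # Connection refused, timeout, DNS failure, or a non-2xx response.
        return False


if __name__ == "__main__":
    # 127.0.0.1 is used here because 0.0.0.0 is a bind address
    # and is not reachable as a destination on every platform.
    print(server_is_up("http://127.0.0.1:4200"))
```

If the check prints `False`, start the server with `prefect server start` and try again.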

## Advanced Configuration
In the previous section, you configured your dbt project within the Prefect flow. Here's how you can customize the configuration further:

### Dbt Project Configuration:
You specified the name, project directory, and profiles directory when creating the DbtProject object. Adjust these values to match your dbt project's setup.
- `DbtProject`: Represents your dbt project configuration.
- `name`: Name of the dbt project.
- `project_dir`: Path to the directory containing the dbt_project.yml configuration file.
- `profiles_dir`: Path to the directory containing the profiles.yml file.
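
A wrong `project_dir` or `profiles_dir` is an easy mistake to make, so it can be worth checking both paths before building the flow. The following is a minimal sketch using only the standard library; the function name `missing_dbt_files` is hypothetical and not part of Prefect-dbt-flow. It relies on dbt's standard file names, `dbt_project.yml` and `profiles.yml`.

```python
from pathlib import Path
from typing import List


def missing_dbt_files(project_dir: str, profiles_dir: str) -> List[str]:
    """Return the expected dbt configuration files that do not exist."""
    expected = [
        Path(project_dir) / "dbt_project.yml",  # dbt project configuration
        Path(profiles_dir) / "profiles.yml",    # dbt connection profiles
    ]
    return [str(path) for path in expected if not path.is_file()]


if __name__ == "__main__":
    # The paths are the placeholders used in the example above.
    for path in missing_dbt_files("path_to/dbt_project", "path_to/dbt_profiles"):
        print(f"missing: {path}")
```

An empty list means both files were found at the given locations.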

### Dbt Profile Configuration:
The DbtProfile object allows you to set the target profile for your dbt project. This profile should match the configuration in your dbt profiles.yml file.
- `DbtProfile`: Represents the dbt profile configuration.
- `target`: Specify the dbt target (e.g., "dev" or "prod").

### Dag Options:
The DbtDagOptions object lets you define various options for your dbt workflow. In the provided example, we set run_test_after_model to True, indicating that dbt tests should run after each dbt model.
- `DbtDagOptions`: Allows you to specify dbt DAG configurations.
- `select`: Specify a dbt module to include in the run.
- `exclude`: Specify a dbt module to exclude from the run.
- `run_test_after_model`: Set this to True to run tests after running models.
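
To make `select` and `exclude` more concrete, the sketch below shows how such options could map onto the dbt CLI's `--select` and `--exclude` flags. This is an illustration only, not the library's actual implementation; the helper `build_dbt_args` is hypothetical.

```python
from typing import List, Optional


def build_dbt_args(
    command: str,
    select: Optional[str] = None,
    exclude: Optional[str] = None,
) -> List[str]:
    """Build a dbt command line, adding selection flags only when they are set."""
    args = ["dbt", command]
    if select:
        args += ["--select", select]
    if exclude:
        args += ["--exclude", exclude]
    return args


if __name__ == "__main__":
    # "staging" and "orders" are placeholder selectors.
    print(" ".join(build_dbt_args("run", select="staging", exclude="orders")))
```

Leaving both options unset yields a plain `dbt run`, which covers the whole project.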

### Prefect flow configuration
Prefect-dbt-flow integrates with Prefect's monitoring and error handling capabilities. You can use Prefect features like scheduling, notifications, and task retries to monitor and manage your dbt flows effectively. You can pass these additional Prefect flow configuration options as a dictionary via `flow_kwargs`.

For more information on these features, consult the [Prefect documentation.](https://docs.prefect.io/2.10.12/api-ref/prefect/flows/#prefect.flows.flow)
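
As an illustration, a `flow_kwargs` dictionary might look like the sketch below. The keys shown are keyword arguments of Prefect's `flow` decorator; the values are placeholders, and how Prefect-dbt-flow forwards them is assumed from the description above rather than confirmed.

```python
# Placeholder values; each key is a keyword argument of Prefect's `flow` decorator.
flow_kwargs = {
    "name": "jaffle-shop-dbt",      # flow name shown in the Prefect UI
    "retries": 2,                   # retry the flow run up to two times on failure
    "retry_delay_seconds": 30,      # wait 30 seconds between retries
    "log_prints": True,             # send print() output to the Prefect logs
}

# Assumed usage, alongside the other arguments described in this guide:
# my_flow = dbt_flow(project=..., profile=..., dag_options=..., flow_kwargs=flow_kwargs)

if __name__ == "__main__":
    for key, value in flow_kwargs.items():
        print(f"{key} = {value!r}")
```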

## Conclusion
Prefect-dbt-flow simplifies the orchestration and management of dbt workflows within a Prefect flow. By following the steps in this guide, you can easily create and execute data pipelines that incorporate dbt projects. Be aware of breaking changes as this library is actively developed, and consult the changelog for updates. Happy data engineering! :rocket:
89 changes: 71 additions & 18 deletions README.md
@@ -1,28 +1,81 @@
![dataroots.png](https://dataroots.io/assets/logo/logo-rainbow.png)
[![maintained by dataroots](https://img.shields.io/badge/maintained%20by-dataroots-%2300b189)](https://dataroots.io)
<p align="center">
<a href="https://datarootsio.github.io/prefect-dbt-flow"><img alt="logo" src="https://dataroots.io/assets/logo/logo-rainbow.png"></a>
</p>
<p align="center">
<a href="https://dataroots.io"><img alt="Maintained by dataroots" src="https://dataroots.io/maintained-rnd.svg" /></a>
<a href="https://pypi.org/project/prefect-dbt-flow/"><img alt="Python versions" src="https://img.shields.io/pypi/pyversions/prefect-dbt-flow" /></a>
<a href="https://pypi.org/project/prefect-dbt-flow/"><img alt="PyPI" src="https://img.shields.io/pypi/v/prefect-dbt-flow" /></a>
<a href="https://pepy.tech/project/prefect-dbt-flow"><img alt="Downloads" src="https://pepy.tech/badge/prefect-dbt-flow" /></a>
<a href="https://github.com/psf/black"><img alt="Code style: black" src="https://img.shields.io/badge/code%20style-black-000000.svg" /></a>
<a href="http://mypy-lang.org/"><img alt="Mypy checked" src="https://img.shields.io/badge/mypy-checked-1f5082.svg" /></a>
<!-- <a href="https://pepy.tech/project/prefect-dbt-flow"><img alt="Codecov" src="https://codecov.io/github/datarootsio/databooks/main/graph/badge.svg" /></a>
Review comment (Contributor): remove this
<a href="https://github.com/datarootsio/databooks/actions"><img alt="test" src="https://github.com/datarootsio/databooks/actions/workflows/test.yml/badge.svg" /></a> -->
</p>

# prefect-dbt-flow
Welcome to the prefect-dbt-flow integration repository! This project aims to provide a seamless integration for simplifying the execution of dbt workflows using Prefect.
Prefect-dbt-flow is a Python library that enables Prefect to convert dbt workflows into independent tasks within a Prefect flow. This integration simplifies the orchestration and execution of dbt models and tests using Prefect, allowing you to build robust data pipelines and monitor your dbt projects efficiently.

## Requirements
Before you get started, make sure you have the following prerequisites installed on your system:
**Active Development Notice:** Prefect-dbt-flow is actively under development and may not be ready for production use. We advise users to be aware of potential breaking changes as the library evolves. Please check the changelog for updates.

- python
- prefect
- dbt
## Table of Contents
- [Introduction](#introduction)
- [Why Use Prefect-dbt-flow?](#why-use-prefect-dbt-flow)
- [How to Install](#how-to-install)
- [Basic Usage](#basic-usage)
- [Inspiration](#inspiration)
- [License](#license)

## Installation
``` bash
pip install prefect-dbt-flow
```
## Introduction
Prefect-dbt-flow is a tool designed to streamline the integration of dbt workflows into Prefect. dbt is an immensely popular tool for building and testing data transformation models, and Prefect is a versatile workflow management system. This integration brings together the best of both worlds, empowering data engineers and analysts to create robust data pipelines.

## Why Use Prefect-dbt-flow?
### Simplified Orchestration
With Prefect-dbt-flow, you can orchestrate your dbt workflows with ease. Define and manage your dbt projects and models as Prefect tasks, creating a seamless pipeline for data transformation.

[Simplified Orchestration]()

### Monitoring and Error Handling
Prefect provides extensive monitoring capabilities and error handling. Now, you can gain deep insights into the execution of your dbt workflows and take immediate action in case of issues.

## Usage
[Monitoring and Error Handling]()

### Create a flow
### Workflow Consistency
Ensure your dbt workflows run consistently by managing them through Prefect. This consistency is crucial for maintaining data quality and reliability.

``` python
TODO flow description
[Workflow Consistency]()

## How to Install
You can install Prefect-dbt-flow via pip:
```shell
pip install prefect-dbt-flow
```
## Basic Usage
Here's an example of how to use Prefect-dbt-flow to create a Prefect flow for your dbt project:
```python
import prefect_dbt_flow as dbtFlow

my_flow = dbtFlow.dbt_flow(
    project=dbtFlow.DbtProject(
        name="my_flow",
        project_dir="path_to/dbt_project",
        profiles_dir="path_to/dbt_profiles",
    ),
    profile=dbtFlow.DbtProfile(
        target="dev",
    ),
    dag_options=dbtFlow.DbtDagOptions(
        run_test_after_model=True,
    ),
)

if __name__ == "__main__":
    my_flow()
```
For more information consult the [Getting started guide](GETTING_STARTED.md)

## Inspiration
Prefect-dbt-flow draws inspiration from various projects in the data engineering and workflow orchestration space, including:
- cosmos by astronomer
Review comment (Contributor): anna-geller => prefect-dataplatform
- dbt + Dagster

## License
This project is licensed under the MIT License.
# License
This project is licensed under the MIT License. You are free to use, modify, and distribute this software as per the terms of the license. If you find this project helpful, please consider giving it a star on GitHub.
2 changes: 1 addition & 1 deletion TODO.md
@@ -2,7 +2,7 @@

- [x] Feature: Basic DAG generation
- [x] Make sure everything is typed (and checked with mypy)
- [ ] Add cicd to push package to pypi
- [x] Add cicd to push package to pypi
- [x] Documentation
- [ ] Add basic example
- [ ] Add some simple tests
4 changes: 4 additions & 0 deletions examples/jaffle_shop/Dockerfile
@@ -0,0 +1,4 @@
FROM prefecthq/prefect:2.10.17-python3.11

COPY ./requirements.txt ./requirements.txt
RUN pip install -r requirements.txt
26 changes: 26 additions & 0 deletions examples/jaffle_shop/dbt_project.yml
@@ -0,0 +1,26 @@
name: 'example_jaffle_shop'

config-version: 2
version: '0.1'

profile: 'example_jaffle_shop'

model-paths: ["models"]
seed-paths: ["seeds"]
test-paths: ["tests"]
analysis-paths: ["analysis"]
macro-paths: ["macros"]

target-path: "target"
clean-targets:
  - "target"
  - "dbt_modules"
  - "logs"

require-dbt-version: [">=1.0.0", "<2.0.0"]

models:
  example_jaffle_shop:
    materialized: table
    staging:
      materialized: view