[daggy-u] [dbt] - Add dbt Lesson 2 (DEV-60) (#19868)
## Summary & Motivation

This PR adds the content for Lesson 2 of the Dagster + dbt module to Dagster University.

TODOs:

- [x] Part 1 - Confirm project clone command is correct / do we need to add a destination location to prevent naming collisions?
- [x] Part 2 - Import snippet for `setup.py` once available
- [x] Part 2 - Add screenshot of asset graph in UI
- [x] Part 3 - Confirm `clone` command for dbt project to `analytics`
- [x] Part 5 - Should we add logs or errors for `dbt build`?

## How I Tested These Changes

---------

Co-authored-by: Tim Castillo <[email protected]>
Co-authored-by: Tim Castillo <[email protected]>
1 parent 2c97cce, commit e6047ae

Showing 7 changed files with 245 additions and 0 deletions.
@@ -0,0 +1,17 @@

---
title: Dagster + dbt
---

# Dagster + dbt

- Lesson 1: Introduction
  - [What's dbt?](/dagster-dbt/lesson-1/1-whats-dbt)
  - [Why use dbt and Dagster together?](/dagster-dbt/lesson-1/2-why-use-dbt-and-dagster-together)
  - [How do dbt models relate to Dagster assets?](/dagster-dbt/lesson-1/3-how-do-dbt-models-relate-to-dagster-assets)
  - [Project preview](/dagster-dbt/lesson-1/4-project-preview)
- Lesson 2: Installation & Setup
  - [Requirements](/dagster-dbt/lesson-2/1-requirements)
  - [Set up the Dagster project](/dagster-dbt/lesson-2/2-set-up-the-dagster-project)
  - [Set up the dbt project](/dagster-dbt/lesson-2/3-set-up-the-dbt-project)
  - [dbt project files](/dagster-dbt/lesson-2/4-dbt-project-files)
  - [Verify dbt installation](/dagster-dbt/lesson-2/5-verify-dbt-installation)
34 changes: 34 additions & 0 deletions
docs/dagster-university/pages/dagster-dbt/lesson-2/1-requirements.md
@@ -0,0 +1,34 @@

---
title: "Lesson 2: Requirements"
module: 'dagster_dbt'
lesson: '2'
---

## Requirements

To complete this course, you’ll need:

- **To install Git.** Refer to the [Git documentation](https://github.com/git-guides/install-git) if you don’t have this installed.
- **To have Python installed.** Dagster supports Python 3.9 - 3.12.
- **To install a package manager like pip or Poetry.** If you need to install a package manager, refer to the following installation guides:
  - [pip](https://pip.pypa.io/en/stable/installation/)
  - [Poetry](https://python-poetry.org/docs/)

To check that Python and the pip or Poetry package manager are already installed in your environment, run:

```shell
python --version
pip --version     # if you use pip
poetry --version  # if you use Poetry
```
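If you'd rather run these checks from Python, here is a minimal standard-library sketch. The 3.9 to 3.12 bounds are copied from the requirement above and will need updating as Dagster's supported versions change. `python_supported` and `find_tools` are hypothetical helper names, and `shutil.which` only reports whether a command is on your `PATH`, not whether it works.

```python
import shutil
import sys

# Bounds copied from the requirement above (Python 3.9 - 3.12); update as needed.
MIN_PYTHON = (3, 9)
MAX_PYTHON = (3, 12)


def python_supported(version_info=sys.version_info):
    """Return True if the (major, minor) version falls inside the supported range."""
    major_minor = tuple(version_info[:2])
    return MIN_PYTHON <= major_minor <= MAX_PYTHON


def find_tools(names=("git", "pip", "poetry")):
    """Map each command name to its path on PATH, or None if it isn't found."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    print(f"Python {sys.version.split()[0]} supported: {python_supported()}")
    for name, path in find_tools().items():
        print(f"{name}: {path or 'not found'}")
```

Run it with the same interpreter you plan to use for the course, such as the one in your virtual environment.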

---

## Clone the Dagster University project

Even if you’ve already completed the Dagster Essentials course, you should still clone the project, as some things may have changed.

Run the following to clone the project:

```bash
git clone https://github.com/dagster-io/project-dagster-university -b module/dagster-and-dbt/starter
```
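Because the command gives no destination, `git clone` names the new directory after the last component of the repository URL (dropping a trailing `.git`), so the project lands in `project-dagster-university`. As a rough illustration, the sketch below mimics that naming rule; `default_clone_dir` is a hypothetical helper, and git's real logic handles more cases (for example, SSH-style `host:path` URLs).

```python
from urllib.parse import urlparse


def default_clone_dir(repo_url):
    """Approximate the directory name `git clone <url>` picks when none is given."""
    path = urlparse(repo_url).path.rstrip("/")
    name = path.rsplit("/", 1)[-1]
    # git drops a trailing ".git" suffix from the directory name
    if name.endswith(".git"):
        name = name[: -len(".git")]
    return name


if __name__ == "__main__":
    print(default_clone_dir("https://github.com/dagster-io/project-dagster-university"))
```

To use a different name (for example, to avoid clashing with an existing clone), pass the directory as the last argument to `git clone`.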
65 changes: 65 additions & 0 deletions
docs/dagster-university/pages/dagster-dbt/lesson-2/2-set-up-the-dagster-project.md
@@ -0,0 +1,65 @@

---
title: "Lesson 2: Set up the Dagster project"
module: 'dagster_dbt'
lesson: '2'
---

# Set up the Dagster project

After cloning the Dagster University project, you’ll need to make a few changes to finish setting things up.

First, you’ll add a few additional dependencies to the project:

- `dagster-dbt` - Dagster’s integration library for dbt. This will also install `dbt-core` and `dagster` as dependencies.
- `dbt-duckdb` - A library for using dbt with DuckDB, which we’ll use to store the dbt models we create.

Locate the `setup.py` file in the root of the Dagster University project. Open the file and replace its contents with the following:

```python
from setuptools import find_packages, setup

setup(
    name="dagster_university",
    packages=find_packages(exclude=["dagster_university_tests"]),
    install_requires=[
        "dagster==1.6.*",
        "dagster-cloud",
        "dagster-duckdb",
        "dagster-dbt",  # added in this lesson: Dagster's dbt integration
        "dbt-duckdb",  # added in this lesson: dbt adapter for DuckDB
        "geopandas",
        "kaleido",
        "pandas",
        "plotly",
        "shapely",
        "smart_open[s3]",  # the [s3] extra also installs smart_open itself
        "s3fs",
        "boto3",
    ],
    extras_require={"dev": ["dagster-webserver", "pytest"]},
)
```

{% callout %}
💡 **Heads up!** We strongly recommend installing the project dependencies inside a Python virtual environment. If you need a primer on virtual environments, including creating and activating one, check out this [blog post](https://dagster.io/blog/python-packages-primer-2).
{% /callout %}

Then, run the following in the command line to create a `.env` file from `.env.example` and install the dependencies:

```bash
cd project-dagster-university
cp .env.example .env
pip install -e ".[dev]"
```
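After `pip install` finishes, you can confirm that the two new packages, plus the `dbt-core` that `dagster-dbt` pulls in, are present by asking `importlib.metadata` for their installed versions. This is a minimal sketch; `installed_versions` is a hypothetical helper, not part of Dagster or dbt.

```python
from importlib.metadata import PackageNotFoundError, version

# Packages added to setup.py in this lesson, plus dbt-core, which dagster-dbt installs
PACKAGES_TO_CHECK = ["dagster-dbt", "dbt-duckdb", "dbt-core"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it's missing."""
    found = {}
    for package in packages:
        try:
            found[package] = version(package)
        except PackageNotFoundError:
            found[package] = None
    return found


if __name__ == "__main__":
    for package, pkg_version in installed_versions(PACKAGES_TO_CHECK).items():
        print(f"{package}: {pkg_version or 'NOT INSTALLED'}")
```

Run it inside the same virtual environment you installed into; a `NOT INSTALLED` line usually means a different interpreter is active.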

The `-e` flag installs the project in editable mode, so you can modify existing Dagster assets without having to reload the code location. This shortens the time it takes to test a change. However, you’ll need to reload the code location in the Dagster UI when adding new assets or installing additional dependencies.
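If you're not sure whether an earlier install used `-e`, one way to check is the `direct_url.json` metadata file that pip records for installs from a local directory or URL ([PEP 610](https://peps.python.org/pep-0610/)); editable installs set `dir_info.editable` to `true`. This sketch assumes a reasonably recent pip. Older editable installs may not write this file, so a `False` result isn't conclusive. The helper names are hypothetical, and `dagster_university` is the `name` declared in `setup.py`.

```python
import json
from importlib.metadata import PackageNotFoundError, distribution


def is_editable_record(direct_url_json):
    """Interpret a PEP 610 direct_url.json payload; editable installs set dir_info.editable."""
    data = json.loads(direct_url_json)
    return bool(data.get("dir_info", {}).get("editable", False))


def is_editable_install(dist_name):
    """Return True if `dist_name` is installed and its metadata marks it as editable."""
    try:
        raw = distribution(dist_name).read_text("direct_url.json")
    except PackageNotFoundError:
        return False
    # read_text returns None when the file is absent (e.g. a normal install from PyPI)
    return raw is not None and is_editable_record(raw)


if __name__ == "__main__":
    # "dagster_university" is the `name` declared in setup.py
    print("editable:", is_editable_install("dagster_university"))
```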

To confirm everything works:

1. Run `dagster dev` from the project’s root directory.
2. Navigate to the Dagster UI ([`http://localhost:3000`](http://localhost:3000/)) in your browser.
3. Open the asset graph by clicking **Assets > View global asset lineage**.
4. Click **Materialize all** to materialize all the assets in the project. **For partitioned assets**, you can materialize just the most recent partition.

![The Asset Graph in the Dagster UI](/images/dagster-dbt/lesson-2/asset-graph.png)
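To script the first two checks (for example, in a smoke test), a minimal standard-library sketch can poll the UI address. It only confirms that something answers on port 3000; it doesn't inspect or materialize any assets. `ui_is_up` is a hypothetical helper name.

```python
import urllib.error
import urllib.request

# Default address used by `dagster dev`; change it if you run the UI on another port.
DAGSTER_UI_URL = "http://localhost:3000"


def ui_is_up(url=DAGSTER_UI_URL, timeout=2.0):
    """Return True if an HTTP server answers at `url` with a non-error status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, timeouts, and HTTP error statuses all count as "not up"
        return False


if __name__ == "__main__":
    print("Dagster UI reachable:", ui_is_up())
```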
28 changes: 28 additions & 0 deletions
docs/dagster-university/pages/dagster-dbt/lesson-2/3-set-up-the-dbt-project.md
@@ -0,0 +1,28 @@

---
title: "Lesson 2: Set up the dbt project"
module: 'dagster_dbt'
lesson: '2'
---

# Set up the dbt project

The repository you cloned also contains a dbt project called `analytics`. Throughout this module, you’ll add new dbt models to it and see them reflected in Dagster.

1. Navigate into the directory by running:

   ```bash
   cd analytics
   ```

2. Next, install dbt package dependencies by running:

   ```bash
   dbt deps
   ```

3. In a file explorer or IDE, open the `analytics` directory. You should see the following files, which define the sources and models we’ll use to get started:

   - `models/sources/raw_taxis.yml`
   - `models/staging/staging.yml`
   - `models/staging/stg_trips.sql`
   - `models/staging/stg_zones.sql`
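To double-check the layout from a script, this minimal sketch lists any expected files that are missing under `analytics`. It assumes you run it from the repository root. `missing_files` is a hypothetical helper, and the list is only a subset of the files above plus `dbt_project.yml`, which every dbt project has.

```python
from pathlib import Path


def missing_files(project_dir, relative_paths):
    """Return the entries in `relative_paths` that don't exist as files under `project_dir`."""
    root = Path(project_dir)
    return [p for p in relative_paths if not (root / p).is_file()]


if __name__ == "__main__":
    # Hypothetical subset of files to check; extend with any others you expect.
    expected = [
        "dbt_project.yml",
        "models/sources/raw_taxis.yml",
        "models/staging/staging.yml",
    ]
    print(missing_files("analytics", expected) or "All expected files found")
```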
59 changes: 59 additions & 0 deletions
docs/dagster-university/pages/dagster-dbt/lesson-2/4-dbt-project-files.md
@@ -0,0 +1,59 @@

---
title: "Lesson 2: dbt project files"
module: 'dagster_dbt'
lesson: '2'
---

# dbt project files

Before we start building out the dbt project, let’s go over some of the files in it.

---

## dbt_project.yml

From the dbt docs:

> Every [dbt project](https://docs.getdbt.com/docs/build/projects) needs a `dbt_project.yml` file — this is how dbt knows a directory is a dbt project. It also contains important information that tells dbt how to operate your project.

Refer to [the dbt documentation](https://docs.getdbt.com/reference/dbt_project.yml) for more information about `dbt_project.yml`.
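The rule quoted above (a directory is a dbt project if it contains `dbt_project.yml`) is easy to apply in your own scripts. The sketch below walks up from a starting directory until it finds that file. `find_dbt_project_dir` is a hypothetical helper, and this is not how dbt itself locates a project.

```python
from pathlib import Path


def find_dbt_project_dir(start="."):
    """Walk up from `start` and return the first directory containing dbt_project.yml.

    Returns None if no such directory exists on the path to the filesystem root.
    """
    current = Path(start).resolve()
    for directory in (current, *current.parents):
        if (directory / "dbt_project.yml").is_file():
            return directory
    return None


if __name__ == "__main__":
    # From anywhere inside analytics/, this should print the analytics directory
    print(find_dbt_project_dir("analytics"))
```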

---

## profiles.yml

The next file we’ll cover is `profiles.yml`. This file contains connection details for your data platform, such as those for the DuckDB database we’ll use in this course. In this step, we’ll set up a `dev` target for the project that points to the DuckDB database file.

Before we start working, you should know:

- **Don’t put credentials in this file!** We’ll be pushing `profiles.yml` to git, which would expose any credentials stored in it. When we set up the file, we’ll show you how to use environment variables to store connection details securely.
- **We’ll create the file in the `analytics` directory instead of in dbt’s default `~/.dbt` directory.** We’re doing this for a few reasons:
  - It allows dbt to use the same environment variables as Dagster
  - It standardizes the way connections are created as more people contribute to the project

### Set up profiles.yml

Now you’re ready - let’s go!

1. Navigate to the `analytics` directory.
2. In this folder, create a `profiles.yml` file.
3. Copy the following code into the file:

   ```yaml
   dagster_dbt_university:
     target: dev
     outputs:
       dev:
         type: duckdb
         path: '../{{ env_var("DUCKDB_DATABASE", "data/staging/data.duckdb") }}'
   ```

Let’s review what this does:

- Creates a profile named `dagster_dbt_university`
- Sets the default target (data warehouse) for the `dagster_dbt_university` profile to `dev`
- Defines one target: `dev`
- Sets the `type` to `duckdb`
- Sets the `path` using the [`env_var` Jinja function](https://docs.getdbt.com/reference/dbt-jinja-functions/env_var) to reference the `DUCKDB_DATABASE` environment variable defined in the project’s `.env` file, falling back to `data/staging/data.duckdb` if the variable isn’t set. With this, your dbt models will be built in the same DuckDB database where your Dagster assets are materialized.

The `DUCKDB_DATABASE` environment variable is a relative path from the project’s root directory. Because dbt runs from the `analytics` directory, the path is prefixed with `../` so that it resolves relative to the project root.
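To see how the templated `path` turns into a file location, here is a plain-Python sketch that mimics it: read `DUCKDB_DATABASE` (falling back to the same default as the template), prefix `../`, and normalize against the directory dbt is run from. It assumes the relative path is interpreted against that working directory; `resolve_duckdb_path` is a hypothetical helper, not dbt's implementation.

```python
import os
from pathlib import Path

# Same fallback as the template: env_var("DUCKDB_DATABASE", "data/staging/data.duckdb")
DEFAULT_DUCKDB_DATABASE = "data/staging/data.duckdb"


def resolve_duckdb_path(dbt_cwd, environ=os.environ):
    """Mimic how the templated '../' + DUCKDB_DATABASE path resolves from `dbt_cwd`.

    Plain-Python illustration only; dbt renders the string, and the relative
    path is assumed to be interpreted against the directory dbt runs in.
    """
    relative = environ.get("DUCKDB_DATABASE", DEFAULT_DUCKDB_DATABASE)
    rendered = "../" + relative  # what the Jinja template renders to
    return os.path.normpath(Path(dbt_cwd) / rendered)


if __name__ == "__main__":
    # Run from the repository root; dbt itself is invoked from analytics/
    print(resolve_duckdb_path(Path("analytics").resolve()))
```

Running `dbt` directly from a terminal doesn't read the project's `.env` file on its own, so unless `DUCKDB_DATABASE` is exported in your shell, the default path is the one used.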
42 changes: 42 additions & 0 deletions
docs/dagster-university/pages/dagster-dbt/lesson-2/5-verify-dbt-installation.md
@@ -0,0 +1,42 @@

---
title: "Lesson 2: Verify dbt installation"
module: 'dagster_dbt'
lesson: '2'
---

# Verify dbt installation

Before continuing, let’s run the dbt project from the command line to confirm that everything is configured correctly.

From the `analytics` directory, run the following command:

```bash
dbt build
```

The two staging models should materialize successfully and pass their tests:

```bash
19:56:02 dbt build
19:56:03 Running with dbt=1.7.8
19:56:05 Registered adapter: duckdb=1.7.1
19:56:05 Unable to do partial parsing because saved manifest not found. Starting full parse.
19:56:07 Found 2 models, 2 tests, 2 sources, 0 exposures, 0 metrics, 505 macros, 0 groups, 0 semantic models
19:56:07
19:56:07 Concurrency: 1 threads (target='dev')
19:56:07
19:56:07 1 of 4 START sql table model main.stg_trips .................................... [RUN]
19:56:09 1 of 4 OK created sql table model main.stg_trips ............................... [OK in 1.53s]
19:56:09 2 of 4 START sql table model main.stg_zones .................................... [RUN]
19:56:09 2 of 4 OK created sql table model main.stg_zones ............................... [OK in 0.07s]
19:56:09 3 of 4 START test accepted_values_stg_zones_borough__Manhattan__Bronx__Brooklyn__Queens__Staten_Island__EWR [RUN]
19:56:09 3 of 4 PASS accepted_values_stg_zones_borough__Manhattan__Bronx__Brooklyn__Queens__Staten_Island__EWR [PASS in 0.06s]
19:56:09 4 of 4 START test not_null_stg_zones_zone_id ................................... [RUN]
19:56:09 4 of 4 PASS not_null_stg_zones_zone_id ......................................... [PASS in 0.04s]
19:56:09
19:56:09 Finished running 2 table models, 2 tests in 0 hours 0 minutes and 1.95 seconds (1.95s).
19:56:09
19:56:09 Completed successfully
19:56:09
19:56:09 Done. PASS=4 WARN=0 ERROR=0 SKIP=0 TOTAL=4
```
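If you want a script or CI job to check this output, a minimal sketch is to parse the final `Done.` summary line shown above. It's based only on the log format in this example and may need adjusting for other dbt versions; `parse_build_summary` and `build_looks_clean` are hypothetical helper names.

```python
import re

# dbt ends a run with a line like "Done. PASS=4 WARN=0 ERROR=0 SKIP=0 TOTAL=4"
DONE_LINE_RE = re.compile(r"Done\.\s+((?:[A-Z-]+=\d+[ \t]*)+)")
COUNT_RE = re.compile(r"([A-Z-]+)=(\d+)")


def parse_build_summary(log_text):
    """Return the counts from the last 'Done.' line as a dict, or None if absent."""
    done_lines = DONE_LINE_RE.findall(log_text)
    if not done_lines:
        return None
    return {key: int(value) for key, value in COUNT_RE.findall(done_lines[-1])}


def build_looks_clean(log_text):
    """True if a summary was found and it reports no errors or skipped nodes."""
    summary = parse_build_summary(log_text)
    return bool(summary) and summary.get("ERROR", 0) == 0 and summary.get("SKIP", 0) == 0
```

For anything beyond a quick check, the `target/run_results.json` artifact that dbt writes after each run is a sturdier source than log text.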
Binary file added (+394 KB): docs/dagster-university/public/images/dagster-dbt/lesson-2/asset-graph.png
Deploy preview for dagster-university ready!
✅ Preview
https://dagster-university-2ckfjr396-elementl.vercel.app
Built with commit e6047ae.
This pull request is being automatically deployed with vercel-action