[daggy-u] [dbt] - Add Lesson 3 (DEV-59) (#19927)
## Summary & Motivation

This PR adds Lesson 3 of the new dbt module to Dagster University.

TODOs:

- [x] Move code examples to correct folder
- [x] Add screenshots

## How I Tested These Changes
1 parent fca4b72, commit 6ec952b
Showing 16 changed files with 347 additions and 1 deletion.
11 changes: 11 additions & 0 deletions
docs/dagster-university/pages/dagster-dbt/lesson-3/1-overview.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
--- | ||
title: 'Lesson 3: Overview' | ||
module: 'dagster_dbt' | ||
lesson: '3' | ||
--- | ||
|
||
# Overview | ||
|
||
As you learned in Lesson 1, Dagster and dbt have similar mental models because both frameworks excel at building data models, defining relationships between them, and materializing them. You should have also grasped that dbt models are conceptually data assets and that these data assets can be represented in Dagster. | ||
|
||
In this lesson, you’ll put that conceptual understanding into practice by connecting a dbt project to Dagster, manually running your dbt models, and examining what happens when Dagster runs dbt. |
31 changes: 31 additions & 0 deletions
...dagster-university/pages/dagster-dbt/lesson-3/2-constructing-the-dbt-project.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,31 @@ | ||
--- | ||
title: 'Lesson 3: Constructing the dbt project' | ||
module: 'dagster_dbt' | ||
lesson: '3' | ||
--- | ||
|
||
# Constructing the dbt project | ||
|
||
Independent of Dagster, running most dbt commands creates a set of files in a new directory called `target`. The most important file is the `manifest.json`. More commonly referred to as “the manifest file,” [this file](https://docs.getdbt.com/reference/artifacts/manifest-json) is a complete representation of your dbt project in a predictable format. | ||
|
||
When Dagster builds your code location, it reads the manifest file to discover the dbt models and turn them into Dagster assets. There are a variety of ways to build the `manifest.json` file. However, we recommend using the `dbt parse` CLI command. | ||
|
||
Change your current working directory to the `analytics` folder and run the following command: | ||
|
||
```bash | ||
cd analytics # if you haven't set the directory yet | ||
dbt parse | ||
``` | ||
|
||
To confirm that a manifest file was generated, you should see two changes in your project: | ||
|
||
1. A new directory at `analytics/target`, and | ||
2. In the `target` directory, the `manifest.json` file | ||
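The same check can also be done from Python. The following is a minimal sketch and not part of the course project: `list_dbt_models` is a hypothetical helper, and it assumes the manifest keeps its resources under a top-level `nodes` key, with a `resource_type` and `name` field on each entry.

```python
import json
from pathlib import Path


def list_dbt_models(manifest_path: Path) -> list[str]:
    """Return the names of all dbt models recorded in a manifest file."""
    if not manifest_path.exists():
        raise FileNotFoundError(
            f"No manifest found at {manifest_path}; run `dbt parse` first."
        )

    manifest = json.loads(manifest_path.read_text())

    # Each entry in "nodes" is keyed by a unique ID, such as
    # "model.analytics.stg_trips", and describes one dbt resource.
    return sorted(
        node["name"]
        for node in manifest.get("nodes", {}).values()
        if node.get("resource_type") == "model"
    )
```

Calling `list_dbt_models(Path("analytics/target/manifest.json"))` from the project root should then list the models dbt found, provided the manifest was generated successfully.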
|
||
{% callout %} | ||
> 💡 We recommend `dbt parse` since it doesn’t require a connection to your data warehouse to generate a manifest file, as opposed to commands like `dbt compile`. This means that `dbt parse` is fast and consistent across any environments you run it in, such as locally or during deployment. | ||
> | ||
> If your dbt models use any [introspective queries](https://docs.getdbt.com/reference/commands/compile#interactive-compile), you may need to run `dbt compile` instead. | ||
{% /callout %} | ||
|
||
In Lesson 4, we’ll explore some options for building and deploying the manifest file more programmatically, along with some tips and tricks for regenerating it regularly during development. |
30 changes: 30 additions & 0 deletions
...ty/pages/dagster-dbt/lesson-3/3-defining-the-dbt-project-location-in-dagster.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,30 @@ | ||
--- | ||
title: 'Lesson 3: Defining the dbt project location in Dagster' | ||
module: 'dagster_dbt' | ||
lesson: '3' | ||
--- | ||
|
||
# Defining the dbt project location in Dagster | ||
|
||
As you’ll frequently point your Dagster code to the `target/manifest.json` file and your dbt project in this course, it’ll be helpful to keep a reusable constant to reference where the dbt project is. | ||
|
||
In the finished Dagster Essentials project, there should be a file called `assets/constants.py`. Open that file and add the following import at the top: | ||
|
||
```python | ||
from pathlib import Path | ||
# import os | ||
``` | ||
|
||
The `Path` class from the `pathlib` standard library will help us create an accurate pointer to where our dbt project is. At the bottom of this same file, add the following line: | ||
|
||
```python | ||
DBT_DIRECTORY = Path(__file__).joinpath("..", "..", "..", "analytics").resolve() | ||
``` | ||
|
||
This line creates a new constant called `DBT_DIRECTORY`. This line might look a little complicated, so let’s break it down: | ||
|
||
- It uses `constants.py`'s file location (via `__file__`) as a point of reference for finding the dbt project | ||
- The arguments in `joinpath` point us towards our dbt project in `analytics` | ||
- The `resolve` method turns that path into an absolute file path that points to the dbt project correctly from any file we’re working in | ||
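To see how these three steps combine, here is a small, self-contained sketch. It assumes a hypothetical project layout (`dagster_university/assets/constants.py` sitting next to an `analytics` folder) that is recreated in a temporary directory, and the `fake_file` variable stands in for `__file__`.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()

    # Recreate the assumed layout: the constants file and the dbt project.
    assets_dir = root / "dagster_university" / "assets"
    assets_dir.mkdir(parents=True)
    (root / "analytics").mkdir()
    fake_file = assets_dir / "constants.py"  # stands in for __file__
    fake_file.touch()

    # Before resolving, the path still contains the ".." segments.
    unresolved = fake_file.joinpath("..", "..", "..", "analytics")

    # Each ".." steps up one level: constants.py -> assets ->
    # dagster_university -> project root. resolve() collapses them
    # into an absolute path.
    dbt_directory = unresolved.resolve()

    print(unresolved)
    print(dbt_directory)
    dbt_directory_is_correct = dbt_directory == root / "analytics"
```

Printing both values shows why `resolve` matters: the unresolved path is still valid but hard to read, while the resolved one points directly at the `analytics` folder.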
|
||
Now that you can access your dbt project from any other file with the `DBT_DIRECTORY` constant, let’s move on to the first place where you’ll use it: creating the Dagster resource that will run dbt. |
37 changes: 37 additions & 0 deletions
...r-university/pages/dagster-dbt/lesson-3/4-creating-a-dbt-resource-in-dagster.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,37 @@ | ||
--- | ||
title: 'Lesson 3: Creating a Dagster resource to run dbt' | ||
module: 'dagster_dbt' | ||
lesson: '3' | ||
--- | ||
|
||
# Creating a Dagster resource to run dbt | ||
|
||
Our next step is to define a Dagster resource as the entry point used to run dbt commands and configure its execution. | ||
|
||
The `DbtCliResource` is the main resource that you’ll be working with. In later sections, we’ll walk through some of the resource’s methods and how to customize what Dagster does when dbt runs. | ||
|
||
{% callout %} | ||
|
||
> 💡 **Resource refresher:** Resources are Dagster’s recommended way of connecting to other services and tools, such as dbt, your data warehouse, or a BI tool. | ||
{% /callout %} | ||
|
||
Navigate to `dagster_university/resources/__init__.py`, which is where other resources are defined. Copy and paste the following lines into their respective locations: | ||
|
||
```python | ||
from dagster_dbt import DbtCliResource | ||
|
||
from ..assets.constants import DBT_DIRECTORY | ||
# the import lines go at the top of the file | ||
|
||
# this can be defined anywhere below the imports | ||
dbt_resource = DbtCliResource( | ||
    project_dir=DBT_DIRECTORY, | ||
) | ||
``` | ||
|
||
The code above: | ||
|
||
1. Imports the `DbtCliResource` from the `dagster_dbt` package that we installed earlier | ||
2. Imports the `DBT_DIRECTORY` constant we just defined | ||
3. Instantiates a new `DbtCliResource` under the variable name `dbt_resource` | ||
4. Tells the resource that the dbt project to execute is found at `DBT_DIRECTORY` |
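
If the path passed to `project_dir` is wrong, the mistake is easier to diagnose when it is caught early. The following is an optional, minimal sketch that isn't part of the course code: `check_dbt_project` is a hypothetical helper, and it only assumes that a dbt project keeps a `dbt_project.yml` file at its root.

```python
from pathlib import Path


def check_dbt_project(project_dir: Path) -> Path:
    """Return project_dir unchanged if it looks like a dbt project."""
    if not project_dir.is_dir():
        raise NotADirectoryError(f"{project_dir} is not a directory")
    if not (project_dir / "dbt_project.yml").is_file():
        raise FileNotFoundError(f"No dbt_project.yml found in {project_dir}")
    return project_dir
```

With a helper like this, the resource could be defined as `DbtCliResource(project_dir=check_dbt_project(DBT_DIRECTORY))`, so a misplaced `analytics` folder fails with a clear message when the code location loads.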
91 changes: 91 additions & 0 deletions
...rsity/pages/dagster-dbt/lesson-3/5-loading-dbt-models-into-dagster-as-assets.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,91 @@ | ||
--- | ||
title: 'Lesson 3: Loading dbt models into Dagster as assets' | ||
module: 'dagster_dbt' | ||
lesson: '3' | ||
--- | ||
|
||
# Loading dbt models into Dagster as assets | ||
|
||
Now is the moment that we’ve been building up to since the beginning of this module. Let’s see your dbt models in a Dagster asset graph! | ||
|
||
--- | ||
|
||
## Turn dbt models into assets with @dbt_assets | ||
|
||
The star of the show here is the `@dbt_assets` decorator. This is a specialized asset decorator that wraps around a dbt project to tell Dagster what dbt models exist. In the body of the `@dbt_assets` definition, you write exactly how you want Dagster to run your dbt models. | ||
|
||
Many Dagster projects may only need one `@dbt_assets`-decorated function that manages the entire dbt project. However, you may need to create multiple definitions for various reasons, such as: | ||
|
||
- You have multiple dbt projects | ||
- You want to exclude certain dbt models | ||
- You want to only execute `dbt run` and not `dbt build` on specific models | ||
- You want to customize what happens after certain models finish, such as sending a notification | ||
- You need to configure some sets of models differently | ||
|
||
We’ll only create one `@dbt_assets` definition for now, but in a later lesson, we’ll encounter a use case for needing another `@dbt_assets` definition. | ||
|
||
--- | ||
|
||
## Loading the models as assets | ||
|
||
1. Create a new file in the `assets` directory called `dbt.py`. | ||
|
||
2. Add the following imports to the top of the file: | ||
|
||
```python | ||
from dagster import AssetExecutionContext | ||
from dagster_dbt import dbt_assets, DbtCliResource | ||
|
||
from .constants import DBT_DIRECTORY | ||
``` | ||
|
||
3. The `@dbt_assets` decorator requires a path to the project’s manifest file, which is within our `DBT_DIRECTORY`. Use that constant to create a path to the `manifest.json` by copying and pasting the code below: | ||
|
||
```python | ||
dbt_manifest_path = DBT_DIRECTORY.joinpath("target", "manifest.json") | ||
``` | ||
|
||
Similar to how we used `joinpath` earlier to point to the dbt project’s directory, we’re using it once again to reference `target/manifest.json` more precisely. | ||
|
||
4. Now, use the `@dbt_assets` decorator to create a new asset function and provide it with a reference to the manifest: | ||
|
||
```python | ||
@dbt_assets( | ||
    manifest=dbt_manifest_path, | ||
) | ||
def dbt_analytics(context: AssetExecutionContext, dbt: DbtCliResource): | ||
``` | ||
|
||
5. Finally, add the following to the body of `dbt_analytics` function: | ||
|
||
```python | ||
    yield from dbt.cli(["run"], context=context).stream() | ||
``` | ||
|
||
Notice that the `dbt_analytics` function takes two arguments. The first argument is the `context`, which indicates which dbt models to run and any related configurations. The second refers to the dbt resource you’ll be using to run dbt. | ||
|
||
Let’s review what’s happening in this line in a bit more detail: | ||
|
||
- We use the `dbt` argument (which is a `DbtCliResource`) to execute a dbt command through its `.cli` method. | ||
- The `.stream()` method fetches the events and results of this dbt execution. | ||
- This is one of multiple ways to get the Dagster events, such as what models materialized or tests passed. We recommend starting with this and exploring other methods in the future as your use cases grow (such as fetching the run artifacts after a run). In this case, the above line will execute `dbt run`. | ||
- The results of the `stream` are a Python generator of what Dagster events happened. We used [`yield from`](https://pythonalgos.com/generator-functions-yield-and-yield-from-in-python/) (not just `yield`!) to have Dagster track asset materializations. | ||
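
To illustrate the difference in plain Python, independent of Dagster, the sketch below uses a hypothetical `fake_stream` generator in place of `.stream()`. With `yield`, the caller receives the generator object itself; with `yield from`, the caller receives each event that the generator produces.

```python
from typing import Iterator


def fake_stream() -> Iterator[str]:
    """Stand-in for a stream of events (hypothetical; not the Dagster API)."""
    yield "materialized stg_zones"
    yield "materialized stg_trips"


def with_yield():
    # Yields a single item: the inner generator object itself.
    yield fake_stream()


def with_yield_from() -> Iterator[str]:
    # Re-yields every item produced by the inner generator.
    yield from fake_stream()


delegated = list(with_yield_from())
wrapped = list(with_yield())

print(delegated)     # the two event strings
print(len(wrapped))  # one item: a generator, not an event
```

This is why a plain `yield` would not work in `dbt_analytics`: Dagster would receive one opaque generator instead of the individual events.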
|
||
At this point, `dbt.py` should look like this: | ||
|
||
```python | ||
from dagster import AssetExecutionContext | ||
from dagster_dbt import dbt_assets, DbtCliResource | ||
|
||
from .constants import DBT_DIRECTORY | ||
|
||
|
||
dbt_manifest_path = DBT_DIRECTORY.joinpath("target", "manifest.json") | ||
|
||
|
||
@dbt_assets( | ||
    manifest=dbt_manifest_path, | ||
) | ||
def dbt_analytics(context: AssetExecutionContext, dbt: DbtCliResource): | ||
    yield from dbt.cli(["run"], context=context).stream() | ||
``` |
38 changes: 38 additions & 0 deletions
...ster-university/pages/dagster-dbt/lesson-3/6-updating-the-definitions-object.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,38 @@ | ||
--- | ||
title: 'Lesson 3: Updating the Definitions object' | ||
module: 'dagster_dbt' | ||
lesson: '3' | ||
--- | ||
|
||
# Updating the Definitions object | ||
|
||
The last step in setting up your dbt project in Dagster is adding the definitions you made (e.g., your `dbt_resource` and `dbt_analytics` asset) to your code location’s `Definitions` object. | ||
|
||
Modify your root-level `__init__.py` to: | ||
|
||
- Load the assets from the `dbt.py` file, and | ||
- Register the `dbt_resource` from `.resources` under the resource key `dbt` | ||
|
||
After making those changes, your root-level `__init__.py` should look similar to the code below: | ||
|
||
```python | ||
from dagster import Definitions, load_assets_from_modules | ||
|
||
from .assets import trips, metrics, requests, dbt # Import the dbt assets | ||
from .resources import database_resource, dbt_resource # import the dbt resource | ||
# ...other existing imports | ||
|
||
# ... existing calls to load_assets_from_modules | ||
dbt_analytics_assets = load_assets_from_modules(modules=[dbt]) # Load the assets from the file | ||
|
||
# ... other declarations | ||
|
||
defs = Definitions( | ||
    assets=[*trip_assets, *metric_assets, *requests_assets, *dbt_analytics_assets], # Add the dbt assets to your code location | ||
    resources={ | ||
        "database": database_resource, | ||
        "dbt": dbt_resource # register your dbt resource with the code location | ||
    }, | ||
    # .. other definitions | ||
) | ||
``` |
72 changes: 72 additions & 0 deletions
...university/pages/dagster-dbt/lesson-3/7-viewing-dbt-models-in-the-dagster-ui.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,72 @@ | ||
--- | ||
title: 'Lesson 3: Viewing dbt models in the Dagster UI' | ||
module: 'dagster_dbt' | ||
lesson: '3' | ||
--- | ||
|
||
# Viewing dbt models in the Dagster UI | ||
|
||
Once all the work above has been done, you’re ready to see your dbt models represented as assets! Here’s how you can find your models: | ||
|
||
1. Run `dagster dev` and navigate to the asset graph. | ||
2. Expand the `default` group in the asset graph. | ||
3. You should see your two dbt models, `stg_trips` and `stg_zones`, converted into assets within your Dagster project! | ||
|
||
![dbt assets with description metadata in the Dagster UI](/images/dagster-dbt/lesson-3/asset-description-metadata.png) | ||
|
||
If you’re familiar with the Dagster metadata system, you’ll notice that the descriptions you defined for the dbt models in `staging.yml` are carried over as the descriptions of the corresponding assets. In this case, the description for `stg_zones` would say _“The taxi zones, with enriched records and additional flags”._ | ||
|
||
And, of course, the orange dbt logo attached to the assets indicates that they are dbt models. | ||
|
||
Click the `stg_trips` node on the asset graph and look at the right sidebar. You’ll get some metadata out-of-the-box, such as the dbt code used for the model, how long the model takes to materialize over time, and the schema of the model. | ||
|
||
{% table %} | ||
|
||
- dbt model code | ||
- Model schema | ||
|
||
--- | ||
|
||
- ![dbt model code as asset metadata in the Dagster UI](/images/dagster-dbt/lesson-3/dbt-asset-code.png) | ||
- ![model schema as asset metadata in the Dagster UI](/images/dagster-dbt/lesson-3/dbt-asset-table-schema.png) | ||
|
||
{% /table %} | ||
|
||
--- | ||
|
||
## Running dbt models with Dagster | ||
|
||
After clicking around a bit and seeing the dbt models within Dagster, the next step is to materialize them. | ||
|
||
1. Click the `stg_zones` asset. | ||
2. Hold **Command** (or **Control** on Windows/Linux) and click the `stg_trips` asset. | ||
3. Click the **Materialize selected** button toward the top-right section of the asset graph. | ||
4. Click the toast notification at the top of the page (or the hash that appears at the bottom right of a dbt asset’s node) to navigate to the run. | ||
5. Under the run ID - in this case, `35b467ce` - change the toggle from a **timed view (stopwatch)** to the **flat view (blocks).** | ||
|
||
The run’s page should look similar to this: | ||
|
||
![dbt run details page in the Dagster UI](/images/dagster-dbt/lesson-3/dbt-run-details-page.png) | ||
|
||
Notice that there is only one “block,” or step, in this chart. That’s because Dagster runs dbt as it’s intended to be run: in a single execution of a `dbt` CLI command. This step will be named after the `@dbt_assets`-decorated asset, which we called `dbt_analytics` in the `assets/dbt.py` file. | ||
|
||
Scrolling through the logs, you’ll see the dbt commands Dagster executes, along with each model materialization. We want to point out two noteworthy logs. | ||
|
||
### dbt commands | ||
|
||
![Highlighted dbt command in Dagster's run logs](/images/dagster-dbt/lesson-3/dbt-logs-dbt-command.png) | ||
|
||
This log statement indicates which dbt command is being run. Note that it executed the `dbt run` command specified in the `dbt_analytics` asset. | ||
|
||
{% callout %} | ||
|
||
> 💡 **What’s `--select fqn:*`?** As mentioned earlier, Dagster tries to run dbt in as few executions as possible. `fqn` is a [dbt selection method](https://docs.getdbt.com/reference/node-selection/methods#the-fqn-method) that is as explicit as it gets and matches the node names in a `manifest.json`. The `*` means it will run all dbt models. | ||
{% /callout %} | ||
|
||
### Materialization events | ||
|
||
![Highlighted asset materialization events for dbt assets in Dagster's run logs](/images/dagster-dbt/lesson-3/dbt-logs-materialization-events.png) | ||
|
||
These asset materialization events indicate that `stg_zones` and `stg_trips` were successfully materialized during the dbt execution. | ||
|
||
Try running just one of the dbt models and see what happens! Dagster will dynamically generate the `--select` argument based on the assets selected to run. |
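
To make that idea concrete, here is a purely illustrative sketch of how a selection of models could be turned into a `--select` argument. It is not Dagster's actual implementation: `build_dbt_args` is a hypothetical function, and the fully qualified names (such as `analytics.staging.stg_trips`) are assumed for the example.

```python
from typing import List, Optional, Sequence


def build_dbt_args(
    command: str, selected_fqns: Optional[Sequence[str]] = None
) -> List[str]:
    """Build a dbt CLI argument list for the given command and selection.

    If no selection is given, every node is selected with `fqn:*`.
    """
    if not selected_fqns:
        return [command, "--select", "fqn:*"]

    # One `fqn:` selector per chosen model; dbt treats space-separated
    # selectors as a union.
    selectors = " ".join(f"fqn:{fqn}" for fqn in sorted(selected_fqns))
    return [command, "--select", selectors]
```

For instance, `build_dbt_args("run", ["analytics.staging.stg_trips"])` narrows the command to a single model, while `build_dbt_args("run")` falls back to selecting everything.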
27 changes: 27 additions & 0 deletions
docs/dagster-university/pages/dagster-dbt/lesson-3/knowledge-check.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,27 @@ | ||
--- | ||
title: 'Knowledge check' | ||
module: 'dagster_dbt' | ||
lesson: '3' | ||
--- | ||
|
||
# Knowledge check | ||
|
||
1. Open the `dbt.py` file. | ||
|
||
2. Modify `dbt_analytics` to run `dbt build` instead of `dbt run`. The function should look like this afterward: | ||
|
||
```python | ||
@dbt_assets( | ||
    manifest=dbt_manifest_path | ||
) | ||
def dbt_analytics(context: AssetExecutionContext, dbt: DbtCliResource): | ||
    yield from dbt.cli(["build"], context=context).stream() | ||
``` | ||
|
||
3. In the Dagster UI, re-materialize both of the dbt models. | ||
|
||
4. Navigate to the details page for the run you just started. | ||
|
||
5. Navigate to the logs for the run. | ||
|
||
When finished, proceed to the next page. |
Binary file added
BIN
+313 KB
...er-university/public/images/dagster-dbt/lesson-3/asset-description-metadata.png
Binary file added
BIN
+151 KB
docs/dagster-university/public/images/dagster-dbt/lesson-3/dbt-asset-code.png
Binary file added
BIN
+159 KB
...agster-university/public/images/dagster-dbt/lesson-3/dbt-asset-table-schema.png
Binary file added
BIN
+309 KB
.../dagster-university/public/images/dagster-dbt/lesson-3/dbt-logs-dbt-command.png
Binary file added
BIN
+286 KB
...iversity/public/images/dagster-dbt/lesson-3/dbt-logs-materialization-events.png
Binary file added
BIN
+277 KB
docs/dagster-university/public/images/dagster-dbt/lesson-3/dbt-logs-one-asset.png
Binary file added
BIN
+355 KB
.../dagster-university/public/images/dagster-dbt/lesson-3/dbt-run-details-page.png
6ec952b
Deploy preview for dagster-university ready!
✅ Preview
https://dagster-university-ggdtzpqu7-elementl.vercel.app
Built with commit 6ec952b.
This pull request is being automatically deployed with vercel-action