[daggy-u] [dbt] - Add Lesson 3 (DEV-59) (#19927)
## Summary & Motivation

This PR adds Lesson 3 of the new dbt module to Dagster University.

TODOs:

- [x] Move code examples to correct folder
- [x] Add screenshots

## How I Tested These Changes
erinkcochran87 authored Feb 27, 2024
1 parent fca4b72 commit 6ec952b
Showing 16 changed files with 347 additions and 1 deletion.
11 changes: 10 additions & 1 deletion docs/dagster-university/pages/dagster-dbt.md
Original file line number Diff line number Diff line change
@@ -14,4 +14,13 @@ title: Dagster + dbt
  - [Set up the Dagster project](/dagster-dbt/lesson-2/2-set-up-the-dagster-project)
  - [Set up the dbt project](/dagster-dbt/lesson-2/3-set-up-the-dbt-project)
  - [dbt project files](/dagster-dbt/lesson-2/4-dbt-project-files)
  - [Verify dbt installation](/dagster-dbt/lesson-2/5-verify-dbt-installation)
- Lesson 3: Connecting dbt to Dagster
  - [Overview](/dagster-dbt/lesson-3/1-overview)
  - [Constructing the dbt project](/dagster-dbt/lesson-3/2-constructing-the-dbt-project)
  - [Defining the dbt project location in Dagster](/dagster-dbt/lesson-3/3-defining-the-dbt-project-location-in-dagster)
  - [Creating a dbt resource in Dagster](/dagster-dbt/lesson-3/4-creating-a-dbt-resource-in-dagster)
  - [Loading dbt models into Dagster as assets](/dagster-dbt/lesson-3/5-loading-dbt-models-into-dagster-as-assets)
  - [Updating the Definitions object](/dagster-dbt/lesson-3/6-updating-the-definitions-object)
  - [Viewing dbt models in the Dagster UI](/dagster-dbt/lesson-3/7-viewing-dbt-models-in-the-dagster-ui)
  - [Knowledge check](/dagster-dbt/lesson-3/knowledge-check)
11 changes: 11 additions & 0 deletions docs/dagster-university/pages/dagster-dbt/lesson-3/1-overview.md
@@ -0,0 +1,11 @@
---
title: 'Lesson 3: Overview'
module: 'dagster_dbt'
lesson: '3'
---

# Overview

As you learned in Lesson 1, Dagster and dbt have similar mental models because both frameworks excel at building data models, defining relationships between them, and materializing them. You should have also grasped that dbt models are conceptually data assets and that these data assets can be represented in Dagster.

In this lesson, you’ll put that conceptual understanding into practice by connecting a dbt project to Dagster, manually running your dbt models, and understanding what happens when Dagster runs dbt.
@@ -0,0 +1,31 @@
---
title: 'Lesson 3: Constructing the dbt project'
module: 'dagster_dbt'
lesson: '3'
---

# Constructing the dbt project

Independent of Dagster, running most dbt commands creates a set of files in a new directory called `target`. The most important file is the `manifest.json`. More commonly referred to as “the manifest file,” [this file](https://docs.getdbt.com/reference/artifacts/manifest-json) is a complete representation of your dbt project in a predictable format.

When Dagster builds your code location, it reads the manifest file to discover the dbt models and turn them into Dagster assets. There are a variety of ways to build the `manifest.json` file. However, we recommend using the `dbt parse` CLI command.

Change your current working directory to the `analytics` folder and run the following command:

```bash
cd analytics # if you haven't set the directory yet
dbt parse
```

To confirm that a manifest file was generated, you should see two changes in your project:

1. A new directory at `analytics/target`, and
2. In the `target` directory, the `manifest.json` file
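
The same check can be done from Python. The sketch below is only an illustration: it assumes the manifest keeps its resources under a top-level `nodes` key, where each entry has a `resource_type` and a `name`, and the helper name `list_dbt_models` is made up for this example.

```python
import json
from pathlib import Path


def list_dbt_models(manifest_path: Path) -> list[str]:
    """Return the sorted names of the dbt models recorded in a manifest file."""
    if not manifest_path.is_file():
        raise FileNotFoundError(
            f"No manifest at {manifest_path}. Run `dbt parse` in the dbt project first."
        )
    manifest = json.loads(manifest_path.read_text())
    # The manifest stores models, tests, seeds, and so on together under "nodes",
    # so keep only the entries whose resource type is "model".
    return sorted(
        node["name"]
        for node in manifest.get("nodes", {}).values()
        if node.get("resource_type") == "model"
    )
```

For example, calling `list_dbt_models(Path("analytics", "target", "manifest.json"))` from the directory that contains `analytics` should return the names of the models in your dbt project.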

{% callout %}
> 💡 We recommend `dbt parse` since it doesn’t require a connection to your data warehouse to generate a manifest file, as opposed to commands like `dbt compile`. This means that `dbt parse` is fast and consistent across any environments you run it in, such as locally or during deployment.
>
> If your dbt models use any [introspective queries](https://docs.getdbt.com/reference/commands/compile#interactive-compile), you may need to run `dbt compile` instead.
{% /callout %}

In Lesson 4, we’ll explore some options for deploying the manifest file more programmatically, along with some tips and tricks for regenerating it regularly during development.
@@ -0,0 +1,30 @@
---
title: 'Lesson 3: Defining the dbt project location in Dagster'
module: 'dagster_dbt'
lesson: '3'
---

# Defining the dbt project location in Dagster

Because you’ll frequently point your Dagster code to the `target/manifest.json` file and your dbt project in this course, it helps to keep a reusable constant that records where the dbt project is.

In the finished Dagster Essentials project, there should be a file called `assets/constants.py`. Open that file and add the following import at the top:

```python
from pathlib import Path
```

The `Path` class from the `pathlib` standard library will help us create an accurate pointer to where our dbt project is. At the bottom of this same file, add the following line:

```python
DBT_DIRECTORY = Path(__file__).joinpath("..", "..", "..", "analytics").resolve()
```

This line creates a new constant called `DBT_DIRECTORY`. It might look a little complicated, so let’s break it down:

- It uses `constants.py`'s file location (via `__file__`) as a point of reference for finding the dbt project
- The arguments in `joinpath` point us towards our dbt project in `analytics`
- The `resolve` method turns that path into an absolute file path that points to the dbt project correctly from any file we’re working in
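
To see the breakdown above in action, here is a small sketch that rebuilds the same expression against a throwaway directory. The layout `dagster_university/assets/constants.py` next to an `analytics` folder is an assumption about the project structure, and `find_dbt_directory` is a hypothetical helper used only for this demonstration.

```python
import tempfile
from pathlib import Path


def find_dbt_directory(constants_file: Path) -> Path:
    """Mirror the DBT_DIRECTORY expression, starting from a given constants.py path."""
    # The first ".." steps from constants.py to its folder (assets), the second to
    # the package folder, and the third to the project root, where `analytics` lives.
    return constants_file.joinpath("..", "..", "..", "analytics").resolve()


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    # Assumed layout: <root>/dagster_university/assets/constants.py
    constants_file = root / "dagster_university" / "assets" / "constants.py"
    constants_file.parent.mkdir(parents=True)
    constants_file.touch()
    (root / "analytics").mkdir()

    dbt_directory = find_dbt_directory(constants_file)
    print(dbt_directory.relative_to(root))  # analytics
    print(dbt_directory.is_absolute())  # True
```

Because the path is anchored to the file's own location rather than the current working directory, it resolves to the same place regardless of where the code is run from.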

Now that you can access your dbt project from any other file with the `DBT_DIRECTORY` constant, let’s move on to the first place where you’ll use it: creating the Dagster resource that will run dbt.
@@ -0,0 +1,37 @@
---
title: 'Lesson 3: Creating a Dagster resource to run dbt'
module: 'dagster_dbt'
lesson: '3'
---

# Creating a Dagster resource to run dbt

Our next step is to define a Dagster resource as the entry point used to run dbt commands and configure its execution.

The `DbtCliResource` is the main resource that you’ll be working with. In later sections, we’ll walk through some of the resource’s methods and how to customize what Dagster does when dbt runs.

{% callout %}

> 💡 **Resource refresher:** Resources are Dagster’s recommended way of connecting to other services and tools, such as dbt, your data warehouse, or a BI tool.
{% /callout %}

Navigate to `dagster_university/resources/__init__.py`, which is where other resources are defined. Copy and paste the following code into the locations described in the comments:

```python
# the import lines go at the top of the file
from dagster_dbt import DbtCliResource

from ..assets.constants import DBT_DIRECTORY

# this can be defined anywhere below the imports
dbt_resource = DbtCliResource(
    project_dir=DBT_DIRECTORY,
)
```

The code above:

1. Imports the `DbtCliResource` from the `dagster_dbt` package that we installed earlier
2. Imports the `DBT_DIRECTORY` constant we just defined
3. Instantiates a new `DbtCliResource` under the variable name `dbt_resource`
4. Tells the resource that the dbt project to execute is found at `DBT_DIRECTORY`
@@ -0,0 +1,91 @@
---
title: 'Lesson 3: Loading dbt models into Dagster as assets'
module: 'dagster_dbt'
lesson: '3'
---

# Loading dbt models into Dagster as assets

Now is the moment that we’ve been building up to since the beginning of this module. Let’s see your dbt models in a Dagster asset graph!

---

## Turn dbt models into assets with @dbt_assets

The star of the show here is the `@dbt_assets` decorator. This is a specialized asset decorator that wraps around a dbt project to tell Dagster what dbt models exist. In the body of the `@dbt_assets` definition, you write exactly how you want Dagster to run your dbt models.

Many Dagster projects may only need one `@dbt_assets`-decorated function that manages the entire dbt project. However, you may need to create multiple definitions for various reasons, such as:

- You have multiple dbt projects
- You want to exclude certain dbt models
- You want to only execute `dbt run` and not `dbt build` on specific models
- You want to customize what happens after certain models finish, such as sending a notification
- You need to configure some sets of models differently

We’ll only create one `@dbt_assets` definition for now, but in a later lesson, we’ll encounter a use case that calls for another `@dbt_assets` definition.

---

## Loading the models as assets

1. Create a new file in the `assets` directory called `dbt.py`.

2. Add the following imports to the top of the file:

```python
from dagster import AssetExecutionContext
from dagster_dbt import dbt_assets, DbtCliResource

from .constants import DBT_DIRECTORY
```

3. The `@dbt_assets` decorator requires a path to the project’s manifest file, which is within our `DBT_DIRECTORY`. Use that constant to create a path to the `manifest.json` by copying and pasting the code below:

```python
dbt_manifest_path = DBT_DIRECTORY.joinpath("target", "manifest.json")
```

Similar to how we used `joinpath` earlier to point to the dbt project’s directory, we’re using it once again to reference `target/manifest.json` more precisely.

4. Now, use the `@dbt_assets` decorator to create a new asset function and provide it with a reference to the manifest:

```python
@dbt_assets(
    manifest=dbt_manifest_path,
)
def dbt_analytics(context: AssetExecutionContext, dbt: DbtCliResource):
```

5. Finally, add the following to the body of the `dbt_analytics` function:

```python
    yield from dbt.cli(["run"], context=context).stream()
```

Notice that the `dbt_analytics` function takes two arguments. The first is `context`, which indicates which dbt models to run and any related configurations. The second is `dbt`, the dbt resource you’ll be using to run dbt.

Let’s review what’s happening in this line in a bit more detail:

- We use the `dbt` argument (which is a `DbtCliResource`) to execute a dbt command through its `.cli` method. In this case, the command is `dbt run`.
- The `.stream()` method fetches the events and results of this dbt execution.
- This is one of multiple ways to get the Dagster events, such as which models materialized or which tests passed. We recommend starting with this and exploring other methods in the future as your use cases grow (such as fetching the run artifacts after a run).
- The result of `.stream()` is a Python generator of the Dagster events that occurred. We used [`yield from`](https://pythonalgos.com/generator-functions-yield-and-yield-from-in-python/) (not just `yield`!) so that Dagster tracks the asset materializations.
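
The difference between `yield from` and `yield` is plain Python behavior, so it can be shown without Dagster or dbt. In the sketch below, `stream_events` is a stand-in for `dbt.cli(...).stream()` and the strings are placeholders for Dagster events; both are assumptions made for illustration.

```python
from typing import Iterator


def stream_events() -> Iterator[str]:
    """Stand-in for `.stream()`: a generator that produces events one by one."""
    yield "materialized stg_zones"
    yield "materialized stg_trips"


def asset_with_yield_from() -> Iterator[str]:
    # Re-yields every event from the inner generator, one at a time.
    yield from stream_events()


def asset_with_yield():
    # Yields a single item: the inner generator object itself, not its events.
    yield stream_events()


print(list(asset_with_yield_from()))
# ['materialized stg_zones', 'materialized stg_trips']
print(len(list(asset_with_yield())))
# 1
```

With a plain `yield`, the caller receives one generator object instead of the individual events, which is why the asset function needs `yield from`.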

At this point, `dbt.py` should look like this:

```python
from dagster import AssetExecutionContext
from dagster_dbt import dbt_assets, DbtCliResource

from .constants import DBT_DIRECTORY


dbt_manifest_path = DBT_DIRECTORY.joinpath("target", "manifest.json")


@dbt_assets(
    manifest=dbt_manifest_path,
)
def dbt_analytics(context: AssetExecutionContext, dbt: DbtCliResource):
    yield from dbt.cli(["run"], context=context).stream()
```
@@ -0,0 +1,38 @@
---
title: 'Lesson 3: Updating the Definitions object'
module: 'dagster_dbt'
lesson: '3'
---

# Updating the Definitions object

The last step in setting up your dbt project in Dagster is adding the definitions you made (for example, your `dbt_resource` and `dbt_analytics` asset) to your code location’s `Definitions` object.

Modify your root-level `__init__.py` to:

- Load the assets from the `dbt.py` file, and
- Register the `dbt_resource` from `.resources` under the resource key `dbt`

After making those changes, your root-level `__init__.py` should look similar to the following:

```python
from dagster import Definitions, load_assets_from_modules

from .assets import trips, metrics, requests, dbt # Import the dbt assets
from .resources import database_resource, dbt_resource # import the dbt resource
# ...other existing imports

# ... existing calls to load_assets_from_modules
dbt_analytics_assets = load_assets_from_modules(modules=[dbt]) # Load the assets from the file

# ... other declarations

defs = Definitions(
    assets=[*trip_assets, *metric_assets, *requests_assets, *dbt_analytics_assets],  # Add the dbt assets to your code location
    resources={
        "database": database_resource,
        "dbt": dbt_resource,  # register your dbt resource with the code location
    },
    # ... other definitions
)
```
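
The resource key matters because it has to match the parameter name used in the asset function: `dbt_analytics` declares a `dbt` argument, so the resource is registered under `"dbt"`. The toy sketch below illustrates that name-matching idea in plain Python. It is not how Dagster is implemented; `call_with_resources` is a hypothetical helper, the real asset also takes `context`, and strings stand in for the actual resources.

```python
import inspect
from typing import Any, Callable


def call_with_resources(fn: Callable[..., Any], resources: dict[str, Any]) -> Any:
    """Toy illustration: pass each resource to the parameter that shares its key."""
    wanted = inspect.signature(fn).parameters
    missing = [name for name in wanted if name not in resources]
    if missing:
        raise KeyError(f"No resource registered under: {missing}")
    return fn(**{name: resources[name] for name in wanted})


def dbt_analytics(dbt: str) -> str:
    # In the real asset, `dbt` is a DbtCliResource; a string stands in for it here.
    return f"running with {dbt}"


print(call_with_resources(dbt_analytics, {"database": "duckdb", "dbt": "dbt resource"}))
# running with dbt resource
```

If the resource were registered under a different key, such as `"dbt_cli"`, the lookup for the `dbt` parameter would fail in this sketch.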
@@ -0,0 +1,72 @@
---
title: 'Lesson 3: Viewing dbt models in the Dagster UI'
module: 'dagster_dbt'
lesson: '3'
---

# Viewing dbt models in the Dagster UI

Once all the work above has been done, you’re ready to see your dbt models represented as assets! Here’s how you can find your models:

1. Run `dagster dev` and navigate to the asset graph.
2. Expand the `default` group in the asset graph.
3. You should see your two dbt models, `stg_trips` and `stg_zones`, represented as assets within your Dagster project!

![dbt assets with description metadata in the Dagster UI](/images/dagster-dbt/lesson-3/asset-description-metadata.png)

If you’re familiar with the Dagster metadata system, you’ll notice that the descriptions you defined for the dbt models in `staging.yml` are carried over as the descriptions of the corresponding assets. In this case, the description for `stg_zones` would say _“The taxi zones, with enriched records and additional flags”_.

And, of course, the orange dbt logo attached to the assets indicates that they are dbt models.

Click the `stg_trips` node on the asset graph and look at the right sidebar. You’ll get some metadata out-of-the-box, such as the dbt code used for the model, how long the model takes to materialize over time, and the schema of the model.

{% table %}

- dbt model code
- Model schema

---

- ![dbt model code as asset metadata in the Dagster UI](/images/dagster-dbt/lesson-3/dbt-asset-code.png)
- ![model schema as asset metadata in the Dagster UI](/images/dagster-dbt/lesson-3/dbt-asset-table-schema.png)

{% /table %}

---

## Running dbt models with Dagster

After clicking around a bit and seeing the dbt models within Dagster, the next step is to materialize them.

1. Click the `stg_zones` asset.
2. Hold **Command** (or **Control** on Windows/Linux) and click the `stg_trips` asset.
3. Click the **Materialize selected** button toward the top-right section of the asset graph.
4. Click the toast notification at the top of the page (or the hash that appears at the bottom right of a dbt asset’s node) to navigate to the run.
5. Under the run ID (in this case, `35b467ce`), change the toggle from the **timed view (stopwatch)** to the **flat view (blocks)**.

The run’s page should look similar to this:

![Run details page for the dbt run in the Dagster UI](/images/dagster-dbt/lesson-3/dbt-run-details-page.png)

Notice that there is only one “block,” or step, in this chart. That’s because Dagster runs dbt as it’s intended to be run: in a single execution of a `dbt` CLI command. This step is named after the `@dbt_assets`-decorated asset, which we called `dbt_analytics` in the `assets/dbt.py` file.

Scrolling through the logs, you’ll see the dbt commands Dagster executes, along with each model materialization. We want to point out two noteworthy logs.

### dbt commands

![Highlighted dbt command in Dagster's run logs](/images/dagster-dbt/lesson-3/dbt-logs-dbt-command.png)

This log statement indicates which dbt command is being run. Note that it executed the `dbt run` command specified in the `dbt_analytics` asset.

{% callout %}

> 💡 **What’s `--select fqn:*`?** As mentioned earlier, Dagster tries to run dbt in as few executions as possible. `fqn` is a [dbt selection method](https://docs.getdbt.com/reference/node-selection/methods#the-fqn-method) that is as explicit as it gets and matches the node names in a `manifest.json`. The `*` means it will run all dbt models.
{% /callout %}

### Materialization events

![Highlighted asset materialization events for dbt assets in Dagster's run logs](/images/dagster-dbt/lesson-3/dbt-logs-materialization-events.png)

These asset materialization events indicate that `stg_zones` and `stg_trips` were successfully materialized during the dbt execution.

Try running just one of the dbt models and see what happens! Dagster will dynamically generate the `--select` argument based on the assets selected to run.
@@ -0,0 +1,27 @@
---
title: 'Knowledge check'
module: 'dagster_dbt'
lesson: '3'
---

# Knowledge check

1. Open the `dbt.py` file.

2. Modify `dbt_analytics` to run `dbt build` instead of `dbt run`. The function should look like this afterward:

```python
@dbt_assets(
    manifest=dbt_manifest_path
)
def dbt_analytics(context: AssetExecutionContext, dbt: DbtCliResource):
    yield from dbt.cli(["build"], context=context).stream()
```

3. In the Dagster UI, re-materialize both of the dbt models.

4. Navigate to the details page for the run you just started.

5. Navigate to the logs for the run.

When finished, proceed to the next page.

1 comment on commit 6ec952b

@github-actions

Deploy preview for dagster-university ready!

✅ Preview
https://dagster-university-ggdtzpqu7-elementl.vercel.app

Built with commit 6ec952b.
This pull request is being automatically deployed with vercel-action
