[daggy-u] [dbt] - Lesson 1 (DEV-61) (#19865)
## Summary & Motivation

This PR adds the content for Lesson 1 of the Dagster + dbt module to
Dagster University.

## How I Tested These Changes

👀
erinkcochran87 authored Feb 27, 2024
1 parent e6047ae commit 72325ed
Showing 8 changed files with 210 additions and 80 deletions.
2 changes: 1 addition & 1 deletion docs/content/integrations/dbt.mdx
@@ -26,7 +26,7 @@ Dagster's [Software-defined Asset](/concepts/assets/software-defined-assets) app

An asset graph like this:

<!-- Note: This is also used in /getting-started/overview -->
<!-- Note: This is also used in /getting-started/overview and Lesson 1 of DU's Dagster + dbt module -->

<!-- ![Dagster graph with dbt, Fivetran, and TensorFlow](/images/integrations/dbt/dagster-dbt-fivetran-tensorflow.png) -->

27 changes: 27 additions & 0 deletions docs/dagster-university/pages/dagster-dbt/lesson-1/1-whats-dbt.md
@@ -0,0 +1,27 @@
---
title: "Lesson 1: What's dbt?"
module: 'dbt_dagster'
lesson: '1'
---

# What's dbt?

In the world of ETL/ELT, dbt - that’s right, all lowercase - is the ‘T’ in the process of Extracting, Loading, and **Transforming** data. Using familiar languages like SQL and Python, dbt is open-source software that allows users to write and run data transformations against the data loaded into their data warehouses.

Before we go any further, let’s take a look at how the folks at dbt describe their product:

> dbt is a transformation workflow that helps you get more work done while producing higher quality results. You can use dbt to modularize and centralize your analytics code, while also providing your data team with guardrails typically found in software engineering workflows. Collaborate on data models, version them, and test and document your queries before safely deploying them to production, with monitoring and visibility.
>
> dbt compiles and runs your analytics code against your data platform, enabling you and your team to collaborate on a single source of truth for metrics, insights, and business definitions. This single source of truth, combined with the ability to define tests for your data, reduces errors when logic changes, and alerts you when issues arise. ([source](https://docs.getdbt.com/docs/introduction))
---

## Why use dbt?

dbt is popular not only because it’s easy to adopt, but also because it embraces software engineering best practices. Data analysts can use skills they already have - like SQL expertise - and simultaneously take advantage of:

- **Keeping things DRY (Don’t Repeat Yourself).** dbt models, which are business definitions represented in SQL `SELECT` statements, can be referenced in other models. Focusing on modularity allows you to reduce bugs, standardize analytics logic, and get a running start on new analyses.
- **Automatically managing dependencies and generating documentation.** Dependencies between models are not only easy to declare, they’re automatically managed by dbt. dbt also generates a DAG (directed acyclic graph), which shows how models in a dbt project relate to each other.
- **Preventing negative impact on end-users.** Support for multiple environments ensures that development can occur without impacting users in production.
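
To make the dependency-management point concrete, here is a minimal sketch in plain Python of how a run order can be derived from declared dependencies. It is not dbt's implementation: the model names and the `model_refs` mapping are hypothetical, and the ordering comes from the standard library's `graphlib` module.

```python
from graphlib import TopologicalSorter

# Hypothetical dbt project: each model maps to the models it references.
model_refs = {
    "stg_users": set(),
    "stg_orders": set(),
    "daily_order_summary": {"stg_users", "stg_orders"},
    "order_forecast": {"daily_order_summary"},
}


def run_order(refs: dict[str, set[str]]) -> list[str]:
    """Return an order in which every model runs after the models it references."""
    return list(TopologicalSorter(refs).static_order())


order = run_order(model_refs)
print(order)
```

Because the graph must be acyclic, `TopologicalSorter` raises a `CycleError` if two models reference each other, which is the same constraint a DAG of dbt models has to satisfy.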

Dagster’s approach to building data platforms maps directly to these same best practices, making dbt and Dagster a natural, powerful pairing. In the next section, we’ll dig into this a bit more.
@@ -0,0 +1,26 @@
---
title: "Lesson 1: Why use dbt and Dagster together?"
module: 'dbt_dagster'
lesson: '1'
---

# Why use dbt and Dagster together?

At a glance, it might seem like Dagster and dbt do the same thing. Both technologies, after all, work with data assets and are instrumental in modern data platforms.

However, dbt Core can only transform data that is already in a data warehouse - it can’t extract data from a source, load that data into its final destination, or automate either of these operations. And while you could use dbt Cloud’s native features to schedule running your models, other portions of your data pipelines - such as Fivetran-ingested tables or data from Amazon S3 - won’t be included.

To have everything running together, you need an orchestrator. This is where Dagster comes in:

> Dagster’s core design principles go really well together with dbt. The similarities between the way that Dagster thinks about data pipelines and the way that dbt thinks about data pipelines means that Dagster can orchestrate dbt much more faithfully than other general-purpose orchestrators like Airflow.
>
> At the same time, Dagster is able to compensate for dbt’s biggest limitations. dbt is rarely used in a vacuum: the data transformed using dbt needs to come from somewhere and go somewhere. When a data platform needs more than just dbt, Dagster is a better fit than dbt-specific orchestrators, like the job scheduling system inside dbt Cloud. ([source](https://dagster.io/blog/orchestrating-dbt-with-dagster))

In short, using dbt alongside Dagster gives analytics and data engineers the best of both worlds:

- **Analytics engineers** can author analytics code in a familiar language while adhering to software engineering best practices
- **Data engineers** can easily incorporate dbt into their organization’s wider data platform, ensuring observability and reliability

There’s more, however. Other orchestrators will provide you with one of two less-than-appealing options: running dbt as a single task that lacks visibility, or running each dbt model as an individual task and pushing the execution into the orchestrator, which goes against how dbt is intended to be run.

Using dbt with Dagster is unique, as Dagster separates data assets from the execution that produces them and gives you the ability to monitor and debug each dbt model individually.
@@ -0,0 +1,54 @@
---
title: "Lesson 1: How do dbt models relate to Dagster assets?"
module: 'dbt_dagster'
lesson: '1'
---

# How do dbt models relate to Dagster assets?

dbt models _are_ assets: they produce data and can have dependencies. Because of these similarities, Dagster can translate each of your dbt models into a Dagster [Software-defined Asset](https://docs.dagster.io/concepts/assets/software-defined-assets) (SDA).

How can Dagster do this? Each component of a Dagster asset has an equivalent counterpart in a dbt model:

- The **asset key** for a dbt model is (by default) the name of the model
- The **upstream dependencies** of a dbt model are defined with **`ref`** or **`source`** calls within the model's definition
- The **computation** required to compute the asset from its upstream dependencies is the SQL within the model's definition
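
The mapping above can be sketched in plain Python. This is only an illustration, not how `dagster_dbt` works internally: `AssetSketch` and `model_to_asset` are hypothetical names, the regular expressions cover only the simplest `ref` and `source` calls, and the `source.table` naming for upstream sources is an assumption made for readability.

```python
import re
from dataclasses import dataclass

# Simplified patterns for dbt's Jinja calls; real dbt parsing handles more cases.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}")
SOURCE_PATTERN = re.compile(
    r"\{\{\s*source\(\s*['\"]([^'\"]+)['\"]\s*,\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}"
)


@dataclass
class AssetSketch:
    """Hypothetical stand-in for the three asset components listed above."""

    key: str  # asset key: the model name by default
    upstream: list[str]  # upstream dependencies found via ref/source calls
    computation: str  # the SQL that computes the asset


def model_to_asset(model_name: str, sql: str) -> AssetSketch:
    refs = REF_PATTERN.findall(sql)
    sources = [f"{src}.{table}" for src, table in SOURCE_PATTERN.findall(sql)]
    return AssetSketch(key=model_name, upstream=sorted(refs + sources), computation=sql)


# Hypothetical model definition used only for this example.
sql = """
select o.order_date, count(*) as num_orders
from {{ ref('stg_orders') }} as o
join {{ source('postgres', 'users') }} as u on o.user_id = u.id
group by 1
"""

asset_sketch = model_to_asset("daily_order_summary", sql)
print(asset_sketch.key, asset_sketch.upstream)
```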

These similarities make it natural to interact with dbt models as Dagster assets. Using dbt with Dagster, you can create an asset graph like the following:

![Dagster graph with dbt, Fivetran, and TensorFlow](/images/dagster-dbt/lesson-1/example-asset-graph.png)

From code like this:

```python file=/integrations/dbt/potemkin_dag_for_cover_image.py startafter=start endbefore=end
from pathlib import Path

from dagster_dbt import DbtCliResource, dbt_assets, get_asset_key_for_model
from dagster_fivetran import build_fivetran_assets

from dagster import AssetExecutionContext, asset

fivetran_assets = build_fivetran_assets(
    connector_id="postgres",
    destination_tables=["users", "orders"],
)


@dbt_assets(manifest=Path("manifest.json"))
def dbt_project_assets(context: AssetExecutionContext, dbt: DbtCliResource):
    yield from dbt.cli(["build"], context=context).stream()


@asset(
    compute_kind="tensorflow",
    deps=[get_asset_key_for_model([dbt_project_assets], "daily_order_summary")],
)
def predicted_orders():
    ...
```

Let's break down what's happening in this example:

- Using `build_fivetran_assets`, we load two tables (`users`, `orders`) from a Fivetran Postgres connector as Dagster assets
- Using `@dbt_assets`, Dagster reads from a dbt project's `manifest.json` and creates Dagster assets from the dbt models it finds
- Lastly, we create a Dagster `@asset` named `predicted_orders` that has an upstream dependency on a dbt asset named `daily_order_summary`
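
As a rough illustration of what reading `manifest.json` involves, the sketch below walks a small, hand-written manifest-like document and lists each model with its upstream nodes. The JSON is a heavily trimmed assumption about the manifest's shape, the `jaffle` project name is made up, and `list_models` is a hypothetical helper that is not part of `dagster_dbt`.

```python
import json

# A trimmed, hand-written stand-in for dbt's manifest.json (assumed shape).
MANIFEST_JSON = """
{
  "nodes": {
    "model.jaffle.stg_orders": {
      "resource_type": "model",
      "name": "stg_orders",
      "depends_on": {"nodes": ["source.jaffle.postgres.orders"]}
    },
    "model.jaffle.daily_order_summary": {
      "resource_type": "model",
      "name": "daily_order_summary",
      "depends_on": {"nodes": ["model.jaffle.stg_orders"]}
    },
    "test.jaffle.not_null_stg_orders_id": {
      "resource_type": "test",
      "name": "not_null_stg_orders_id",
      "depends_on": {"nodes": ["model.jaffle.stg_orders"]}
    }
  }
}
"""


def list_models(manifest: dict) -> dict[str, list[str]]:
    """Map each model name to the unique IDs of the nodes it depends on."""
    return {
        node["name"]: node["depends_on"]["nodes"]
        for node in manifest["nodes"].values()
        if node["resource_type"] == "model"  # skip tests and other node types
    }


models = list_models(json.loads(MANIFEST_JSON))
for name, upstream in models.items():
    print(f"{name} <- {upstream}")
```

Only nodes whose `resource_type` is `model` are kept here, so the test node is ignored; each remaining entry carries the name and upstream information an asset definition needs.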
@@ -0,0 +1,18 @@
---
title: "Lesson 1: Project preview"
module: 'dbt_dagster'
lesson: '1'
---

# Project preview

In this course, we’ll focus on integrating a dbt project with Dagster from end to end. We’ll build on the Dagster project used in the Dagster Essentials course, which uses data from [NYC OpenData](https://opendata.cityofnewyork.us/). If you haven’t completed Dagster Essentials, no worries - you can clone the finished project and build from there. We’ll do this in the next lesson.

By the end of the course, you will:

- Create dbt models and load them into Dagster as assets
- Run dbt and store the transformed data in a DuckDB database
- Apply partitions to incremental dbt models
- Deploy the dbt + Dagster project to Dagster Cloud

If you get stuck or want to jump ahead, check out the [finished project here on GitHub](https://github.com/dagster-io/project-dagster-university/tree/module/dagster-and-dbt).
83 changes: 83 additions & 0 deletions docs/dagster-university/pages/dagster-essentials.md
@@ -0,0 +1,83 @@
---
title: Dagster Essentials
---

- Lesson 1: Introduction
  - [What's data engineering?](/dagster-essentials/lesson-1/whats-data-engineering)
  - [What's an orchestrator?](/dagster-essentials/lesson-1/whats-an-orchestrator)
  - [Orchestration approaches](/dagster-essentials/lesson-1/orchestration-approaches)
  - [Why is asset-centric orchestration good for data engineering?](/dagster-essentials/lesson-1/why-is-asset-centric-orchestration-good-for-data-engineering)
  - [Project preview](/dagster-essentials/lesson-1/project-preview)
- Lesson 2: Requirements & installation
  - [Requirements and installation](/dagster-essentials/lesson-2/requirements-and-installation)
  - [Create the Dagster project](/dagster-essentials/lesson-2/create-dagster-project)
  - [Project files](/dagster-essentials/lesson-2/project-files)
- Lesson 3: SDAs
  - [Overview](/dagster-essentials/lesson-3/overview)
  - [What's an asset?](/dagster-essentials/lesson-3/whats-an-asset)
  - [Defining your first asset](/dagster-essentials/lesson-3/defining-your-first-asset)
  - [Asset materialization](/dagster-essentials/lesson-3/asset-materialization)
  - [Viewing run details](/dagster-essentials/lesson-3/viewing-run-details)
  - [Troubleshooting failed runs](/dagster-essentials/lesson-3/troubleshooting-failed-runs)
  - [Coding practice: Create a taxi_zones_file asset](/dagster-essentials/lesson-3/coding-practice-taxi-zones-file-asset)
  - [Recap](/dagster-essentials/lesson-3/recap)
- Lesson 4: Asset dependencies
  - [Overview](/dagster-essentials/lesson-4/overview)
  - [What's a dependency?](/dagster-essentials/lesson-4/whats-a-dependency)
  - [Assets and database execution](/dagster-essentials/lesson-4/assets-and-database-execution)
  - [Loading data into a database](/dagster-essentials/lesson-4/loading-data-into-a-database)
  - [Practice: Create a taxi_zones asset](/dagster-essentials/lesson-4/coding-practice-taxi-zones-asset)
  - [Assets with in-memory computations](/dagster-essentials/lesson-4/assets-with-in-memory-computations)
  - [Practice: Create a trips_by_week asset](/dagster-essentials/lesson-4/coding-practice-trips-by-week-asset)
- Lesson 5: Definitions & code locations
  - [Overview](/dagster-essentials/lesson-5/overview)
  - [What's the Definitions object?](/dagster-essentials/lesson-5/whats-the-definitions-object)
  - [What's a code location?](/dagster-essentials/lesson-5/whats-a-code-location)
  - [Code locations in the Dagster UI](/dagster-essentials/lesson-5/code-locations-dagster-ui)
- Lesson 6: Resources
  - [Overview](/dagster-essentials/lesson-6/overview)
  - [What's a resource?](/dagster-essentials/lesson-6/whats-a-resource)
  - [Setting up a database resource](/dagster-essentials/lesson-6/setting-up-a-database-resource)
  - [Using resources in assets](/dagster-essentials/lesson-6/using-resources-in-assets)
  - [Practice: Refactoring assets to use resources](/dagster-essentials/lesson-6/coding-practice-refactoring-assets)
  - [Analyzing resource usage using the Dagster UI](/dagster-essentials/lesson-6/analyzing-resources-dagster-ui)
  - [Lesson recap](/dagster-essentials/lesson-6/recap)
- Lesson 7: Schedules
  - [Overview](/dagster-essentials/lesson-7/overview)
  - [What are schedules?](/dagster-essentials/lesson-7/what-are-schedules)
  - [Practice: Create a weekly_update_job](/dagster-essentials/lesson-7/coding-practice-weekly-update-job)
  - [Creating a schedule](/dagster-essentials/lesson-7/creating-a-schedule)
  - [Practice: Create a weekly_update_schedule](/dagster-essentials/lesson-7/coding-practice-weekly-update-schedule)
  - [Updating the Definitions object](/dagster-essentials/lesson-7/updating-the-definitions-object)
  - [Jobs and schedules in the Dagster UI](/dagster-essentials/lesson-7/jobs-schedules-dagster-ui)
- Lesson 8: Partitions and backfills
  - [Overview](/dagster-essentials/lesson-8/overview)
  - [What are partitions and backfills?](/dagster-essentials/lesson-8/what-are-partitions-and-backfills)
  - [Creating a partition](/dagster-essentials/lesson-8/creating-a-partition)
  - [Practice: Create a weekly partition](/dagster-essentials/lesson-8/coding-practice-weekly-partition)
  - [Adding partitions to assets](/dagster-essentials/lesson-8/adding-partitions-to-assets)
  - [Practice: Partition the taxi_trips asset](/dagster-essentials/lesson-8/coding-practice-partition-taxi-trips)
  - [Creating a schedule with a date-based partition](/dagster-essentials/lesson-8/creating-a-schedule-with-a-date-based-partition)
  - [Practice: Partition the trips_by_week asset](/dagster-essentials/lesson-8/coding-practice-partition-trips-by-week)
  - [Partitions and backfills in the Dagster UI](/dagster-essentials/lesson-8/partitions-backfills-dagster-ui)
  - [Recap](/dagster-essentials/lesson-8/recap)
- Lesson 9: Sensors
  - [Overview](/dagster-essentials/lesson-9/overview)
  - [What's a sensor?](/dagster-essentials/lesson-9/whats-a-sensor)
  - [Configuring asset creation](/dagster-essentials/lesson-9/configuring-asset-creation)
  - [Creating an asset triggered by a sensor](/dagster-essentials/lesson-9/creating-an-asset-triggered-by-a-sensor)
  - [Creating a job](/dagster-essentials/lesson-9/creating-a-job)
  - [Building the sensor](/dagster-essentials/lesson-9/building-the-sensor)
  - [Updating the Definitions object](/dagster-essentials/lesson-9/updating-the-definitions-object)
  - [Sensors in the Dagster UI](/dagster-essentials/lesson-9/sensors-dagster-ui)
  - [Enabling the sensor](/dagster-essentials/lesson-9/enabling-the-sensor)
- [Capstone](/dagster-essentials/capstone)
- Extra credit: Metadata
  - [Overview](/dagster-essentials/extra-credit/overview)
  - [What's metadata?](/dagster-essentials/extra-credit/whats-metadata)
  - [Definition metadata - Asset descriptions](/dagster-essentials/extra-credit/definition-metadata-asset-descriptions)
  - [Definition metadata - Asset groups](/dagster-essentials/extra-credit/definition-metadata-asset-groups)
  - [Practice: Grouping assets](/dagster-essentials/extra-credit/coding-practice-grouping-assets)
  - [Materialization metadata](/dagster-essentials/extra-credit/materialization-metadata)
  - [Practice: Add metadata to taxi_zones_file](/dagster-essentials/extra-credit/coding-practice-metadata-taxi-zones-file)
  - [Asset metadata as Markdown](/dagster-essentials/extra-credit/asset-metadata-as-markdown)

1 comment on commit 72325ed

@github-actions

Deploy preview for dagster-university ready!

✅ Preview
https://dagster-university-1ci83z9mm-elementl.vercel.app

Built with commit 72325ed.
This pull request is being automatically deployed with vercel-action
