[daggy-u] [dbt] - Lesson 1 (DEV-61) (#19865)
## Summary & Motivation

This PR adds the content for Lesson 1 of the Dagster + dbt module to Dagster University.

## How I Tested These Changes

👀
1 parent e6047ae, commit 72325ed
Showing 8 changed files with 210 additions and 80 deletions.
27 changes: 27 additions & 0 deletions
docs/dagster-university/pages/dagster-dbt/lesson-1/1-whats-dbt.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,27 @@ | ||
--- | ||
title: "Lesson 1: What's dbt?" | ||
module: 'dbt_dagster' | ||
lesson: '1' | ||
--- | ||
|
||
# What's dbt? | ||
|
||
In the world of ETL/ELT, dbt - that’s right, all lowercase - is the ‘T’ in the process of Extracting, Loading, and **Transforming** data. Using familiar languages like SQL and Python, dbt is open-source software that allows users to write and run data transformations against the data loaded into their data warehouses. | ||
|
||
Before we go any further, let’s take a look at how the folks at dbt describe their product: | ||
|
||
> dbt is a transformation workflow that helps you get more work done while producing higher quality results. You can use dbt to modularize and centralize your analytics code, while also providing your data team with guardrails typically found in software engineering workflows. Collaborate on data models, version them, and test and document your queries before safely deploying them to production, with monitoring and visibility. | ||
> | ||
> dbt compiles and runs your analytics code against your data platform, enabling you and your team to collaborate on a single source of truth for metrics, insights, and business definitions. This single source of truth, combined with the ability to define tests for your data, reduces errors when logic changes, and alerts you when issues arise. ([source](https://docs.getdbt.com/docs/introduction)) | ||
--- | ||
|
||
## Why use dbt? | ||
|
||
dbt is popular not only for its easy, straightforward adoption, but also because it embraces software engineering best practices. Data analysts can use skills they already have - like SQL expertise - and simultaneously take advantage of: | ||
|
||
- **Keeping things DRY (Don’t Repeat Yourself).** dbt models, which are business definitions represented in SQL `SELECT` statements, can be referenced in other models. Focusing on modularity allows you to reduce bugs, standardize analytics logic, and get a running start on new analyses. | ||
- **Automatically managing dependencies and generating documentation.** Dependencies between models are not only easy to declare, they’re automatically managed by dbt. dbt also generates a DAG (directed acyclic graph), which shows how models in a dbt project relate to each other. | ||
- **Preventing negative impact on end-users.** Support for multiple environments ensures that development can occur without impacting users in production. | ||
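
To make the dependency management described above concrete, here is a minimal sketch of how an execution order can be derived from `ref` calls. It is not how dbt itself works internally (dbt renders Jinja rather than using a regular expression), and the model names and SQL are hypothetical, invented for illustration only.

```python
import re
from graphlib import TopologicalSorter

# Hypothetical dbt models (name -> SQL). These are illustrative only and
# do not come from the course project.
MODELS = {
    "stg_orders": "select * from raw.orders",
    "stg_users": "select * from raw.users",
    "daily_order_summary": (
        "select o.order_date, count(*) as n_orders "
        "from {{ ref('stg_orders') }} o "
        "join {{ ref('stg_users') }} u on o.user_id = u.id "
        "group by 1"
    ),
}

# Simplified stand-in for dbt's Jinja parsing: find {{ ref('model_name') }}.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}")


def find_refs(sql: str) -> set[str]:
    """Return the names of the models referenced by a model's SQL."""
    return set(REF_PATTERN.findall(sql))


def build_order(models: dict[str, str]) -> list[str]:
    """Order models so that each one comes after the models it references."""
    graph = {name: find_refs(sql) for name, sql in models.items()}
    return list(TopologicalSorter(graph).static_order())


run_order = build_order(MODELS)
print(run_order)
```

Because `daily_order_summary` references both staging models, it always comes last in the resulting order; the two staging models have no dependencies on each other, so their relative order is not significant.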
|
||
Dagster’s approach to building data platforms maps directly to these same best practices, making dbt and Dagster a natural, powerful pairing. In the next section, we’ll dig into this a bit more. |
26 changes: 26 additions & 0 deletions
...ter-university/pages/dagster-dbt/lesson-1/2-why-use-dbt-and-dagster-together.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,26 @@ | ||
--- | ||
title: "Lesson 1: Why use dbt and Dagster together?" | ||
module: 'dbt_dagster' | ||
lesson: '1' | ||
--- | ||
|
||
# Why use dbt and Dagster together? | ||
|
||
At a glance, it might seem like Dagster and dbt do the same thing. Both technologies, after all, work with data assets and are instrumental in modern data platforms. | ||
|
||
However, dbt Core can only transform data that is already in a data warehouse - it can’t extract data from a source, load it into its final destination, or automate either of these operations. And while you could use dbt Cloud’s native features to schedule running your models, other portions of your data pipelines - such as Fivetran-ingested tables or data from Amazon S3 - won’t be included. | ||
|
||
To have everything running together, you need an orchestrator. This is where Dagster comes in: | ||
|
||
> Dagster’s core design principles go really well together with dbt. The similarities between the way that Dagster thinks about data pipelines and the way that dbt thinks about data pipelines means that Dagster can orchestrate dbt much more faithfully than other general-purpose orchestrators like Airflow. | ||
> | ||
> At the same time, Dagster is able to compensate for dbt’s biggest limitations. dbt is rarely used in a vacuum: the data transformed using dbt needs to come from somewhere and go somewhere. When a data platform needs more than just dbt, Dagster is a better fit than dbt-specific orchestrators, like the job scheduling system inside dbt Cloud. ([source](https://dagster.io/blog/orchestrating-dbt-with-dagster)) | ||
In short, using dbt alongside Dagster gives analytics and data engineers the best of both their worlds: | ||
|
||
- **Analytics engineers** can author analytics code in a familiar language while adhering to software engineering best practices | ||
- **Data engineers** can easily incorporate dbt into their organization’s wider data platform, ensuring observability and reliability | ||
|
||
There’s more, however. Other orchestrators will provide you with one of two less-than-appealing options: running dbt as a single task that lacks visibility, or running each dbt model as an individual task and pushing the execution into the orchestrator, which goes against how dbt is intended to be run. | ||
|
||
Using dbt with Dagster is unique, as Dagster separates data assets from the execution that produces them and gives you the ability to monitor and debug each dbt model individually. |
54 changes: 54 additions & 0 deletions
...sity/pages/dagster-dbt/lesson-1/3-how-do-dbt-models-relate-to-dagster-assets.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,54 @@ | ||
--- | ||
title: "Lesson 1: How do dbt models relate to Dagster assets?" | ||
module: 'dbt_dagster' | ||
lesson: '1' | ||
--- | ||
|
||
# How do dbt models relate to Dagster assets? | ||
|
||
dbt models _are_ assets: they produce data and can have dependencies. Because of these similarities, Dagster can translate each of your dbt models into a Dagster [Software-defined Asset](https://docs.dagster.io/concepts/assets/software-defined-assets) (SDA). | ||
|
||
How can Dagster do this? Each component of a Dagster asset has an equivalent counterpart in a dbt model: | ||
|
||
- The **asset key** for a dbt model is (by default) the name of the model | ||
- The **upstream dependencies** of a dbt model are defined with **`ref`** or **`source`** calls within the model's definition | ||
- The **computation** required to produce the asset from its upstream dependencies is the SQL within the model's definition | ||
|
||
These similarities make it natural to interact with dbt models as Dagster assets. Using dbt with Dagster, you can create an asset graph like the following: | ||
|
||
![Dagster graph with dbt, Fivetran, and TensorFlow](/images/dagster-dbt/lesson-1/example-asset-graph.png) | ||
|
||
From code like this: | ||
|
||
```python file=/integrations/dbt/potemkin_dag_for_cover_image.py startafter=start endbefore=end | ||
from pathlib import Path | ||
|
||
from dagster_dbt import DbtCliResource, dbt_assets, get_asset_key_for_model | ||
from dagster_fivetran import build_fivetran_assets | ||
|
||
from dagster import AssetExecutionContext, asset | ||
|
||
fivetran_assets = build_fivetran_assets( | ||
    connector_id="postgres", | ||
    destination_tables=["users", "orders"], | ||
) | ||
|
||
|
||
@dbt_assets(manifest=Path("manifest.json")) | ||
def dbt_project_assets(context: AssetExecutionContext, dbt: DbtCliResource): | ||
    yield from dbt.cli(["build"], context=context).stream() | ||
|
||
|
||
@asset( | ||
    compute_kind="tensorflow", | ||
    deps=[get_asset_key_for_model([dbt_project_assets], "daily_order_summary")], | ||
) | ||
def predicted_orders(): | ||
    ... | ||
``` | ||
|
||
Let's break down what's happening in this example: | ||
|
||
- Using `build_fivetran_assets`, we load two tables (`users`, `orders`) from a Fivetran Postgres connector as Dagster assets | ||
- Using `@dbt_assets`, Dagster reads from a dbt project's `manifest.json` and creates Dagster assets from the dbt models it finds | ||
- Lastly, we create a Dagster `@asset` named `predicted_orders` that has an upstream dependency on a dbt asset named `daily_order_summary` |
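
The mapping between a dbt model and the components of a Dagster asset can be sketched in plain Python. The snippet below is a simplified illustration, not the `dagster-dbt` implementation: the manifest is a hand-written, heavily trimmed stand-in for a real `manifest.json`, the project name `shop` and the helper `describe_models` are hypothetical, and the field names (`resource_type`, `depends_on`, `raw_code`) are assumed to match recent dbt manifest versions.

```python
import json

# A hand-written, heavily trimmed stand-in for dbt's manifest.json.
# Real manifests contain many more fields; the project name "shop" is made up.
MANIFEST_JSON = """
{
  "nodes": {
    "model.shop.stg_orders": {
      "resource_type": "model",
      "name": "stg_orders",
      "depends_on": {"nodes": ["source.shop.postgres.orders"]},
      "raw_code": "select * from {{ source('postgres', 'orders') }}"
    },
    "model.shop.daily_order_summary": {
      "resource_type": "model",
      "name": "daily_order_summary",
      "depends_on": {"nodes": ["model.shop.stg_orders"]},
      "raw_code": "select order_date, count(*) as n_orders from {{ ref('stg_orders') }} group by 1"
    }
  }
}
"""


def describe_models(manifest: dict) -> dict:
    """Summarize each dbt model as the three asset components listed above.

    Hypothetical helper for illustration only: the real integration
    handles many more cases than this.
    """
    summaries = {}
    for node in manifest["nodes"].values():
        if node["resource_type"] != "model":
            continue  # skip tests, seeds, snapshots, and so on
        summaries[node["name"]] = {
            # Asset key: by default, the name of the model.
            "asset_key": node["name"],
            # Upstream dependencies: resolved from ref/source calls.
            # Keep only the last segment of each dbt unique ID.
            "upstream": [uid.split(".")[-1] for uid in node["depends_on"]["nodes"]],
            # Computation: the SQL within the model's definition.
            "computation": node["raw_code"],
        }
    return summaries


summaries = describe_models(json.loads(MANIFEST_JSON))
for name, parts in summaries.items():
    print(name, "<-", parts["upstream"])
```

In this toy manifest, `daily_order_summary` depends on `stg_orders`, which in turn depends on the `orders` source table, mirroring the chain from ingested tables to dbt models in the example above.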
18 changes: 18 additions & 0 deletions
docs/dagster-university/pages/dagster-dbt/lesson-1/4-project-preview.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,18 @@ | ||
--- | ||
title: "Lesson 1: Project preview" | ||
module: 'dbt_dagster' | ||
lesson: '1' | ||
--- | ||
|
||
# Project preview | ||
|
||
In this course, we’ll focus on integrating a dbt project with Dagster from end to end. We’ll build on the Dagster project used in the Dagster Essentials course, which uses data from [NYC OpenData](https://opendata.cityofnewyork.us/). If you haven’t completed Dagster Essentials, no worries - you can clone the finished project and build from there. We’ll do this in the next lesson. | ||
|
||
By the end of the course, you will: | ||
|
||
- Create dbt models and load them into Dagster as assets | ||
- Run dbt and store the transformed data in a DuckDB database | ||
- Apply partitions to incremental dbt models | ||
- Deploy the dbt + Dagster project to Dagster Cloud | ||
|
||
If you get stuck or want to jump ahead, check out the [finished project here on GitHub](https://github.com/dagster-io/project-dagster-university/tree/module/dagster-and-dbt). |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,83 @@ | ||
--- | ||
title: Dagster Essentials | ||
--- | ||
|
||
- Lesson 1: Introduction | ||
- [What's data engineering?](/dagster-essentials/lesson-1/whats-data-engineering) | ||
- [What's an orchestrator?](/dagster-essentials/lesson-1/whats-an-orchestrator) | ||
- [Orchestration approaches](/dagster-essentials/lesson-1/orchestration-approaches) | ||
- [Why is asset-centric orchestration good for data engineering?](/dagster-essentials/lesson-1/why-is-asset-centric-orchestration-good-for-data-engineering) | ||
- [Project preview](/dagster-essentials/lesson-1/project-preview) | ||
- Lesson 2: Requirements & installation | ||
- [Requirements and installation](/dagster-essentials/lesson-2/requirements-and-installation) | ||
- [Create the Dagster project](/dagster-essentials/lesson-2/create-dagster-project) | ||
- [Project files](/dagster-essentials/lesson-2/project-files) | ||
- Lesson 3: SDAs | ||
- [Overview](/dagster-essentials/lesson-3/overview) | ||
- [What's an asset?](/dagster-essentials/lesson-3/whats-an-asset) | ||
- [Defining your first asset](/dagster-essentials/lesson-3/defining-your-first-asset) | ||
- [Asset materialization](/dagster-essentials/lesson-3/asset-materialization) | ||
- [Viewing run details](/dagster-essentials/lesson-3/viewing-run-details) | ||
- [Troubleshooting failed runs](/dagster-essentials/lesson-3/troubleshooting-failed-runs) | ||
- [Coding practice: Create a taxi_zones_file asset](/dagster-essentials/lesson-3/coding-practice-taxi-zones-file-asset) | ||
- [Recap](/dagster-essentials/lesson-3/recap) | ||
- Lesson 4: Asset dependencies | ||
- [Overview](/dagster-essentials/lesson-4/overview) | ||
- [What's a dependency?](/dagster-essentials/lesson-4/whats-a-dependency) | ||
- [Assets and database execution](/dagster-essentials/lesson-4/assets-and-database-execution) | ||
- [Loading data into a database](/dagster-essentials/lesson-4/loading-data-into-a-database) | ||
- [Practice: Create a taxi_zones asset](/dagster-essentials/lesson-4/coding-practice-taxi-zones-asset) | ||
- [Assets with in-memory computations](/dagster-essentials/lesson-4/assets-with-in-memory-computations) | ||
- [Practice: Create a trips_by_week asset](/dagster-essentials/lesson-4/coding-practice-trips-by-week-asset) | ||
- Lesson 5: Definitions & code locations | ||
- [Overview](/dagster-essentials/lesson-5/overview) | ||
- [What's the Definitions object?](/dagster-essentials/lesson-5/whats-the-definitions-object) | ||
- [What's a code location?](/dagster-essentials/lesson-5/whats-a-code-location) | ||
- [Code locations in the Dagster UI](/dagster-essentials/lesson-5/code-locations-dagster-ui) | ||
- Lesson 6: Resources | ||
- [Overview](/dagster-essentials/lesson-6/overview) | ||
- [What's a resource?](/dagster-essentials/lesson-6/whats-a-resource) | ||
- [Setting up a database resource](/dagster-essentials/lesson-6/setting-up-a-database-resource) | ||
- [Using resources in assets](/dagster-essentials/lesson-6/using-resources-in-assets) | ||
- [Practice: Refactoring assets to use resources](/dagster-essentials/lesson-6/coding-practice-refactoring-assets) | ||
- [Analyzing resource usage using the Dagster UI](/dagster-essentials/lesson-6/analyzing-resources-dagster-ui) | ||
- [Lesson recap](/dagster-essentials/lesson-6/recap) | ||
- Lesson 7: Schedules | ||
- [Overview](/dagster-essentials/lesson-7/overview) | ||
- [What are schedules?](/dagster-essentials/lesson-7/what-are-schedules) | ||
- [Practice: Create a weekly_update_job](/dagster-essentials/lesson-7/coding-practice-weekly-update-job) | ||
- [Creating a schedule](/dagster-essentials/lesson-7/creating-a-schedule) | ||
- [Practice: Create a weekly_update_schedule](/dagster-essentials/lesson-7/coding-practice-weekly-update-schedule) | ||
- [Updating the Definitions object](/dagster-essentials/lesson-7/updating-the-definitions-object) | ||
- [Jobs and schedules in the Dagster UI](/dagster-essentials/lesson-7/jobs-schedules-dagster-ui) | ||
- Lesson 8: Partitions and backfills | ||
- [Overview](/dagster-essentials/lesson-8/overview) | ||
- [What are partitions and backfills?](/dagster-essentials/lesson-8/what-are-partitions-and-backfills) | ||
- [Creating a partition](/dagster-essentials/lesson-8/creating-a-partition) | ||
- [Practice: Create a weekly partition](/dagster-essentials/lesson-8/coding-practice-weekly-partition) | ||
- [Adding partitions to assets](/dagster-essentials/lesson-8/adding-partitions-to-assets) | ||
- [Practice: Partition the taxi_trips asset](/dagster-essentials/lesson-8/coding-practice-partition-taxi-trips) | ||
- [Creating a schedule with a date-based partition](/dagster-essentials/lesson-8/creating-a-schedule-with-a-date-based-partition) | ||
- [Practice: Partition the trips_by_week asset](/dagster-essentials/lesson-8/coding-practice-partition-trips-by-week) | ||
- [Partitions and backfills in the Dagster UI](/dagster-essentials/lesson-8/partitions-backfills-dagster-ui) | ||
- [Recap](/dagster-essentials/lesson-8/recap) | ||
- Lesson 9: Sensors | ||
- [Overview](/dagster-essentials/lesson-9/overview) | ||
- [What's a sensor?](/dagster-essentials/lesson-9/whats-a-sensor) | ||
- [Configuring asset creation](/dagster-essentials/lesson-9/configuring-asset-creation) | ||
- [Creating an asset triggered by a sensor](/dagster-essentials/lesson-9/creating-an-asset-triggered-by-a-sensor) | ||
- [Creating a job](/dagster-essentials/lesson-9/creating-a-job) | ||
- [Building the sensor](/dagster-essentials/lesson-9/building-the-sensor) | ||
- [Updating the Definitions object](/dagster-essentials/lesson-9/updating-the-definitions-object) | ||
- [Sensors in the Dagster UI](/dagster-essentials/lesson-9/sensors-dagster-ui) | ||
- [Enabling the sensor](/dagster-essentials/lesson-9/enabling-the-sensor) | ||
- [Capstone](/dagster-essentials/capstone) | ||
- Extra credit: Metadata | ||
- [Overview](/dagster-essentials/extra-credit/overview) | ||
- [What's metadata?](/dagster-essentials/extra-credit/whats-metadata) | ||
- [Definition metadata - Asset descriptions](/dagster-essentials/extra-credit/definition-metadata-asset-descriptions) | ||
- [Definition metadata - Asset groups](/dagster-essentials/extra-credit/definition-metadata-asset-groups) | ||
- [Practice: Grouping assets](/dagster-essentials/extra-credit/coding-practice-grouping-assets) | ||
- [Materialization metadata](/dagster-essentials/extra-credit/materialization-metadata) | ||
- [Practice: Add metadata to taxi_zones_file](/dagster-essentials/extra-credit/coding-practice-metadata-taxi-zones-file) | ||
- [Asset metadata as Markdown](/dagster-essentials/extra-credit/asset-metadata-as-markdown) |
Deploy preview for dagster-university ready!
✅ Preview
https://dagster-university-1ci83z9mm-elementl.vercel.app
Built with commit 72325ed.
This pull request is being automatically deployed with vercel-action