Doc 302 new etl tutorial - part 1 #25320

Draft: wants to merge 29 commits into `master`. Changes from 14 commits are shown below.

Commits:
- `c275842` file copy (C00ldudeNoonan, Oct 11, 2024)
- `054141c` config file creation (C00ldudeNoonan, Oct 14, 2024)
- `89be27a` adding additional pages and project config logic (C00ldudeNoonan, Oct 16, 2024)
- `59f5a64` add defintions object (C00ldudeNoonan, Oct 16, 2024)
- `bf7b65b` Merge remote-tracking branch 'origin/master' into new-etl-tutorial--D… (C00ldudeNoonan, Oct 16, 2024)
- `d6d69cf` added intial assets and did some cleanup (C00ldudeNoonan, Oct 16, 2024)
- `19d3236` minor typo fixes (C00ldudeNoonan, Oct 18, 2024)
- `9b8bdc2` linting (C00ldudeNoonan, Oct 18, 2024)
- `6f078db` more to first asset (C00ldudeNoonan, Oct 18, 2024)
- `8b6d1f6` consolidated pages and added partitions page (C00ldudeNoonan, Oct 21, 2024)
- `8ef90cf` Merge branch 'master' into DOC-302-new-etl-tutorial (C00ldudeNoonan, Nov 13, 2024)
- `2425783` add screenshots and update format and writeup (C00ldudeNoonan, Nov 14, 2024)
- `49035dd` update name in sidebar for consistency (C00ldudeNoonan, Nov 14, 2024)
- `17aff77` vale formatting errors fix (C00ldudeNoonan, Nov 14, 2024)
- `75e60fe` applied notes from Nikki (C00ldudeNoonan, Nov 15, 2024)
- `d4ff6d3` whitespace fixes (C00ldudeNoonan, Nov 15, 2024)
- `b30f860` Update docs/docs-beta/docs/tutorial/03-creating-a-downstream-asset.md (C00ldudeNoonan, Nov 19, 2024)
- `140a122` added partitions, automations, and sensors (C00ldudeNoonan, Nov 26, 2024)
- `f29065e` add commentary to page 6 and 7 (C00ldudeNoonan, Dec 2, 2024)
- `130b418` added final pages and screenshots (C00ldudeNoonan, Dec 10, 2024)
- `d34c41b` ruff update (C00ldudeNoonan, Dec 10, 2024)
- `62e5fd0` Merge branch 'master' into DOC-302-new-etl-tutorial (C00ldudeNoonan, Dec 27, 2024)
- `aae8195` updated code references and sidebar (C00ldudeNoonan, Dec 27, 2024)
- `1eb255c` page link fixes (C00ldudeNoonan, Dec 27, 2024)
- `4148df7` page links (C00ldudeNoonan, Dec 27, 2024)
- `aee2029` update links (C00ldudeNoonan, Dec 30, 2024)
- `5db379b` update sidebar links to remove folder (C00ldudeNoonan, Dec 30, 2024)
- `6974fcb` update 404 link (C00ldudeNoonan, Dec 30, 2024)
- `1cb9423` Merge remote-tracking branch 'origin/master' into new-etl-tutorial--D… (C00ldudeNoonan, Dec 30, 2024)
104 changes: 104 additions & 0 deletions docs/docs-beta/docs/tutorial/01-etl-tutorial-introduction.md
@@ -0,0 +1,104 @@
---
title: Build an ETL Pipeline
description: Learn how to build an ETL pipeline with Dagster
last_update:
  date: 2024-08-10
  author: Pedram Navid
---

# Build your first ETL pipeline

Welcome to this hands-on tutorial, where you'll learn how to build an ETL pipeline while exploring key parts of Dagster.

## What you'll learn

- Setting up a Dagster project with the recommended project structure
- Creating assets and using resources to connect to external systems
- Adding metadata to your assets
- Building dependencies between assets
- Running a pipeline by materializing assets
- Adding schedules, sensors, and partitions to your assets

[Add image for what the completed global asset graph looks like]

## Step 1: Set up your Dagster environment

First, set up a new Dagster project.

1. Open your terminal and create a new directory for your project:

```bash title="Create a new directory"
mkdir dagster-etl-tutorial
cd dagster-etl-tutorial
```

2. Create a virtual environment and activate it:

```bash title="Create a virtual environment"
python -m venv dagster_tutorial
source dagster_tutorial/bin/activate
# On Windows, use `dagster_tutorial\Scripts\activate`
```

3. Install Dagster and the required dependencies:

```bash title="Install Dagster and dependencies"
pip install dagster dagster-webserver pandas dagster-duckdb
```

## Step 2: Copy the project scaffold

Next, get the project scaffold, which includes the raw data for this tutorial. Dagster provides several pre-built project scaffolds for different use cases. To see the full, up-to-date list, run `dagster project list-examples`.

For this tutorial, create the project from the `getting_started_etl_tutorial` example:

```bash title="ETL Project Scaffold"
dagster project from-example --example getting_started_etl_tutorial
```

The project should have this structure:

<!-- vale off -->
```
dagster-etl-tutorial/
├── data/
│   ├── products.csv
│   ├── sales_data.csv
│   ├── sales_reps.csv
│   └── sample_request/
│       └── request.json
├── etl_tutorial/
│   └── definitions.py
├── pyproject.toml
├── setup.cfg
└── setup.py
```
<!-- vale on -->

## Dagster project structure

The root directory contains three configuration files that are common in Python package management. They manage the project's dependencies and identify the Dagster modules in the project. The `etl_tutorial` folder contains the Dagster definitions for this code location. The `data` directory stores the project's raw data, which the software-defined assets will reference.

### File and directory descriptions

#### Dagster files

- **etl_tutorial/**: This Python module contains your Dagster code. It's the main directory where you define your assets, jobs, schedules, sensors, and resources.

- **definitions.py**: This file defines the `Definitions` object, which gathers the assets, resources, jobs, schedules, and sensors in your project so that Dagster can load them from a single module.

#### Python files

- **pyproject.toml**: This file is used to specify build system requirements and package metadata for Python projects. It is part of the Python packaging ecosystem.

- **setup.cfg**: This file is used for configuration of your Python package. It can include metadata about the package, dependencies, and other configuration options.

- **setup.py**: This script is used to build and distribute your Python package. It is a standard file in Python projects for specifying package details.

## What you've learned

- Set up a Python virtual environment and installed Dagster
- Set up the project scaffold
- Learned how a Dagster project is structured and what each file does

## Next steps

- Continue this tutorial with [your first asset](/tutorial/02-your-first-asset)
106 changes: 106 additions & 0 deletions docs/docs-beta/docs/tutorial/02-your-first-asset.md
@@ -0,0 +1,106 @@
---
title: Your First Asset
description: Get the project data and create your first asset
last_update:
  date: 2024-10-16
  author: Alex Noonan
---

# Your first software-defined asset

Now that we have the raw data files and the Dagster project set up, let's create some assets that load those CSV files into DuckDB.

Asset definitions enable a declarative approach to data management, in which code is the source of truth on what data assets should exist and how those assets are computed.

<iframe width="560" height="315" src="https://www.youtube.com/embed/In4CUoFKOfY?si=Xnk_CADS1pf7D5BA" title="YouTube video player" frameborder="0" allow="accelerometer; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>

## What you'll learn

- Creating the initial `Definitions` object
- Adding a DuckDB resource
- Building some basic software-defined assets

## Building the Definitions object

The [`Definitions`](/api/definitions) object is the central place where you register the components of a Dagster project, such as assets, asset checks, resources, schedules, and sensors. It acts as a container for everything a code location needs, so all of the project's definitions are organized in one place where Dagster can load them.

1. Create the `Definitions` object and DuckDB resource

Open the `definitions.py` file and add the following import statements and `Definitions` object:

```python
import json
import os

from dagster_duckdb import DuckDBResource

import dagster as dg

defs = dg.Definitions(
    # Assets are added to this list in the following steps
    assets=[],
    # Resources are registered by key; assets request this one by the name "duckdb"
    resources={"duckdb": DuckDBResource(database="data/mydb.duckdb")},
)
```

## Loading raw data

1. Products asset

> **Contributor review, suggested change:** replace `1. Products Asset` with `### Products asset`.

First, create an asset that loads the products CSV file into a DuckDB table. We'll also add metadata that categorizes the asset and shows a preview of the table in the Dagster UI.

<CodeExample filePath="guides/tutorials/etl_tutorial/etl_tutorial/definitions.py" language="python" lineStart="8" lineEnd="33"/>

Notice that this asset specifies a compute kind and belongs to the ingestion group. At the end of the function, it attaches the table's row count and a preview of its contents as metadata.

2. Sales reps asset

This asset is nearly identical to the products asset, but it loads the sales reps data.

<CodeExample filePath="guides/tutorials/etl_tutorial/etl_tutorial/definitions.py" language="python" lineStart="35" lineEnd="61"/>

3. Sales data asset

The sales data asset follows the same pattern:

<CodeExample filePath="guides/tutorials/etl_tutorial/etl_tutorial/definitions.py" language="python" lineStart="62" lineEnd="87"/>

4. Add the assets to the `Definitions` object

To register these assets, add them to the empty list in the `assets` parameter:

```python
defs = dg.Definitions(
    assets=[
        products,
        sales_reps,
        sales_data,
    ],
    resources={"duckdb": DuckDBResource(database="data/mydb.duckdb")},
)
```
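
All three ingestion assets follow the same pattern: read a CSV file into a table, then report the table's row count and a preview of its rows as metadata. The following is a minimal sketch of that pattern using only the Python standard library, not the tutorial's code. It assumes Python's built-in `sqlite3` as a stand-in for DuckDB and inline CSV text in place of the files in `data/`; the helper names `load_csv_table` and `markdown_preview` and the sample columns are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for data/products.csv
PRODUCTS_CSV = """product_id,product_name,category
1,Widget,Hardware
2,Gadget,Hardware
3,Gizmo,Software
"""


def load_csv_table(conn: sqlite3.Connection, table: str, csv_text: str) -> int:
    """Create (or replace) `table` from CSV text and return its row count.

    sqlite3 is only a stand-in here; the tutorial's assets load data into DuckDB.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = list(reader)
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'DROP TABLE IF EXISTS "{table}"')
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    (count,) = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()
    return count


def markdown_preview(conn: sqlite3.Connection, table: str, limit: int = 10) -> str:
    """Render the first `limit` rows of `table` as a Markdown table string."""
    cursor = conn.execute(f'SELECT * FROM "{table}" LIMIT {int(limit)}')
    header = [description[0] for description in cursor.description]
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    for row in cursor.fetchall():
        lines.append("| " + " | ".join(str(value) for value in row) + " |")
    return "\n".join(lines)


conn = sqlite3.connect(":memory:")
row_count = load_csv_table(conn, "products", PRODUCTS_CSV)
# The kind of metadata the ingestion assets report: a row count and a table preview
metadata = {"row_count": row_count, "preview": markdown_preview(conn, "products", limit=2)}
print(metadata["row_count"])
print(metadata["preview"])
```

In the real assets, the loading happens in DuckDB, which can read CSV files directly (for example, with its `read_csv_auto` function), so no row-by-row insert loop is needed.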

## Materialize assets

Let's start Dagster and materialize these assets. If you aren't already in the project root directory, navigate there now:

> **Contributor review, suggested change:** replace this introduction with "To materialize your assets:"

```bash title="Navigate to the project directory"
cd getting_started_etl_tutorial
```

> **Contributor review, suggested change:**
>
> 2. Start the Dagster server by running the `dagster dev` command. You should see output similar to the following: [screenshot of command output]
> 3. In a browser, navigate to the URL of the Dagster server.
> 4. In the Dagster UI, click **Assets**, then click "View global asset lineage" to see all of your assets.
>
> ![2048 resolution](/images/tutorial/etl-tutorial/etl-tutorial-first-asset-lineage.png)

> **Contributor review:** (FWIW, running dagster dev didn't open a browser for me.)

> **Contributor review:** Also, my assets don't show up in the UI and I get an error that says they're not defined.

Run the `dagster dev` command, then open the Dagster UI in your browser at http://localhost:3000 (the default address). Click **Assets**, then **View global asset lineage**. You should see this:

![2048 resolution](/images/tutorial/etl-tutorial/etl-tutorial-first-asset-lineage.png)

Select the `products` asset and materialize it. Then open the **Runs** tab and select the most recent run:

![2048 resolution](/images/tutorial/etl-tutorial/first-asset-run.png)

Repeat these steps for `sales_reps` and `sales_data`. All of the ingestion assets are now materialized.

## What you've learned

- Created a Dagster `Definitions` object with a DuckDB resource
- Built and materialized the ingestion assets

## Next steps

- Continue this tutorial with [asset dependencies and checks](/tutorial/03-asset-dependencies-and-checks)
69 changes: 69 additions & 0 deletions docs/docs-beta/docs/tutorial/03-asset-dependencies-and-checks.md
@@ -0,0 +1,69 @@
---
title: Asset Dependencies and Checks
description: Reference assets as dependencies of other assets, and add asset checks.
last_update:
  date: 2024-10-16
  author: Alex Noonan
---

# Asset dependencies and asset checks

The directed acyclic graph (DAG) is a key part of Dagster, and an improvement over the typical cron-based approach to orchestration. Because a DAG makes the dependencies between steps explicit, complex data pipelines are easier to understand. The key benefits of DAGs are:

1. Clarity: The DAG provides a clear visual representation of the entire workflow.
2. Efficiency: Parallel tasks can be identified and executed simultaneously.
3. Reliability: Dependencies ensure that tasks are executed in the correct order.
4. Scalability: Complex workflows can be managed effectively.
5. Maintenance: It's easier to update or troubleshoot specific parts of the workflow.
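
To make the ordering and parallelism benefits concrete, here is a small standard-library sketch (not Dagster's scheduler) that models this tutorial's assets as a DAG with Python's `graphlib.TopologicalSorter` (Python 3.9+). Each batch contains assets whose upstream dependencies have all finished, so the assets within one batch could run in parallel. The graph mirrors the dependencies built later on this page: `joined_data` depends on the three ingestion assets.

```python
from graphlib import TopologicalSorter

# Each asset maps to the set of assets it depends on (its upstream assets)
asset_graph = {
    "products": set(),
    "sales_reps": set(),
    "sales_data": set(),
    "joined_data": {"products", "sales_reps", "sales_data"},
}


def execution_batches(graph: dict[str, set[str]]) -> list[list[str]]:
    """Group assets into batches; every asset in a batch can run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for deterministic output
        batches.append(ready)
        sorter.done(*ready)
    return batches


for step, batch in enumerate(execution_batches(asset_graph), start=1):
    print(step, batch)
```

In Dagster you don't write this loop yourself: it builds the graph from each asset's declared dependencies and determines the execution order when you materialize assets.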

## What you'll learn

- Creating [asset dependencies](guides/asset-dependencies.md)
- Creating an [asset check](guides/asset-checks.md)

## Creating a downstream asset

Now that all of the raw data is loaded into the DuckDB database, the next step is to merge it. The data is structured as a fact table (sales data) with two dimension tables (sales reps and products). To combine them in SQL, we select from the `sales_data` table and left join `sales_reps` and `products` on their respective ID columns. To keep this view concise, it includes only the columns relevant for analysis.

<CodeExample filePath="guides/tutorials/etl_tutorial/etl_tutorial/definitions.py" language="python" lineStart="89" lineEnd="132"/>

This asset looks a lot like the previous ones, with a few small changes. It belongs to a different asset group, and to make it dependent on the raw tables, we add their asset keys to the `deps` parameter of the asset definition.
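
Here is a minimal sketch of the fact-and-dimension join described above, using Python's built-in `sqlite3` as a stand-in for DuckDB. The table layouts and column names (`product_id`, `rep_id`, `dollar_amount`, and so on) are hypothetical, not the tutorial's actual schema. The point is that a `LEFT JOIN` keeps every sales row even when its product or sales rep is missing, leaving `NULL` in the dimension columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schemas: one fact table (sales_data) and two dimension tables
conn.executescript(
    """
    CREATE TABLE sales_data (order_id INTEGER, product_id INTEGER, rep_id INTEGER, dollar_amount REAL);
    CREATE TABLE products (product_id INTEGER, product_name TEXT);
    CREATE TABLE sales_reps (rep_id INTEGER, rep_name TEXT);

    INSERT INTO products VALUES (1, 'Widget'), (2, 'Gadget');
    INSERT INTO sales_reps VALUES (10, 'Ana'), (11, 'Ben');
    -- Order 3 references product 99, which has no row in products
    INSERT INTO sales_data VALUES (1, 1, 10, 100.0), (2, 2, 11, 250.0), (3, 99, 10, 75.0);

    -- Keep every sale and attach product and rep details where they exist
    CREATE VIEW joined_data AS
    SELECT s.order_id, s.dollar_amount, p.product_name, r.rep_name
    FROM sales_data AS s
    LEFT JOIN sales_reps AS r ON s.rep_id = r.rep_id
    LEFT JOIN products AS p ON s.product_id = p.product_id;
    """
)

rows = conn.execute("SELECT * FROM joined_data ORDER BY order_id").fetchall()
for row in rows:
    print(row)
```

An inner join would silently drop order 3; the left join keeps it visible, which is what makes a missing-dimension check possible.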

## Asset checks

Data quality is critical in analytics. Just as a car factory inspects parts after each production step to catch defects, and to find processes that produce more defects than is acceptable, we want to test our data after it's transformed. In this case, we'll create a check that identifies any rows with a product or sales rep that isn't in the corresponding dimension table.

<CodeExample filePath="guides/tutorials/etl_tutorial/etl_tutorial/definitions.py" language="python" lineStart="134" lineEnd="149"/>
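
The logic of such a check can be sketched with the standard library, again using `sqlite3` as a stand-in for DuckDB and hypothetical column names. After the left joins, a sales row whose product or sales rep is missing from its dimension table has `NULL` in the joined columns, so the check counts those rows and passes only when the count is zero. The function name `missing_dimension_count` is hypothetical, and the sketch assumes the dimension name columns are never `NULL` for rows that do exist.

```python
import sqlite3


def missing_dimension_count(conn: sqlite3.Connection) -> int:
    """Count joined rows whose product or sales rep has no matching dimension row."""
    query = """
        SELECT COUNT(*)
        FROM joined_data
        WHERE product_name IS NULL OR rep_name IS NULL
    """
    (count,) = conn.execute(query).fetchone()
    return count


conn = sqlite3.connect(":memory:")
# Hypothetical joined rows: order 3 has no matching product
conn.execute("CREATE TABLE joined_data (order_id INTEGER, product_name TEXT, rep_name TEXT)")
conn.executemany(
    "INSERT INTO joined_data VALUES (?, ?, ?)",
    [(1, "Widget", "Ana"), (2, "Gadget", "Ben"), (3, None, "Ana")],
)

missing = missing_dimension_count(conn)
passed = missing == 0
print(f"passed={passed}, missing_dimensions={missing}")
```

In Dagster, an asset check reports an outcome like this as a check result, for example `dg.AssetCheckResult(passed=passed, metadata={"missing_dimensions": missing})`, which is what appears on the asset's checks tab.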


## Materialize the assets

Next, add the new asset and asset check to the `Definitions` object.

Your `Definitions` object should now look like this:

```python
defs = dg.Definitions(
    assets=[
        products,
        sales_reps,
        sales_data,
        joined_data,
    ],
    asset_checks=[missing_dimension_check],
    resources={"duckdb": DuckDBResource(database="data/mydb.duckdb")},
)
```

Go back to the UI, reload the definitions, and materialize the `joined_data` asset. The asset's details page has a tab for asset checks, where you can see each check's run history and metadata.

![2048 resolution](/images/tutorial/etl-tutorial/asset-check.png)

## What you've learned

- Created a downstream asset
- Added an asset check


## Next steps

- Continue this tutorial with [partitions](/tutorial/04-partitions)
10 changes: 10 additions & 0 deletions docs/docs-beta/docs/tutorial/04-partitions.md
@@ -0,0 +1,10 @@
---
title: Partitions
description: Partitioning assets by datetime and categories
last_update:
  date: 2024-10-16
  author: Alex Noonan
---



62 changes: 0 additions & 62 deletions docs/docs-beta/docs/tutorial/tutorial-etl.md

This file was deleted.

6 changes: 5 additions & 1 deletion docs/docs-beta/sidebars.ts
@@ -11,7 +11,11 @@ const sidebars: SidebarsConfig = {
type: 'category',
label: 'Tutorial',
collapsed: false,
items: ['tutorial/tutorial-etl'],
items: [
'tutorial/etl-tutorial-introduction',
'tutorial/your-first-asset',
'tutorial/asset-dependencies-and-checks',
],
},
{
type: 'category',
2 changes: 1 addition & 1 deletion docs/docs-beta/src/theme/MDXComponents.tsx
@@ -1,6 +1,6 @@
// Import the original mapper
import MDXComponents from '@theme-original/MDXComponents';
import { PyObject } from '../components/PyObject';
import {PyObject} from '../components/PyObject';
import CodeExample from '../components/CodeExample';
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';