vale formatting errors fix
C00ldudeNoonan committed Nov 14, 2024
1 parent 49035dd commit 17aff77
Showing 4 changed files with 27 additions and 26 deletions.
7 changes: 4 additions & 3 deletions docs/docs-beta/docs/tutorial/01-etl-tutorial-introduction.md
@@ -56,7 +56,7 @@ Use the project scaffold command for this project.
```

The project should have this structure.

Check warning on line 58 in docs/docs-beta/docs/tutorial/01-etl-tutorial-introduction.md (GitHub Actions / runner / vale)

[vale] reported by reviewdog 🐶 [Dagster.chars-eol-whitespace] Remove whitespace characters from the end of the line. (line 58, column 40, severity WARNING)

<!-- vale off -->
```
dagster-etl-tutorial/
├── data/
@@ -71,10 +71,11 @@ dagster-etl-tutorial/
├── setup.cfg
├── setup.py
```
<!-- vale on -->

## Dagster Project Structure

In the root directory there are three configuration files that are common in Python package management. These manage dependencies and identifies the Dagster modules in the project. The etl_tutorial folder is where our Dagster definition for this code location exists. The data directory is where the raw data for the project is stored and we will reference these files in our software-defined assets.
In the root directory there are three configuration files that are common in Python package management. These manage dependencies and identifies the Dagster modules in the project. The etl_tutorial folder is where our Dagster definition for this code location exists. The data directory is where the raw data for the project is stored and we will reference these files in our software-defined assets.

Check failure on line 78 in docs/docs-beta/docs/tutorial/01-etl-tutorial-introduction.md (GitHub Actions / runner / vale)

[vale] reported by reviewdog 🐶 [Dagster.spelling] Is 'etl_tutorial' spelled correctly? (line 78, column 186, severity ERROR)

[vale] reported by reviewdog 🐶 [Vale.Spelling] Did you really mean 'etl_tutorial'? (line 78, column 186, severity ERROR)

### File/Directory Descriptions

@@ -96,7 +97,7 @@ In the root directory there are three configuration files that are common in Pyt

- Set up a Python virtual environment and installed Dagster
- Setup project scaffold
- How a Dagster project is structured and what these files do
- How a Dagster project is structured and what these files do

## Next steps

22 changes: 11 additions & 11 deletions docs/docs-beta/docs/tutorial/02-your-first-asset.md
@@ -8,7 +8,7 @@ last_update:

# Your First Software Defined Asset

Now that we have the raw data files and the Dagster project setup lets create some loading those csv's into DuckDB.
Now that we have the raw data files and the Dagster project setup lets create some loading those csvs into DuckDB.

Check failure on line 11 in docs/docs-beta/docs/tutorial/02-your-first-asset.md (GitHub Actions / runner / vale)

[vale] reported by reviewdog 🐶 [Vale.Spelling] Did you really mean 'csvs'? (line 11, column 98, severity ERROR)

[vale] reported by reviewdog 🐶 [Dagster.spelling] Is 'csvs' spelled correctly? (line 11, column 98, severity ERROR)

Asset definitions enable a declarative approach to data management, in which code is the source of truth on what data assets should exist and how those assets are computed.

@@ -18,15 +18,15 @@ Asset definitions enable a declarative approach to data management, in which cod

- Creating our initial definitions object
- Adding a DuckDB resource
- Building some basic software defined assets
- Building some basic software defined assets

## Building definitions object

The [definitions](/api/definitions) object in Dagster serves as the central configuration point for defining and organizing various components within a Dagster Project. It acts as a container that holds all the necessary configurations for a code location, ensuring that everything is organized and easily accessible.
The [definitions](/api/definitions) object in Dagster serves as the central configuration point for defining and organizing various components within a Dagster Project. It acts as a container that holds all the necessary configurations for a code location, ensuring that everything is organized and easily accessible.

1. Creating Definitions object and DuckDB resource

Open the definitions.py file and add the following import statements and definitions object.
Open the `definitions.py` file and add the following import statements and definitions object.

```python
import json
@@ -46,11 +46,11 @@ Open the definitions.py file and add the following import statements and definit

1. Products Asset

We need to create an asset that creates a duckdb table for the products csv. Additionally we should add meta data to help categorize this asset and give us a preview of what it looks like in the Dagster UI.
We need to create an asset that creates a DuckDB table for the products csv. Additionally we should add meta data to help categorize this asset and give us a preview of what it looks like in the Dagster UI.

Check failure on line 49 in docs/docs-beta/docs/tutorial/02-your-first-asset.md (GitHub Actions / runner / vale)

[vale] reported by reviewdog 🐶 [Dagster.spelling] Is 'csv' spelled correctly? (line 49, column 73, severity ERROR)

[vale] reported by reviewdog 🐶 [Vale.Spelling] Did you really mean 'csv'? (line 49, column 73, severity ERROR)

<CodeExample filePath="guides/tutorials/etl_tutorial/etl_tutorial/definitions.py" language="python" lineStart="8" lineEnd="33"/>

You'll notice here that we have meta data for the compute kind for this asset as well as making it part of the ingestion group. Additionally, at the end we add the row count and a preview of what the table looks like.
You'll notice here that we have meta data for the compute kind for this asset as well as making it part of the ingestion group. Additionally, at the end we add the row count and a preview of what the table looks like.

2. Sales Reps Asset

@@ -66,7 +66,7 @@ Same thing for Sales Data

4. Bringing our assets into the Definitions object

Now to pull these assets into our definitions object, add them to the empty list in the assets parameter.
Now to pull these assets into our Definitions object, add them to the empty list in the assets parameter.

```python
defs = dg.Definitions(
@@ -80,21 +80,21 @@ Now to pull these assets into our definitions object, add them to the empty list

## Materialize Assets

Let's fire up Dagster and materialize these assets. If you are not in the project root directory navigate there now.
Let's fire up Dagster and materialize these assets. If you are not in the project root directory navigate there now.

```bash title="Navigate to Project Directory"
cd getting_started_etl_tutorial
```

Run the `dagster dev` command. Dagster should open up in your browser. Navigate to the Global asset lineage page. You should see this
Run the `dagster dev` command. Dagster should open up in your browser. Navigate to the Global asset lineage page. You should see this:

![2048 resolution](/images/tutorial/etl-tutorial/etl-tutorial-first-asset-lineage.png)

Click on products and then materialize. Navigate to the runs tab and select the most recent run.
Click on products and then materialize. Navigate to the runs tab and select the most recent run.

![2048 resolution](/images/tutorial/etl-tutorial/first-asset-run.png)

Do the same for sales_reps and sales_data. Now we have all our ingestion assets materialized
Do the same for sales_reps and sales_data. Now we have all our ingestion assets materialized.

## What you've learned

18 changes: 9 additions & 9 deletions docs/docs-beta/docs/tutorial/03-asset-dependencies-and-checks.md
@@ -8,7 +8,7 @@ last_update:

# Asset Dependencies and Asset Checks

The DAG or Directed Acyclic Graph is a key part of Dagster. This is an improvement over the typical cron workflow for orchestration. With a Dag approach you can easily understand complex data pipelines. The key benefits of Dags are
The DAG or Directed Acyclic Graph is a key part of Dagster. This is an improvement over the typical cron workflow for orchestration. With a Dag approach you can easily understand complex data pipelines. The key benefits of Dags are:

Check failure on line 11 in docs/docs-beta/docs/tutorial/03-asset-dependencies-and-checks.md (GitHub Actions / runner / vale)

[vale] reported by reviewdog 🐶 [Vale.Terms] Use 'DAG' instead of 'Dag'. (line 11, column 141, severity ERROR)

1. Clarity: The DAG provides a clear visual representation of the entire workflow.
2. Efficiency: Parallel tasks can be identified and executed simultaneously.
@@ -18,27 +18,27 @@ The DAG or Directed Acyclic Graph is a key part of Dagster. This is an improveme

## What you'll learn

- Creating [Asset Dependencies](guides/asset-dependencies.md)
- Creating [Asset Dependencies](guides/asset-dependencies.md)
- How to make an [Asset Check](guides/asset-checks.md)

## Creating a Downstream asset

Now that we have all of our raw data loaded and staged into the DuckDB database our next step is to merge it together. The data structure that of a fact table (sales data) with 2 dimensions off of it (sales reps and products). To accomplish that in SQL we will bring in our sales_data table and then left join on sales reps and products on their respective id columns. Additionally, we will keep this view concise and only have relevant columns for analysis.
Now that we have all of our raw data loaded and staged into the DuckDB database our next step is to merge it together. The data structure that of a fact table (sales data) with 2 dimensions off of it (sales reps and products). To accomplish that in SQL we will bring in our `sales_data` table and then left join on `sales_reps` and `products` on their respective id columns. Additionally, we will keep this view concise and only have relevant columns for analysis.
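
The join described in this paragraph can be sketched as follows. This is only an illustration under stated assumptions: Python's built-in `sqlite3` module stands in for DuckDB so the snippet runs without extra dependencies, and the column names (`product_id`, `rep_id`, `product_name`, `rep_name`, `dollar_amount`) and sample rows are hypothetical, not taken from the tutorial's data files.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the tutorial's DuckDB database.
conn = sqlite3.connect(":memory:")

# Hypothetical schemas and rows; the real CSV columns may differ.
conn.executescript(
    """
    CREATE TABLE products (product_id INTEGER, product_name TEXT);
    CREATE TABLE sales_reps (rep_id INTEGER, rep_name TEXT);
    CREATE TABLE sales_data (
        order_id INTEGER, product_id INTEGER, rep_id INTEGER, dollar_amount REAL
    );

    INSERT INTO products VALUES (1, 'Widget'), (2, 'Gadget');
    INSERT INTO sales_reps VALUES (10, 'Ada'), (11, 'Grace');
    INSERT INTO sales_data VALUES
        (100, 1, 10, 25.0),
        (101, 2, 11, 40.0),
        (102, 3, 10, 15.0);  -- product 3 has no row in products
    """
)

# Start from the fact table and left join each dimension on its id column,
# keeping only the columns needed for analysis.
joined_rows = conn.execute(
    """
    SELECT s.order_id, p.product_name, r.rep_name, s.dollar_amount
    FROM sales_data AS s
    LEFT JOIN sales_reps AS r ON s.rep_id = r.rep_id
    LEFT JOIN products AS p ON s.product_id = p.product_id
    ORDER BY s.order_id
    """
).fetchall()

for row in joined_rows:
    print(row)
```

Because these are left joins, the order that references a missing product is kept, with `None` in the product column, rather than being dropped from the result.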

<CodeExample filePath="guides/tutorials/etl_tutorial/etl_tutorial/definitions.py" language="python" lineStart="89" lineEnd="132"/>

As you can see here this asset looks a lot like our previous ones with a few small changes. We put this asset into a different group. To make this asset dependant on the raw tables we add the asset keys the `deps` parameter in the asset definition.
As you can see here this asset looks a lot like our previous ones with a few small changes. We put this asset into a different group. To make this asset dependant on the raw tables we add the asset keys the `deps` parameter in the asset definition.

Check failure on line 30 in docs/docs-beta/docs/tutorial/03-asset-dependencies-and-checks.md (GitHub Actions / runner / vale)

[vale] reported by reviewdog 🐶 [Dagster.british] Use the US spelling 'dependent' instead of the British 'dependant'. (line 30, column 154, severity ERROR)

## Asset checks

Data Quality is critical in analytics. Just like in a factory producing cars, manufacturers inspect parts after they complete steps to identify defects and processes that may be creating more than acceptable. In this case we want to create a test to identify if there are any rows that have a product or sales rep that are not in the table.
Data Quality is critical in analytics. Just like in a factory producing cars, manufacturers inspect parts after they complete steps to identify defects and processes that may be creating more than acceptable. In this case we want to create a test to identify if there are any rows that have a product or sales rep that are not in the table.
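
The kind of test described here might look roughly like the sketch below. It is an assumption-laden illustration, not the tutorial's implementation: `sqlite3` again stands in for DuckDB, the helper `count_missing_dimensions` and the columns `product_name` and `rep_name` are hypothetical, and the Dagster wiring that would report the result as an asset check is left out.

```python
import sqlite3


def count_missing_dimensions(conn: sqlite3.Connection) -> int:
    """Count joined rows whose product or sales rep lookup came back NULL.

    Hypothetical helper: the table layout is an assumption for illustration.
    """
    (missing,) = conn.execute(
        """
        SELECT COUNT(*)
        FROM joined_data
        WHERE product_name IS NULL OR rep_name IS NULL
        """
    ).fetchone()
    return missing


# Assumption: a tiny in-memory table stands in for the joined asset.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE joined_data (order_id INTEGER, product_name TEXT, rep_name TEXT)"
)
conn.executemany(
    "INSERT INTO joined_data VALUES (?, ?, ?)",
    [
        (100, "Widget", "Ada"),
        (101, None, "Grace"),   # product not found in the dimension table
        (102, "Gadget", None),  # sales rep not found in the dimension table
    ],
)

missing_count = count_missing_dimensions(conn)
check_passed = missing_count == 0  # the check passes only when nothing is missing
print(f"rows with missing dimensions: {missing_count}, passed: {check_passed}")
```

Counting the offending rows, rather than only returning a pass/fail flag, leaves a number that could be attached as metadata alongside the check outcome.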

<CodeExample filePath="guides/tutorials/etl_tutorial/etl_tutorial/definitions.py" language="python" lineStart="134" lineEnd="149"/>


## Materialize the Assets

We need to add the asset and asset check we just made to the definitions object.
We need to add the Asset and Asset check we just made to the Definitions object.

Your Definitions object should now look like this:

@@ -54,14 +54,14 @@ Your Definitions object should now look like this:
)
```

Go back into the UI, reload definitions, and materialize the joined_data asset. If you navigate to the asset details page, there is tab for asset checks where you can see the run history and metadata.
Go back into the UI, reload definitions, and materialize the `joined_data` asset. If you navigate to the Asset details page, there is tab for Asset checks where you can see the run history and metadata.

![2048 resolution](/images/tutorial/etl-tutorial/asset-check.png)

## What you've learned

- Creating downstream assets
- Software defined asset checks.
- Creating downstream Assets
- Software Defined Asset checks


## Next steps
6 changes: 3 additions & 3 deletions docs/docs-beta/sidebars.ts
@@ -12,9 +12,9 @@ const sidebars: SidebarsConfig = {
label: 'Tutorial',
collapsed: false,
items: [
'tutorial/01-etl-tutorial-introduction',
'tutorial/02-your-first-asset',
'tutorial/03-asset-dependencies-and-checks',
'tutorial/etl-tutorial-introduction',
'tutorial/your-first-asset',
'tutorial/asset-dependencies-and-checks',
],
},
{