Add code for a use-case repository (#23460)
## Summary & Motivation

This adds the `use_case_repository` package under examples which will be
used by `dagster-website` to populate the Repository on our website.

<img width="1311" alt="image"
src="https://github.com/user-attachments/assets/476a720a-1120-4935-a613-c3d6c749acf3">

All `.md` and `.py` files in `use_case_repository/guides` from the
master branch will be loaded to our website during the build process.
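
As a rough illustration of what loading a guide involves, the sketch below separates a guide's front matter from its Markdown body using only the standard library. It assumes the front matter is delimited by `---` lines and that every value is JSON-compatible, which holds for the `title`, `description`, and `tags` fields used in the guides in this commit. `parse_guide` is a hypothetical helper written for this example; the website's actual loader is not shown in this commit and may work differently.

```python
import json


def parse_guide(text: str) -> tuple[dict, str]:
    """Split a guide into (front matter, Markdown body).

    Hypothetical helper: assumes `---` delimiters and JSON-compatible values.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no front matter present
    try:
        end = lines.index("---", 1)  # closing delimiter
    except ValueError:
        return {}, text  # unterminated front matter; treat as plain body
    metadata = {}
    for line in lines[1:end]:
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep:
            metadata[key.strip()] = json.loads(value.strip())
    body = "\n".join(lines[end + 1:]).strip()
    return metadata, body
```

A full YAML parser would be needed for front matter that uses unquoted strings or nested mappings; this sketch deliberately does not handle those.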

Run `make webserver` to start a local server for quickly previewing your
Markdown files. Refer to the `README.md` for more information.
<img width="872" alt="image"
src="https://github.com/user-attachments/assets/9d13d2c5-9631-46b4-aecd-820cfd420b22">

<img width="727" alt="image"
src="https://github.com/user-attachments/assets/3a14ffd0-754c-4231-98b6-3ea0b0cecae3">


## How I Tested These Changes

local testing
PedramNavid authored Aug 9, 2024
1 parent 9b851ee commit bed5fdc
Showing 20 changed files with 831 additions and 38 deletions.
19 changes: 19 additions & 0 deletions examples/use_case_repository/Makefile
@@ -0,0 +1,19 @@
.PHONY: install lint fix test webserver

install:
	pip install --upgrade uv
	uv pip install -e .[dev]

lint:
	ruff check .
	ruff format --check .

fix:
	ruff check --fix .
	ruff format .

test:
	pytest

webserver:
	python -m webserver.main
40 changes: 40 additions & 0 deletions examples/use_case_repository/README.md
@@ -0,0 +1,40 @@
## Use Case Repository

This repository contains a collection of use cases that demonstrate various applications and implementations using Dagster.

### Purpose

The use cases in this repository serve two main purposes:

1. They're used to populate the list of available use cases on the Dagster.io website.
2. They provide practical examples for developers and data engineers working with Dagster.

### Integration with Dagster.io

The use cases are automatically fetched from the master branch of this repository during the build process of the Dagster.io website. This ensures that the website always displays the most up-to-date examples. In `dagster-website/scripts/fetchUseCases.js` you can find the code that fetches the use cases from this repository and updates the website.

The script fetches from the master branch of this repository, so you will need to push your changes to the master branch to see them on the website.

### File Structure

Each use case consists of two main components:

1. A Markdown (.md) file: Contains the descriptive content and documentation, along with code snippets.
2. A Python (.py) file: Contains the actual implementation code as a single file.

Both files are utilized on the Dagster.io website. However, only the Python files are subject to automated testing.
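
Because each use case is expected to have both files, a small consistency check can catch a guide that is missing its counterpart. The sketch below is only an illustration: `find_unpaired_guides` is a hypothetical helper, and it assumes that the two files of a use case share the same file stem and sit in one flat directory, which this README does not state. `TEMPLATE` and `__init__` are skipped on the assumption that they have no counterpart.

```python
from pathlib import Path


def find_unpaired_guides(
    guides_dir: str, skip: tuple[str, ...] = ("TEMPLATE",)
) -> list[str]:
    """Return the stems that have a .md or a .py file, but not both."""
    root = Path(guides_dir)
    md_stems = {p.stem for p in root.glob("*.md")}
    # Package markers are not use cases, so ignore __init__.py.
    py_stems = {p.stem for p in root.glob("*.py") if p.stem != "__init__"}
    # Symmetric difference: stems present in exactly one of the two sets.
    return sorted((md_stems ^ py_stems) - set(skip))
```

An empty list means every Markdown file has a matching Python file and vice versa, under the naming assumption above.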

The TEMPLATE.md file is used to create new use cases. The actual template lives on our external
Scout platform.

### Important Note

When updating a use case, please make sure to modify both the Markdown and Python files to maintain consistency between the documentation and the implementation.

### Local Preview

To preview your changes locally before committing, you can start a local webserver by running the following command in your terminal:

```
make webserver
```
7 changes: 7 additions & 0 deletions examples/use_case_repository/pyproject.toml
@@ -0,0 +1,7 @@
[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"

[tool.dagster]
module_name = "use_case_repository.definitions"
code_location_name = "use_case_repository"
2 changes: 2 additions & 0 deletions examples/use_case_repository/setup.cfg
@@ -0,0 +1,2 @@
[metadata]
name = use_case_repository
23 changes: 23 additions & 0 deletions examples/use_case_repository/setup.py
@@ -0,0 +1,23 @@
from setuptools import find_packages, setup

setup(
    name="use_case_repository",
    packages=find_packages(exclude=["use_case_repository_tests"]),
    install_requires=[
        "dagster",
        "dagster-embedded-elt",
        "dagster-pipes",
        "python-frontmatter",
        "pymdown-extensions",
        "markdown",
        "flask",
        "sling",
    ],
    extras_require={
        "dev": [
            "dagster-webserver",
            "pytest",
            "ruff",
        ]
    },
)
21 changes: 21 additions & 0 deletions examples/use_case_repository/tox.ini
@@ -0,0 +1,21 @@
[tox]
skipsdist = true

[testenv]
download = true
passenv =
  CI_*
  BUILDKITE*
  COVERALLS_REPO_TOKEN
install_command = uv pip install {opts} {packages}
deps =
  source: -e ../../python_modules/dagster[test]
  source: -e ../../python_modules/dagster-pipes
  pypi: dagster[test]
  -e .[dev]
allowlist_externals =
  /bin/bash
  uv
commands =
  source: /bin/bash -c '! pip list --exclude-editable | grep -e dagster'
  pytest -c ../../pyproject.toml -vv
@@ -0,0 +1,99 @@
---
title: "Snowflake to S3 ETL with Dagster"
description: "This use case demonstrates how to transfer data from Snowflake to Amazon S3 using Dagster. The objective is to automate the extraction of data from Snowflake and store it in S3 for further processing or archival."
tags: ["snowflake", "s3"]
---

# [Insert a Use Case Name that has high SEO Value here]

Provide a brief overview of the use case, including its objectives and the main problem it addresses. All use cases must use Dagster to accomplish tasks.

---

## What You'll Learn

You will learn how to:

- Define a Dagster asset that extracts data from an external source and writes it to a database
- Add other bullets here
- ...

---

## Prerequisites

To follow the steps in this guide, you will need:

- To have Dagster and the Dagster UI installed. Refer to the [Dagster Installation Guide](https://docs.dagster.io/getting-started/installation) for instructions.
- A basic understanding of Dagster. Refer to the [Dagster Documentation](https://docs.dagster.io/getting-started/what-why-dagster) for more information.
- List other prerequisites here

---

## Steps to Implement With Dagster

By following these steps, you will [Provide a general description of what the user will wind up with by the end of the guide]. [Also provide a general description of what this enables them to do].

### Step 1: Enter the Name of Step 1 Here

Provide a brief description of what this step does. Prefer a small, working Dagster
example as a first step. Here is an example of what this might look like:

```python
from dagster import (
    asset,
    DailyPartitionsDefinition,
    AssetExecutionContext,
    Definitions,
)

import datetime

# Define the partitions
partitions_def = DailyPartitionsDefinition(start_date="2023-01-01")


@asset(partitions_def=partitions_def)
def upstream_asset(context: AssetExecutionContext):
    # The partition key is the date string of the partition being materialized
    partition_date = context.partition_key

    with open(f"data-{partition_date}.csv", "w") as f:
        f.write(f"Data for partition {partition_date}")

    snapshot_date = datetime.datetime.now().strftime("%Y-%m-%d-%H-%M-%S")
    with open(f"data-{snapshot_date}.csv", "w") as f:
        f.write(f"Data for partition {partition_date}")


defs = Definitions(assets=[upstream_asset])
```

### Step 2: Enter the Name of Step 2 Here

Provide a brief description of what this step does.

### Step 3: Enter the Name of Step 3 Here

Provide a brief description of what this step does.

---

## Expected Outcomes

Describe the expected outcomes of implementing the use case. Include any results or metrics that indicate success.

---

## Troubleshooting

Provide solutions to common issues that might arise while implementing the use case.

---

## Next Steps

What should the person try next after this?

---

## Additional Resources

List any additional resources, such as documentation, tutorials, or community links, that could help users implement the use case.
Empty file.
@@ -0,0 +1,2 @@
#! /bin/bash
echo "Hello from CLI"
@@ -0,0 +1,100 @@
---
title: "Using Dagster Pipes Subprocess to Run a CLI Command"
description: "This use case demonstrates how to use Dagster Pipes to run a CLI command within a Dagster asset. The objective is to execute non-Python workloads and integrate their outputs into Dagster's data pipeline."
tags: ["dagster pipes", "subprocess", "CLI"]
---

# Running CLI Commands with Dagster Pipes

This guide demonstrates how to use Dagster Pipes to run a CLI command within a Dagster asset. This is useful for integrating non-Python workloads, such as Bash scripts or other command-line tools, into your Dagster data pipeline.

---

## What You’ll Learn

You will learn how to:

- Define a Dagster asset that invokes a CLI command.
- Use Dagster Pipes to manage subprocess execution.
- Capture and use the output of the CLI command within Dagster.

---

## Prerequisites

To follow the steps in this guide, you'll need:

- To have Dagster and the Dagster UI (`dagster-webserver`) installed. Refer to the [Installation guide](https://docs.dagster.io/getting-started/install) for more info.

---

## Steps to Implement with Dagster

By following these steps, you will have a Dagster asset that successfully runs a CLI command and logs its output. This allows you to integrate non-Python workloads into your Dagster data pipeline.

### Step 1: Define the CLI Command Script

Create a script that contains the CLI command you want to run. For example, create a file named `external_script.sh` with the following content:

```bash
#!/bin/bash
echo "Hello from CLI"
echo "My env var is: ${MY_ENV_VAR}"
```
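
Before wiring the script into Dagster, it can help to confirm that it behaves as expected on its own. The sketch below is a plain-Python check and is not part of Dagster Pipes: `run_with_env` is a hypothetical helper that runs a command with extra environment variables layered over the current environment and returns whatever the command printed to standard output.

```python
import os
import subprocess


def run_with_env(cmd: list[str], extra_env: dict[str, str]) -> str:
    """Run cmd with extra_env layered over the current environment; return stdout."""
    env = {**os.environ, **extra_env}
    # check=True raises CalledProcessError if the command exits with a non-zero status.
    result = subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)
    return result.stdout


# Example (assumes bash is on PATH and the script is in the working directory):
# print(run_with_env(["bash", "external_script.sh"], {"MY_ENV_VAR": "example_value"}))
```

If the commented example prints both `echo` lines with the variable filled in, the script itself is working and any later failure is more likely to be in the Dagster configuration.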

### Step 2: Define the Dagster Asset

Define a Dagster asset that uses `PipesSubprocessClient` to run the CLI command. Include any necessary environment variables or additional parameters.

Save the following file to `dagster_pipes_cli.py`:

```python
import shutil

from dagster import AssetExecutionContext, Definitions, PipesSubprocessClient, asset

@asset
def cli_command_asset(
    context: AssetExecutionContext, pipes_subprocess_client: PipesSubprocessClient
):
    cmd = [shutil.which("bash"), "external_script.sh"]
    return pipes_subprocess_client.run(
        command=cmd,
        context=context,
        env={"MY_ENV_VAR": "example_value"},
    ).get_materialize_result()

defs = Definitions(
    assets=[cli_command_asset],
    resources={"pipes_subprocess_client": PipesSubprocessClient()},
)
```

### Step 3: Configure and Run the Asset

Ensure the script is executable and run the Dagster asset to see the output.

```bash
chmod +x external_script.sh
dagster dev -f dagster_pipes_cli.py
```

---

## Troubleshooting

- **Permission Denied**: Ensure the script file has executable permissions using `chmod +x`.
- **Command Not Found**: Verify the command is available in the system's `PATH` or provide the full path to the command.
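
One way to surface a missing command early is to resolve it explicitly before building the command list. `shutil.which` returns `None` when the executable is not on `PATH`, so a list built directly from its result can end up containing `None`. The sketch below wraps the lookup in a hypothetical helper, `resolve_command`, that raises a descriptive error instead.

```python
import shutil


def resolve_command(name: str) -> str:
    """Return the path of an executable, or raise if it cannot be found on PATH."""
    path = shutil.which(name)
    if path is None:
        raise FileNotFoundError(f"{name!r} was not found on PATH")
    return path


# Example: cmd = [resolve_command("bash"), "external_script.sh"]
```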

---

## Next Steps

Explore more advanced use cases with Dagster Pipes, such as integrating with other command-line tools or handling more complex workflows.

---

## Additional Resources

- [Dagster Pipes Documentation](https://docs.dagster.io/guides/dagster-pipes)
- [Dagster Installation Guide](https://docs.dagster.io/getting-started/install)
@@ -0,0 +1,21 @@
import shutil

from dagster import AssetExecutionContext, Definitions, PipesSubprocessClient, asset


@asset
def cli_command_asset(
    context: AssetExecutionContext, pipes_subprocess_client: PipesSubprocessClient
):
    cmd = [shutil.which("bash"), "external_script.sh"]
    return pipes_subprocess_client.run(
        command=cmd,
        context=context,
        env={"MY_ENV_VAR": "example_value"},
    ).get_materialize_result()


defs = Definitions(
    assets=[cli_command_asset],
    resources={"pipes_subprocess_client": PipesSubprocessClient()},
)