Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[pipes] pipes concept and subprocess - tweaks per dogfooding feedback #17185

Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
30 changes: 15 additions & 15 deletions docs/content/guides/dagster-pipes.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,7 @@ description: "Dagster Pipes provides a protocol between the orchestration enviro

Dagster Pipes is a toolkit for building integrations between Dagster and external execution environments. It standardizes the process of passing parameters, injecting context information, ingesting logs, and collecting metadata all while remaining agnostic to how remote computations are launched in those environments. This enables the separation of orchestration and business logic in the Dagster ecosystem.

It also smooths the process of incorporating pre-existing code, business logic, and execution environments into Dagster. With Dagster Pipes and a few lines of code, you can execute your code through the orchestrator. In turn, you can stream logs and metadata back to Dagster so you can leverage its observability, lineage, cataloging, and debugging capabilities.
It is particularly useful for the process of incorporating pre-existing code, business logic, and execution environments into Dagster. With Dagster Pipes and a few lines of code, you can execute your code through the orchestrator. In turn, you can stream logs and metadata back to Dagster so you can leverage its observability, lineage, cataloging, and debugging capabilities.

---

Expand All @@ -29,27 +29,15 @@ With Dagster Pipes, you can:

---

## Limitations

While Dagster Pipes is lightweight and flexible, there are a few limitations to be aware of:

- **Step launchers and Pipes can't currently be used together.** Dagster Pipes is a lightweight alternative to step launchers. Think of this as an either-or situation - you can use step launchers **or** you can use Dagster Pipes. If your external code requires the core Dagster module (see below), you should use step launchers instead of Pipes.

- **Some Dagster concepts aren't supported for use in external processes.** Dagster Pipes (`dagster-pipes`) doesn't include the core Dagster (`dagster`) library. As such, concepts like resources and I/O managers, which are included in `dagster`, aren't available for use in external processes executed by Dagster Pipes.

For example, I/O managers aren't currently supported, as Dagster Pipes isn't built for processing data in memory. To process data in memory in Pyspark, you could use an I/O manager [as demonstrated in the Pyspark integration guide](/integrations/spark#running-pyspark-code-in-assets). Otherwise, in a remote case, you can use Pipes.

---

## How it works

Dagster Pipes provides a protocol between the orchestration environment (Dagster) and external execution (ex: Databricks) and a toolkit for building implementations of that protocol.
Dagster Pipes provides a protocol between the orchestration environment (Dagster) and external execution (ex: Databricks), and a toolkit for building implementations of that protocol.

When Dagster Pipes is invoked, several steps will be carried out in **Dagster's orchestration process** and in the **external process**, such as Databricks.

### In the orchestration process (Dagster)

When Dagster Pipes is called in a Dagster asset, Dagster launches the external process with parameters and context information (ex: `partition_key`, `asset_key`, etc.)
When Dagster Pipes is called in a Dagster asset or op, Dagster launches the external process with parameters and context information (ex: `partition_key`, `asset_key`, etc.)

<Image
alt="Diagram explaining the Dagster Pipes process"
Expand All @@ -66,6 +54,18 @@ After Dagster receives the data from the external process, it’ll be visible in

---

## Limitations

While Dagster Pipes is lightweight and flexible, there are a few limitations to be aware of:

- **Step launchers and Pipes can't currently be used together.** Dagster Pipes is a lightweight alternative to step launchers. Think of this as an either-or situation - you can use step launchers **or** you can use Dagster Pipes. If your external code requires the core Dagster module (see below), you should use step launchers instead of Pipes.

- **Some Dagster concepts aren't supported for use in external processes.** Dagster Pipes (`dagster-pipes`) doesn't include the core Dagster (`dagster`) library. As such, concepts like resources and I/O managers, which are included in `dagster`, aren't available for use in external processes executed by Dagster Pipes.

For example, I/O managers aren't currently supported, as Dagster Pipes isn't built for processing data in memory. To process data in memory in Pyspark, you could use an I/O manager [as demonstrated in the Pyspark integration guide](/integrations/spark#running-pyspark-code-in-assets). Otherwise, in a remote case, you can use Pipes.

---

## Usage

Ready to get started with Dagster Pipes? Check out the [Dagster Pipes tutorial](/guides/dagster-pipes/subprocess) to get up and running!
2 changes: 1 addition & 1 deletion docs/content/guides/dagster-pipes/subprocess.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@ description: "Learn how to use Dagster Pipes's built-in subprocess implementatio

# Dagster Pipes tutorial

In this guide, we’ll show you how to use [Dagster Pipes](/guides/dagster-pipes) with Dagster’s built-in subprocess implementation to run a subprocess with a given command and environment. You can then send information such as structured metadata and logging back to Dagster from the subprocess, where it will be visible in the Dagster UI.
In this guide, we’ll show you how to use [Dagster Pipes](/guides/dagster-pipes) with Dagster’s built-in subprocess `PipesSubprocessClient` to run a local subprocess with a given command and environment. You can then send information such as structured metadata and logging back to Dagster from the subprocess, where it will be visible in the Dagster UI.

To get there, you'll:

Expand Down