[docs] [pipes] - Add Concept page (#17048)
## Summary & Motivation

This PR adds the Dagster Pipes concept page to the docs. Note: BK will
fail for this PR until #17057 is merged.

## How I Tested These Changes

👀

---------

Co-authored-by: Pedram Navid <[email protected]>
erinkcochran87 and PedramNavid authored Oct 12, 2023
1 parent 7f7eff5 commit 90d0ca5
Showing 6 changed files with 84 additions and 6 deletions.
4 changes: 4 additions & 0 deletions docs/content/_navigation.json
@@ -909,6 +909,10 @@
{
"title": "Experimental features",
"children": [
{
"title": "Dagster Pipes",
"path": "/guides/dagster-pipes"
},
{
"title": "Using custom run coordinators",
"path": "/guides/dagster/run-attribution"
6 changes: 6 additions & 0 deletions docs/content/guides.mdx
@@ -76,6 +76,12 @@ Learn to apply [Dagster concepts](/concepts) to your work, explore experimental

### Experimental features

#### Dagster Pipes

- [Dagster Pipes](/guides/dagster-pipes) - A high-level look at Dagster Pipes, a toolkit for building integrations between Dagster and external execution environments

#### Other features

- [Using Custom Run Coordinators to perform run attribution](/guides/dagster/run-attribution) - A look at using a Custom Run Coordinator to perform run attribution

- [Airbyte ingestion as code](/guides/dagster/airbyte-ingestion-as-code) - Configure Airbyte connections with Dagster
71 changes: 71 additions & 0 deletions docs/content/guides/dagster-pipes.mdx
@@ -0,0 +1,71 @@
---
title: Dagster Pipes | Dagster Docs
description: "Dagster Pipes provides a protocol between the orchestration environment (Dagster) and external execution (ex: Databricks) and a toolkit for building implementations of that protocol."
---

# Dagster Pipes (Experimental)

<Note>
This feature is currently <strong>experimental</strong>.
</Note>

Dagster Pipes is a toolkit for building integrations between Dagster and external execution environments. It standardizes the process of passing parameters, injecting context information, ingesting logs, and collecting metadata, all while remaining agnostic to how remote computations are launched in those environments. This enables the separation of orchestration and business logic in the Dagster ecosystem.

It also smooths the process of incorporating pre-existing code, business logic, and execution environments into Dagster. With Dagster Pipes and a few lines of code, you can execute your code through the orchestrator. In turn, you can stream logs and metadata back to Dagster so you can leverage its observability, lineage, cataloging, and debugging capabilities.

---

## Benefits

With Dagster Pipes, you can:

- Incorporate existing code into Dagster without huge refactors
- Onboard stakeholder teams onto Dagster incrementally
- Run code in external environments, and:
  - Easily pass parameters to the code
  - Stream unstructured logs and structured metadata back to Dagster
- Separate orchestration and business logic environments
- Use languages other than Python with Dagster

---

## Limitations

While Dagster Pipes is lightweight and flexible, there are a few limitations to be aware of:

- **Step launchers and Pipes can't currently be used together.** Dagster Pipes is a lightweight alternative to step launchers. Think of this as an either-or situation - you can use step launchers **or** you can use Dagster Pipes. If your external code requires the core Dagster module (see below), you should use step launchers instead of Pipes.

- **Some Dagster concepts aren't supported for use in external processes.** Dagster Pipes (`dagster-pipes`) doesn't include the core Dagster (`dagster`) library. As such, concepts like resources and I/O managers, which are included in `dagster`, aren't available for use in external processes executed by Dagster Pipes.

For example, I/O managers aren't currently supported, as Dagster Pipes isn't built for processing data in memory. To process data in memory in PySpark, you could use an I/O manager [as demonstrated in the PySpark integration guide](/integrations/spark#running-pyspark-code-in-assets). Otherwise, if the computation runs remotely, you can use Pipes.

---

## How it works

Dagster Pipes provides a protocol between the orchestration environment (Dagster) and external execution (ex: Databricks) and a toolkit for building implementations of that protocol.

When Dagster Pipes is invoked, several steps will be carried out in **Dagster's orchestration process** and in the **external process**, such as Databricks.

### In the orchestration process (Dagster)

When Dagster Pipes is called in a Dagster asset, Dagster launches the external process with parameters and context information (ex: `partition_key`, `asset_key`, etc.).

<Image
alt="Diagram explaining the Dagster Pipes process"
src="/images/guides/dagster-pipes/dagster-pipes-process.png"
width={1000}
height={393}
/>

### In the external process

The process starts and loads the context info provided by Dagster. While the process runs, execution data, logs, and any specified metadata are streamed back to Dagster.

After Dagster receives the data from the external process, it’ll be visible in the [Dagster UI](/concepts/webserver/ui).
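
The two halves of this exchange can be illustrated with a minimal, standard-library-only sketch. It is not the Dagster Pipes API or its wire format: the environment variable names, the message shape, and the `run_external` helper are all hypothetical, chosen only to show context flowing into an external process and logs and metadata flowing back. For simplicity, the sketch reads the messages after the process exits rather than streaming them while it runs.

```python
import json
import os
import subprocess
import sys
import tempfile

# Hypothetical names for illustration only; not the real Dagster Pipes protocol.
CONTEXT_ENV = "SKETCH_PIPES_CONTEXT"
MESSAGES_ENV = "SKETCH_PIPES_MESSAGES"

# Source code that plays the role of the external process.
# It loads the context it was given and writes one JSON message per line.
EXTERNAL_SCRIPT = r'''
import json, os

context = json.loads(os.environ["SKETCH_PIPES_CONTEXT"])
with open(os.environ["SKETCH_PIPES_MESSAGES"], "a") as out:
    def emit(kind, payload):
        out.write(json.dumps({"kind": kind, "payload": payload}) + "\n")
    emit("log", "processing partition " + context["partition_key"])
    emit("metadata", {"asset_key": context["asset_key"], "row_count": 3})
'''


def run_external(context: dict) -> list:
    """Orchestration side: launch the external process with context info,
    then collect the messages it wrote back."""
    with tempfile.TemporaryDirectory() as tmp:
        messages_path = os.path.join(tmp, "messages.jsonl")
        open(messages_path, "w").close()  # create an empty message file
        env = {
            **os.environ,
            CONTEXT_ENV: json.dumps(context),
            MESSAGES_ENV: messages_path,
        }
        subprocess.run([sys.executable, "-c", EXTERNAL_SCRIPT], env=env, check=True)
        with open(messages_path) as f:
            return [json.loads(line) for line in f if line.strip()]


messages = run_external({"asset_key": "orders", "partition_key": "2023-10-12"})
for message in messages:
    print(message["kind"], message["payload"])
```

In this sketch the external script needs nothing beyond the standard library, which mirrors the idea that external code does not need the core `dagster` library. The orchestration side decides how to launch the process and what to do with the messages it gets back.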

---

## Usage

Ready to get started with Dagster Pipes? Check out the Dagster Pipes tutorial to get up and running!

This file was deleted.

3 changes: 3 additions & 0 deletions docs/content/integrations/embedded-elt.mdx
@@ -109,6 +109,9 @@ sling_job = build_assets_job(
This is an example of how to set up a Sling sync between Postgres and Snowflake:

```python file=/integrations/embedded_elt/postgres_snowflake.py
# pyright: reportGeneralTypeIssues=none
# pyright: reportOptionalMemberAccess=none

import os

from dagster_embedded_elt.sling import (

1 comment on commit 90d0ca5

@github-actions

Deploy preview for dagster-docs ready!

✅ Preview
https://dagster-docs-acm6h0bts-elementl.vercel.app
https://master.dagster.dagster-docs.io

Built with commit 90d0ca5.
This pull request is being automatically deployed with vercel-action
