Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[docs][pipes] - Subprocess page #17096

Conversation

yuhan
Copy link
Contributor

@yuhan yuhan commented Oct 9, 2023

Summary & Motivation

depends on #17092

This PR adds "Integrating Subprocess with Dagster Pipes" page which is tailored to the data platform eng persona. this page is also designed to pair with "Modifying existing code to work with Dagster Pipes" page (its parent pr in the stack).

build will be green once #17048 is landed.

TODOs

  • update screenshots - all of them right now are for reviewing purpose only. will update once the overall flow is reviewed.
  • add tests for code snippets

How I Tested These Changes

https://10-09--docs-pipes---subprocess-page.dagster.dagster-docs.io/guides/dagster-pipes/integrating-subprocess-with-dagster-pipes

@yuhan
Copy link
Contributor Author

yuhan commented Oct 9, 2023

Current dependencies on/for this PR:

This comment was auto-generated by Graphite.

@yuhan yuhan force-pushed the 10-09-_docs_pipes_-_Subprocess_page branch from c8629f0 to 54f16f6 Compare October 9, 2023 18:08
@yuhan yuhan force-pushed the 10-09-_docs_pipes_-_Subprocess_page branch from 54f16f6 to 21ccaeb Compare October 9, 2023 18:15

## Step 1: Create a Dagster asset that executes a subprocess

In this step, you’ll create a Dagster asset that, when materialized, opens a Dagster pipes session and invokes subprocess that executes some external code.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reads awkwardly as "when materialized" could be interpreted as "when the external script has completed"

"asset that, in its execution function" could be good language.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we refer to pipes process elsewhere, pipes session; what is the noun?

- Provided <PyObject object="AssetExecutionContext" /> as the `context` argument to the asset. This object provides system information such as resources, config, and logging. We’ll come back to this a bit later in this section.
- Specified a resource for the asset to use, `PipesSubprocessClient`. We’ll also come back to this in a little bit.
- Declared a command list `cmd` to run the external script. In the list:
- First, found the path to the Python executable on the system using `shutil.which("python")`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would note that this is fairly contrived and that this becomes more compelling if the external script is either 1) invoked using a different python virtual env or 2) is a different programming language.


- The `PipesSubprocessClient` resource used by the asset contains a `run` method.
- When the asset is executed, this method will synchronously execute the subprocess with in a pipes session, and it will return a `PipesClientCompletedInvocation` object.
- This object contains a `get_results` method, which you can use to access the results reported by the subprocess.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is results here?

I think it'd also be worth mentioning why PipesSubprocessClient exists instead of just opening a subprocess manually.

@yuhan yuhan mentioned this pull request Oct 9, 2023
3 tasks
@yuhan
Copy link
Contributor Author

yuhan commented Oct 10, 2023

close in favor of the alternative structure: #17108

@yuhan yuhan closed this Oct 10, 2023
yuhan added a commit that referenced this pull request Oct 12, 2023
## Summary & Motivation
Instead of the previous PRs cross-referencing each other, we make them part 1 and part 2, with a note in part 1 to navigate the stakeholder personas (who have dagster assets set up by others) to jump directly to part 2. It also splits out a standalone "Reference" page to show dagster code and external code side-by-side for advanced pipes + subprocess usage.

Concretely, this PR rearranges the content of #17092 and #17096 into:

<img width="200" alt="Screenshot 2023-10-09 at 4 48 17 PM" src="https://github.com/dagster-io/dagster/assets/4531914/3371e6b1-7ddc-4e67-b98b-2ec599fadca3">

Preview Links:
- [Dagster Pipes + subprocess](https://10-09--docs-pipes-rfc-subprocess-section.dagster.dagster-docs.io/guides/dagster-pipes/subprocess)
  * [Tutorial part 1: Define a Dagster asset](https://10-09--docs-pipes-rfc-subprocess-section.dagster.dagster-docs.io/guides/dagster-pipes/subprocess/create-subprocess-asset)
  * [Tutorial part 2: Modify external code](https://10-09--docs-pipes-rfc-subprocess-section.dagster.dagster-docs.io/guides/dagster-pipes/subprocess/modify-external-code)
  * [Reference](https://10-09--docs-pipes-rfc-subprocess-section.dagster.dagster-docs.io/guides/dagster-pipes/subprocess/reference) - this includes example code for both dagster side and external side listed side by side in tabs 

TODOs
- [x] incorporate feedback from previous PRs
- [x] update screenshots - all of them right now are for reviewing purpose only. will update once the overall flow is reviewed.
- [ ] ~add tests for code snippets~ will add in a separate follow-up PR

## How I Tested These Changes
@yuhan yuhan deleted the 10-09-_docs_pipes_-_Subprocess_page branch October 12, 2023 15:45
yuhan added a commit that referenced this pull request Oct 12, 2023
Instead of the previous PRs cross-referencing each other, we make them part 1 and part 2, with a note in part 1 to navigate the stakeholder personas (who have dagster assets set up by others) to jump directly to part 2. It also splits out a standalone "Reference" page to show dagster code and external code side-by-side for advanced pipes + subprocess usage.

Concretely, this PR rearranges the content of #17092 and #17096 into:

<img width="200" alt="Screenshot 2023-10-09 at 4 48 17 PM" src="https://github.com/dagster-io/dagster/assets/4531914/3371e6b1-7ddc-4e67-b98b-2ec599fadca3">

Preview Links:
- [Dagster Pipes + subprocess](https://10-09--docs-pipes-rfc-subprocess-section.dagster.dagster-docs.io/guides/dagster-pipes/subprocess)
  * [Tutorial part 1: Define a Dagster asset](https://10-09--docs-pipes-rfc-subprocess-section.dagster.dagster-docs.io/guides/dagster-pipes/subprocess/create-subprocess-asset)
  * [Tutorial part 2: Modify external code](https://10-09--docs-pipes-rfc-subprocess-section.dagster.dagster-docs.io/guides/dagster-pipes/subprocess/modify-external-code)
  * [Reference](https://10-09--docs-pipes-rfc-subprocess-section.dagster.dagster-docs.io/guides/dagster-pipes/subprocess/reference) - this includes example code for both dagster side and external side listed side by side in tabs

TODOs
- [x] incorporate feedback from previous PRs
- [x] update screenshots - all of them right now are for reviewing purpose only. will update once the overall flow is reviewed.
- [ ] ~add tests for code snippets~ will add in a separate follow-up PR
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants