-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[docs][pipes] - Modifying existing code page #17092
Conversation
Current dependencies on/for this PR:
This comment was auto-generated by Graphite. |
9420e15
to
3a234a1
Compare
3a234a1
to
2b54bfe
Compare
When your existing code is orchestrated by Dagster, you can run it from the Dagster UI. Navigate to the the corresponding asset page and click **Materialize** to run your code. Refer to [Viewing and materializing assets in the UI](/concepts/assets/software-defined-assets#viewing-and-materializing-assets-in-the-ui) for more info. | ||
|
||
<Image | ||
alt="Materialize Dagster assets" | ||
src="/images/guides/dagster-pipes/modifying-code-1-materialize-dagster-assets.png" | ||
width={1328} | ||
height={836} | ||
/> | ||
|
||
### Monitor code execution in Dagster UI | ||
|
||
To monitor the run, navigate to the [Run details page](/concepts/webserver/ui#run-details). This page includes timing, raw compute logs, and structured events. The bottom section shows filterable logs from the run, detailing events, their types, and specifics. There are two types of log: structured event and raw compute. | ||
|
||
#### Structured event logs | ||
|
||
Structured logs are enriched and categorized with metadata. For example, a label of what the log entry is about, links to an asset’s metadata, and the event type. This detailed structuring not only facilitates easier filtering and searching within the logs, but also populates the asset catalog and status information in the asset graph. We’ll come back to how you can log metadata from your existing code to Dagster in a little bit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this go into too much detail? This seems like generalized Dagster documentation, rather than being about Pipes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@erinkcochran87 and i landed on this because we thought the audience (stakeholders) to this page may not be familiar with dagster, so the writing here is intentionally to be felt like a subset of generalized dagster docs
|
||
<Note> | ||
|
||
This guide assumes your code is being orchestrated by Dagster. If that’s not the case, please refer to <a href="integrating-subprocess-with-dagster-pipes">Integrating Dagster Pipes with Subprocess</a> for how to orchestrate your existing Python scripts using Dagster. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me this line is a little confusing because of the word "orchestrate". I think it would be clearer to say:
"This guide talks about how to modify your code to send information back to Dagster. To schedule and automate this code through Dagster you will need to follow this guide ... "
Basically, I think of it as a two step process to get Dagster to run some existing code:
- You write some Dagster code (defining the assets & adding the schedules/sensors/etc to trigger it)
- You modify the existing code
- (optonal) you modify the existing code further if you want additional metadata sent back to Dagster
To me this checklist is currently hard to understand because it is split across guides, and the cross-refs don't really make it clear you need to do both steps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went back end forth on the sequence of pages a lot! I think you are exactly right about the two step process. The current split is indeed confusing because the two pages are kinda depend on each other - maybe, a better approach is to structure the guides to be similar to the dbt + dagster tutorial, where we rearrange this and #17096 to be 3 separate pages:
- Part 1: Create Dagster assets (for certain stakeholders, if their code is already being orchestrated by some asset factory set up by their data platform folks, they can skip Part 1 and go directly to Part 2)
- Part 2: Modify external code
- Part 3: Send info back to Dagster - optional; advanced
@erinkcochran87 what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
### Materialize Dagster assets | ||
|
||
In Dagster, data assets - like a database table - are created during a process called _materialization._ During materialization, Dagster runs the function in the asset’s body to create the asset. In this case, the ‘function’ is your existing code. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sentence is confusing to me. The 'function' isn't your existing code per se... its a callout to some system to run your existing code.
|
||
## About Dagster | ||
|
||
### Materialize Dagster assets |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This whole section to me is fairly confusing and I think would make more sense if you had a sample of a dagster asset invoking the external code with pipes in it.
I get we're trying to target two different personas, but I think realistically for this this "existing external code" person to understand the guide, we need to show them at least psuedo code for how their code will be invoked.
|
||
- Imports from `dagster-pipes` | ||
|
||
- A call that connects to Dagster Pipes: <PyObject module="dagster_pipes" object="open_dagster_pipes"/> initializes the Dagster Pipes context that can be used to stream information back to Dagster. It is recommended to call this function near the entry point of a pipes process. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is a pipes process? Is this a new noun?
total_orders = len(orders_df) | ||
# get the Dagster Pipes context | ||
context = PipesContext.get() | ||
... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO it'd be nice to show the entirety of the prior code snippet here to make it clearer what the small additional diff is... take it or leave it, but I think the ...
only saves us a few lines and in this case adds more mental work for me (what are we omitting) than it saves
docs/content/guides/dagster-pipes/modifying-existing-code-to-work-with-dagster-pipes.mdx
Show resolved
Hide resolved
docs/content/guides/dagster-pipes/modifying-existing-code-to-work-with-dagster-pipes.mdx
Show resolved
Hide resolved
docs/content/guides/dagster-pipes/modifying-existing-code-to-work-with-dagster-pipes.mdx
Show resolved
Hide resolved
docs/content/guides/dagster-pipes/modifying-existing-code-to-work-with-dagster-pipes.mdx
Show resolved
Hide resolved
docs/content/guides/dagster-pipes/modifying-existing-code-to-work-with-dagster-pipes.mdx
Show resolved
Hide resolved
|
||
#### Report asset materialization | ||
|
||
Similar to [reporting materialization metadata within the Dagster process](/concepts/assets/software-defined-assets#reporting-materialization-metadata), you can also report asset materialization back to Dagster from the external process. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we expecting some baseline familiarity with dagster? I'm not sure how much this reference link helps if we think a lot of non dagster users will be looking at this page
close in favor of the alternative structure: #17108 |
## Summary & Motivation Instead of the previous PRs cross-referencing each other, we make them part 1 and part 2, with a note in part 1 to navigate the stakeholder personas (who have dagster assets set up by others) to jump directly to part 2. It also splits out a standalone "Reference" page to show dagster code and external code side-by-side for advanced pipes + subprocess usage. Concretely, this PR rearranges the content of #17092 and #17096 into: <img width="200" alt="Screenshot 2023-10-09 at 4 48 17 PM" src="https://github.com/dagster-io/dagster/assets/4531914/3371e6b1-7ddc-4e67-b98b-2ec599fadca3"> Preview Links: - [Dagster Pipes + subprocess](https://10-09--docs-pipes-rfc-subprocess-section.dagster.dagster-docs.io/guides/dagster-pipes/subprocess) * [Tutorial part 1: Define a Dagster asset](https://10-09--docs-pipes-rfc-subprocess-section.dagster.dagster-docs.io/guides/dagster-pipes/subprocess/create-subprocess-asset) * [Tutorial part 2: Modify external code](https://10-09--docs-pipes-rfc-subprocess-section.dagster.dagster-docs.io/guides/dagster-pipes/subprocess/modify-external-code) * [Reference](https://10-09--docs-pipes-rfc-subprocess-section.dagster.dagster-docs.io/guides/dagster-pipes/subprocess/reference) - this includes example code for both dagster side and external side listed side by side in tabs TODOs - [x] incorporate feedback from previous PRs - [x] update screenshots - all of them right now are for reviewing purpose only. will update once the overall flow is reviewed. - [ ] ~add tests for code snippets~ will add in a separate follow-up PR ## How I Tested These Changes
Instead of the previous PRs cross-referencing each other, we make them part 1 and part 2, with a note in part 1 to navigate the stakeholder personas (who have dagster assets set up by others) to jump directly to part 2. It also splits out a standalone "Reference" page to show dagster code and external code side-by-side for advanced pipes + subprocess usage. Concretely, this PR rearranges the content of #17092 and #17096 into: <img width="200" alt="Screenshot 2023-10-09 at 4 48 17 PM" src="https://github.com/dagster-io/dagster/assets/4531914/3371e6b1-7ddc-4e67-b98b-2ec599fadca3"> Preview Links: - [Dagster Pipes + subprocess](https://10-09--docs-pipes-rfc-subprocess-section.dagster.dagster-docs.io/guides/dagster-pipes/subprocess) * [Tutorial part 1: Define a Dagster asset](https://10-09--docs-pipes-rfc-subprocess-section.dagster.dagster-docs.io/guides/dagster-pipes/subprocess/create-subprocess-asset) * [Tutorial part 2: Modify external code](https://10-09--docs-pipes-rfc-subprocess-section.dagster.dagster-docs.io/guides/dagster-pipes/subprocess/modify-external-code) * [Reference](https://10-09--docs-pipes-rfc-subprocess-section.dagster.dagster-docs.io/guides/dagster-pipes/subprocess/reference) - this includes example code for both dagster side and external side listed side by side in tabs TODOs - [x] incorporate feedback from previous PRs - [x] update screenshots - all of them right now are for reviewing purpose only. will update once the overall flow is reviewed. - [ ] ~add tests for code snippets~ will add in a separate follow-up PR
Summary & Motivation
This PR adds "Modifying existing code to work with Dagster Pipes" page which is tailored to the stakeholder persona.
build will be green once #17048 is landed
TODOs
How I Tested These Changes
https://10-09--docs-pipes---modifying-existing-code-page.dagster.dagster-docs.io/guides/dagster-pipes/modifying-existing-code-to-work-with-dagster-pipes