Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[for 1.6] Skip I/O manager when return type is Nothing #17451

Merged
merged 24 commits into from
Jan 4, 2024

Conversation

jamiedemaria
Copy link
Contributor

@jamiedemaria jamiedemaria commented Oct 27, 2023

Summary & Motivation

Currently, pipelines we are describing as non-I/O manager pipelines still invoke the I/O manager handle_output method. This RFC proposes fully skipping the I/O manager when the dagster_type of the Output is Nothing. This means that assets with the return type annotation None and MaterializeResult (bc it gets coerced to Nothing) will not invoke the I/O manager.

Assets that return MaterializeResult but do not have a return type annotation will log a warning, but still invoke the I/O Manager.

If a user tries to load an asset with Nothing return type annotation in a downstream asset, the I/O manager will still be invoked. The reasoning:

  • Most of the Dagster provided I/O managers already do not store Nothing outputs. This means that users who are using these I/O managers already have code that is compatible with the changes in this PR
  • The I/O managers that do still store Nothing outputs are the in memory I/O manager and the dagstermill I/O manager. The in memory I/O manager is mainly useful for unit tests, which means that in production, the code would need to be using another I/O manager (likely one that doesn't store Nothing outputs)

For users who have written code that relies on storing and loading None values in order to set up dependencies with their assets/ops, they can switch to using deps for assets, and Nothing dependencies for ops.

How I Tested These Changes

@jamiedemaria
Copy link
Contributor Author

Some notes from standup today:

  • Depending on if we want to support value in MaterializeResult, we may need to modify/close this PR entirely since value will go through the I/O manager system
  • An alternate approach would be to point users to use metadata. This would require us to add restrictions around data size and format.
  • Next steps are to talk with lopp and tim about use cases when users want to pass info between assets so that we can correctly address those needs

Some related thoughts:

  • Even if we do support value in MaterializeResult that is handled by the I/O manager, we may still want to add some of the functionality in this PR. Instead of skipping I/O manager for all returned MaterializeResult we would only skip for MaterializeResults without value

@schrockn
Copy link
Member

schrockn commented Nov 1, 2023

def test_asset_none_output_non_none_input():
    @asset
    def asset1() -> None:
        return None

    @asset
    def asset2(asset1):
        assert asset1 is None

    assert materialize_to_memory([asset1, asset2]).success

How difficult would it be to get this to work? Couldn't we just unconditionally always pass None for upstream assets that are typed as Nothing?

@jamiedemaria
Copy link
Contributor Author

yeah that's definitely one solution. I think the only kind of weird case is if you had asset1 and asset2 and materialized asset2 before asset1 had ever been materialized. Do we fail in that case? I think probably not, but if you are relying on some external side affect that asset1 causes, then maybe you'd want a failure?

However, if you had the same setup with deps instead and materialized asset2 before ever materializing asset1 i think it would just materialize, so that indicates we could just always pass None

@jamiedemaria jamiedemaria force-pushed the jamie/skip-io-on-nothing branch from b024c34 to a6ed735 Compare November 15, 2023 22:22
Copy link

Deploy preview for dagit-storybook ready!

✅ Preview
https://dagit-storybook-ddnph922e-elementl.vercel.app
https://jamie-skip-io-on-nothing.components-storybook.dagster-docs.io

Built with commit a6ed735.
This pull request is being automatically deployed with vercel-action

Copy link

Deploy preview for dagit-core-storybook ready!

✅ Preview
https://dagit-core-storybook-azlr6rkhx-elementl.vercel.app
https://jamie-skip-io-on-nothing.core-storybook.dagster-docs.io

Built with commit a6ed735.
This pull request is being automatically deployed with vercel-action

Copy link

Deploy preview for dagster-docs ready!

Preview available at https://dagster-docs-lqo5wo5kz-elementl.vercel.app
https://jamie-skip-io-on-nothing.dagster.dagster-docs.io

Direct link to changed pages:

@jamiedemaria jamiedemaria force-pushed the jamie/skip-io-on-nothing branch from a6ed735 to 387fa7d Compare November 16, 2023 14:25
@jamiedemaria
Copy link
Contributor Author

bumping this back up for review

I looked into how to provide None for inputs that had corresponding outputs typed as Nothing (as suggested here), and didn't find anything better than piping some additional information through the FromOutputStep source. Maybe I missed something more direct though? If so that'd be great.

Copy link
Member

@alangenfeld alangenfeld left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

expecting changes from last set of comments

@jamiedemaria
Copy link
Contributor Author

turning this back into a draft for now to get it out of review queues. I'm exploring another way to do this here but it's running into similar catch-22 issues as this PR.

@jamiedemaria jamiedemaria marked this pull request as draft November 27, 2023 22:03
@jamiedemaria jamiedemaria force-pushed the jamie/skip-io-on-nothing branch from f26441e to 1894db5 Compare November 30, 2023 17:35
@jamiedemaria
Copy link
Contributor Author

popping this back into review queues. See the updated description for the current state of the PR and an issue the dbt and airbyte integrations present that we'll need to resolve. I'm going to spend some time today thinking about potential solutions, but if you have ideas that would be great

Base automatically changed from jamie/refactor-store-output to master December 4, 2023 19:15
@jamiedemaria jamiedemaria force-pushed the jamie/skip-io-on-nothing branch from dc1b570 to 67c970e Compare December 11, 2023 19:47
@jamiedemaria jamiedemaria force-pushed the jamie/skip-io-on-nothing branch from a9f7ba4 to c0c2eda Compare January 2, 2024 21:10
@jamiedemaria jamiedemaria requested a review from smackesey January 3, 2024 20:52
@jamiedemaria jamiedemaria merged commit e6e11a7 into master Jan 4, 2024
1 check passed
@jamiedemaria jamiedemaria deleted the jamie/skip-io-on-nothing branch January 4, 2024 16:07
danielgafni pushed a commit to danielgafni/dagster that referenced this pull request Jan 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants