Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[2/n subset refactor] Make TimeWindowPartitionsDefinition serializable #17660

Merged
merged 5 commits into from
Nov 14, 2023

Conversation

clairelin135
Copy link
Contributor

@clairelin135 clairelin135 commented Nov 2, 2023

This PR supports serializing TimeWindowPartitionsDefinition objects.

whitelist_for_serdes enforces a constraint where the arguments to __new__ must match the ordering of fields in the named tuple. This constraint is required to pack and unpack objects properly during pickling.

Unfortunately this ordering doesn't match in TimeWindowPartitionsDefinition, and we can't reorder the fields without breaking users :( This PR loosens the ordering restriction via a new is_pickleable arg, and raises an error when attempting to pickle a TimeWindowPartitionsDefinition.

timezone: Optional[str] = None,
end: Union[datetime, str, None] = None,
Copy link
Contributor Author

@clairelin135 clairelin135 Nov 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can't actually re-order args here because we don't require kwargs, so this change would break users.

for now, leaving this as is for a proof of concept

@clairelin135 clairelin135 force-pushed the claire/tw-partitions-def-whitelist-for-serdes branch from 8a95de4 to 828e5a4 Compare November 3, 2023 21:44
@clairelin135 clairelin135 changed the base branch from master to 11-02-split_time_window_partitions_subset_implementation November 3, 2023 21:44
@clairelin135 clairelin135 force-pushed the 11-02-split_time_window_partitions_subset_implementation branch from cc35ffe to 27d3b83 Compare November 3, 2023 21:52
@clairelin135 clairelin135 changed the title [1 partitions subset ser/deser] Add whitelist_for_serdes to TimeWindowPartitionsDefinition [2/n subset refactor] Add whitelist_for_serdes to TimeWindowPartitionsDefinition Nov 3, 2023
@clairelin135 clairelin135 force-pushed the claire/tw-partitions-def-whitelist-for-serdes branch from 828e5a4 to 3e7006e Compare November 3, 2023 22:22
@clairelin135 clairelin135 changed the title [2/n subset refactor] Add whitelist_for_serdes to TimeWindowPartitionsDefinition [2/n subset refactor] Make TimeWindowPartitionsDefinition serializable Nov 6, 2023
@clairelin135 clairelin135 marked this pull request as ready for review November 6, 2023 21:06
Copy link
Contributor

@OwenKephart OwenKephart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree that we are (unfortunately) somewhat locked in to the ordering of the fields on the __new__ method, but we should have free reign on reordering the fields of the underlying NamedTuple to match the __new__ method rather than vice-versa, right?

@clairelin135
Copy link
Contributor Author

clairelin135 commented Nov 7, 2023

@OwenKephart unfortunately I don't think adjusting the order of the named tuple fields will fix -- the cron_schedule arg in __new__ is the tenth (?) arg and the sixth (and last) field in the named tuple, which breaks the ordering constraint

I think the only way to fix this would be to change the placement of cron_schedule in __new__

@sryza
Copy link
Contributor

sryza commented Nov 7, 2023

Unfortunately whitelist_for_serdes has a constraint where the arguments to new must match the ordering of fields in the named tuple

Would it be possible to remove that constraint, e.g. by adding an option to disable it? I think life will be way simpler if we can avoid introducing a new SerializableTimeWindowPartitionsDefinition class.

@clairelin135 clairelin135 force-pushed the claire/tw-partitions-def-whitelist-for-serdes branch from 338e8b8 to d264af4 Compare November 7, 2023 22:40
Copy link

github-actions bot commented Nov 7, 2023

Deploy preview for dagit-core-storybook ready!

✅ Preview
https://dagit-core-storybook-jkh47wkhs-elementl.vercel.app
https://claire-tw-partitions-def-whitelist-for-serdes.core-storybook.dagster-docs.io

Built with commit d264af4.
This pull request is being automatically deployed with vercel-action

Copy link

github-actions bot commented Nov 7, 2023

Deploy preview for dagit-storybook ready!

✅ Preview
https://dagit-storybook-p5qp0spqx-elementl.vercel.app
https://claire-tw-partitions-def-whitelist-for-serdes.components-storybook.dagster-docs.io

Built with commit d264af4.
This pull request is being automatically deployed with vercel-action

@clairelin135 clairelin135 force-pushed the 11-02-split_time_window_partitions_subset_implementation branch from d024b99 to e9dc40f Compare November 7, 2023 22:45
@clairelin135 clairelin135 removed the request for review from erinkcochran87 November 7, 2023 22:46
@clairelin135 clairelin135 force-pushed the claire/tw-partitions-def-whitelist-for-serdes branch from d264af4 to f320752 Compare November 7, 2023 22:47
@clairelin135 clairelin135 force-pushed the claire/tw-partitions-def-whitelist-for-serdes branch from f320752 to 0b5fa15 Compare November 9, 2023 18:05
@clairelin135 clairelin135 force-pushed the 11-02-split_time_window_partitions_subset_implementation branch from 92c25eb to af983ea Compare November 9, 2023 18:16
@clairelin135 clairelin135 force-pushed the claire/tw-partitions-def-whitelist-for-serdes branch from 0b5fa15 to d629763 Compare November 9, 2023 18:16
@clairelin135 clairelin135 force-pushed the 11-02-split_time_window_partitions_subset_implementation branch from af983ea to ecc6f42 Compare November 10, 2023 00:23
@clairelin135 clairelin135 force-pushed the claire/tw-partitions-def-whitelist-for-serdes branch from d629763 to 322db16 Compare November 10, 2023 00:24
Base automatically changed from 11-02-split_time_window_partitions_subset_implementation to master November 10, 2023 20:27
@clairelin135 clairelin135 force-pushed the claire/tw-partitions-def-whitelist-for-serdes branch 2 times, most recently from 3dad787 to f0de555 Compare November 10, 2023 21:12
if require_args_to_match_field_ordering:
value_param = value_params[index]
if value_param.name != field:
error_msg = (
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alangenfeld do you know if we can remove this check altogether? Looks like when rebuild named tuples during deserialization we use kwargs, so this might not be needed anymore

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

heres the history i could dig up, unfortunate we are locked out of our phabricator content now
52be7e8
#2372

I think PipelineRun was getting pickled when passed around in multiprocessing contexts.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • might be better to rename the toggling bool to something along the lines of cant_pickle to more clearly advertise what opting out of the protection represents
  • probably want to put pickle round tripping under test to demonstrate that it is known to fail in whatever way it does (if possible make it fail clearly)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the context here.

I renamed the param to is_pickleable. I also overrode a __getstate__ method on TimeWindowPartitionsDefinition that enables raising an error when attempting to pickle:

https://docs.python.org/3/library/pickle.html#handling-stateful-objects

@clairelin135 clairelin135 force-pushed the claire/tw-partitions-def-whitelist-for-serdes branch from 2cdad9c to e97b5b2 Compare November 13, 2023 20:54
Copy link

Deploy preview for dagster-docs ready!

Preview available at https://dagster-docs-rjb9rnf2r-elementl.vercel.app
https://claire-tw-partitions-def-whitelist-for-serdes.dagster.dagster-docs.io

Direct link to changed pages:

Copy link
Member

@alangenfeld alangenfeld left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool by me

@clairelin135 clairelin135 merged commit 269bcd7 into master Nov 14, 2023
1 check passed
@clairelin135 clairelin135 deleted the claire/tw-partitions-def-whitelist-for-serdes branch November 14, 2023 17:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants