Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[4/n subset refactor] Add whitelist_for_serdes to DefaultPartitionsSubset #17703

Merged
merged 2 commits into from
Nov 16, 2023

Conversation

clairelin135
Copy link
Contributor

@clairelin135 clairelin135 commented Nov 3, 2023

This PR makes the DefaultPartitionsSubset serializable by making it a NamedTuple and removing partitions_def from it (as these partitions defs cannot be serialized).

This causes a cascading set of changes:

  • PartitionsSubset methods such as get_partition_keys_in_range, get_partition_keys_not_in_subset must now accept a partitions_def arg
  • PartitionMapping methods now must accept a partitions def corresponding to a partitions subset, otherwise the partitions def is inaccessible
  • Subclassing named tuples doesn't work well since you can't override methods, so this PR removes MultiPartitionsSubset and modifies callsites to transform partition keys to MultiPartitionKeys if needed

@clairelin135 clairelin135 changed the title first stab [4/n subset refactor] Add whitelist_for_serdes to DefaultPartitionsSubset Nov 3, 2023
@clairelin135 clairelin135 force-pushed the claire/serialize-time-window-partitions-subset branch from 263cd3b to c5fa082 Compare November 4, 2023 00:01
@clairelin135 clairelin135 force-pushed the claire/default-subset-serialization branch from c4e4110 to 1b749a4 Compare November 4, 2023 00:06
@clairelin135 clairelin135 force-pushed the claire/serialize-time-window-partitions-subset branch 3 times, most recently from 1f9247e to 6ea89d3 Compare November 6, 2023 17:21
@clairelin135 clairelin135 force-pushed the claire/default-subset-serialization branch 2 times, most recently from 227fe1d to 9e1b34d Compare November 6, 2023 17:46
@clairelin135 clairelin135 force-pushed the claire/serialize-time-window-partitions-subset branch from 6ea89d3 to d708943 Compare November 6, 2023 17:47
@clairelin135 clairelin135 force-pushed the claire/default-subset-serialization branch from 9e1b34d to 8853217 Compare November 6, 2023 17:50
@clairelin135 clairelin135 force-pushed the claire/serialize-time-window-partitions-subset branch from d708943 to c0f2d6a Compare November 6, 2023 17:55
@clairelin135 clairelin135 force-pushed the claire/default-subset-serialization branch from 8853217 to 3f158f7 Compare November 6, 2023 17:55
@clairelin135 clairelin135 force-pushed the claire/serialize-time-window-partitions-subset branch 2 times, most recently from dbaca7d to a138c9f Compare November 6, 2023 21:05
@clairelin135 clairelin135 force-pushed the claire/default-subset-serialization branch from 3f158f7 to aacda59 Compare November 6, 2023 21:08
@clairelin135 clairelin135 force-pushed the claire/serialize-time-window-partitions-subset branch 4 times, most recently from 8ccd9d2 to ebb715c Compare November 8, 2023 00:15
@clairelin135 clairelin135 force-pushed the claire/default-subset-serialization branch 4 times, most recently from 028cd58 to 239ef13 Compare November 8, 2023 20:06
@clairelin135 clairelin135 force-pushed the claire/serialize-time-window-partitions-subset branch from 8f3652f to 76e112f Compare November 10, 2023 21:18
@clairelin135 clairelin135 force-pushed the claire/default-subset-serialization branch from 662acf8 to 8afcf46 Compare November 10, 2023 22:39
@clairelin135 clairelin135 force-pushed the claire/serialize-time-window-partitions-subset branch 2 times, most recently from 7e1cb7a to f72f939 Compare November 13, 2023 21:04
@clairelin135 clairelin135 force-pushed the claire/default-subset-serialization branch from 8afcf46 to 71ba29e Compare November 13, 2023 21:12
@clairelin135 clairelin135 force-pushed the claire/serialize-time-window-partitions-subset branch from f72f939 to 927eb63 Compare November 14, 2023 17:53
@clairelin135 clairelin135 force-pushed the claire/default-subset-serialization branch from 71ba29e to 15244b0 Compare November 14, 2023 17:54
@clairelin135 clairelin135 force-pushed the claire/serialize-time-window-partitions-subset branch from bda8447 to fb23a93 Compare November 14, 2023 19:55
@clairelin135 clairelin135 force-pushed the claire/default-subset-serialization branch from 15244b0 to f29befd Compare November 14, 2023 19:55
@clairelin135 clairelin135 force-pushed the claire/serialize-time-window-partitions-subset branch 2 times, most recently from 1aadacf to f88b440 Compare November 15, 2023 23:51
@clairelin135 clairelin135 force-pushed the claire/default-subset-serialization branch 2 times, most recently from 6ea0de0 to 78a91dd Compare November 16, 2023 00:33
@clairelin135 clairelin135 requested a review from sryza November 16, 2023 01:14
@clairelin135 clairelin135 force-pushed the claire/serialize-time-window-partitions-subset branch from f88b440 to cd6e384 Compare November 16, 2023 20:49
@clairelin135 clairelin135 force-pushed the claire/default-subset-serialization branch from 78a91dd to e6c5ca9 Compare November 16, 2023 20:49
@clairelin135 clairelin135 force-pushed the claire/serialize-time-window-partitions-subset branch from cd6e384 to 0e3a4fe Compare November 16, 2023 21:42
Base automatically changed from claire/serialize-time-window-partitions-subset to master November 16, 2023 22:30
continue

time window partitions subset changes

asset backfill serialization

partition mapping update

continue refactor

fix more tests

more test fixes

fix partition mapping tests

adjust test

fix more tests

add tests
@clairelin135 clairelin135 force-pushed the claire/default-subset-serialization branch from e6c5ca9 to 66d7ba9 Compare November 16, 2023 22:36
@clairelin135 clairelin135 force-pushed the claire/default-subset-serialization branch from 66d7ba9 to 3195b7e Compare November 16, 2023 23:08
@clairelin135 clairelin135 merged commit 02c11a8 into master Nov 16, 2023
@clairelin135 clairelin135 deleted the claire/default-subset-serialization branch November 16, 2023 23:37
gibsondan added a commit that referenced this pull request Dec 2, 2023
## Summary & Motivation
#17703 removed partitions_def
from DefaultPartitionMapping, which these optimizations were relying on
existing.

We should probably remove that field from the base class now that it is
no longer reliably there and is only available on time based partition
mapping - but that is out of scope for this fix

## How I Tested These Changes
Dry run for an asset graph involving multi-partitions finishes quickly
again
gibsondan added a commit that referenced this pull request Dec 2, 2023
## Summary & Motivation
#17703 removed partitions_def
from DefaultPartitionMapping, which these optimizations were relying on
existing.

We should probably remove that field from the base class now that it is
no longer reliably there and is only available on time based partition
mapping - but that is out of scope for this fix

## How I Tested These Changes
Dry run for an asset graph involving multi-partitions finishes quickly
again
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants