-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[1/n subset refactor] Split TimeWindowPartitionsSubset #17684
[1/n subset refactor] Split TimeWindowPartitionsSubset #17684
Conversation
5bc3c4d
to
c9f78fc
Compare
Current dependencies on/for this PR:
This stack of pull requests is managed by Graphite. |
a3c067a
to
cc35ffe
Compare
cc35ffe
to
27d3b83
Compare
) | ||
|
||
@property | ||
@cached_method |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because it takes no arguments, you can use @cached_property
from functools here
Args: | ||
dt_cron_schedule (str): A cron schedule that dt is on one of the ticks of. | ||
""" | ||
if self._included_time_windows is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would never be not None
, right?
def __init__( | ||
self, | ||
partitions_def: TimeWindowPartitionsDefinition, | ||
num_partitions: Optional[int] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd lean towards not providing default values for these arguments and instead requiring the caller to to think about them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good
self._num_partitions = ( | ||
num_partitions | ||
if num_partitions | ||
else self._num_partitions_from_time_windows(partitions_def, included_time_windows) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be an expensive operation. If it wasn't previously part of the constructor, I'd be nervous that adding it could cause perf issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it -- I amended this to instead just be optional, and to only calculate on demand
python_modules/dagster/dagster/_core/definitions/time_window_partitions.py
Show resolved
Hide resolved
included_time_windows, "included_time_windows", of_type=TimeWindow | ||
) | ||
|
||
def get_included_time_windows(self) -> Sequence[TimeWindow]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just call this included_time_windows
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah yeah we don't need this (probably accidentally added this)
included_partition_keys=self._included_partition_keys, | ||
) | ||
|
||
def resolve(self) -> "TimeWindowPartitionsSubset": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this needed? I don't see it called anywhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's not used in this PR, but I was planning on using it in later parts of this stack to enable serializing a TimePartitionKeyPartitionsSubset
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be clearer to add it in the later part that uses it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good
d024b99
to
e9dc40f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks solid. Last thing: I'm wondering if there's a way to name these classes that makes their inheritance relationship clear. E.g. if I saw BaseTimeWindowPartitionsSubset
and TimePartitionKeyPartitionsSubset
in the wild, it wouldn't be obvious that the latter is a superclass of the former.
A convention-al way of naming these would be something like:
TimeWindowPartitionsSubset
- abstract base class for subsets of aTimeWindowPartitionsDefinition
PartitionKeysTimeWindowPartitionsSubset
- aTimeWindowPartitionsSubset
that's represented by a set of partition keys.TimeWindowsTimeWindowPartitionsSubset
- aTimeWindowPartitionsSubset
that's represented by a set of time windows.
TimeWindowsTimeWindowPartitionsSubset
is a bit of a gnarly name of course, so definitely not a slam dunk.
included_partition_keys=self._included_partition_keys, | ||
) | ||
|
||
|
||
class TimeWindowPartitionsSubset(BaseTimeWindowPartitionsSubset): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could benefit from class-level docstring. Maybe "A PartitionsSubset for a TimeWindowPartitionsDefinition, which internally represents the included partitions using TimeWindow".
What do you think of this:
|
@clairelin135 how would you describe each of those? |
|
We describe a The reason I'm belaboring this is that some of these are going to end up in storage, so we won't be able to easily change them. |
@sryza that makes sense. I'm hesitant about the naming of What if we kept Then we could rename |
@clairelin135 great, I like it. I vote for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Last comment is that these classes we discussed could benefit from class-level docstring. Otherwise, this looks good to go.
92c25eb
to
af983ea
Compare
af983ea
to
ecc6f42
Compare
TimeWindowPartitionsSubset
has lots of performance improvement code that relies on representing partitions as keys, but relies on converting partition keys to time windows in order to serialize. In preparation for serializing the time window version, this PR splits the implementation:BaseTimePartitionsSubset
: a time partitions subset superclass that defines abstract methods, and contains helper functionsTimePartitionKeyPartitionsSubset
: the "partition key" representationTimeWindowPartitionsSubset
: the "time window" representationBy default, when creating partitions subsets via
partitions_def.empty_subset().with_partition_keys(...)
, aTimePartitionKeyPartitionsSubset
will be created. This class contains a method to convert to aTimeWindowPartitionsSubset
when it needs to be serialized.In future PRs, time partitions subsets can be directly deserialized as
TimeWindowPartitionsSubsets
.