
Fds add num_partitions property to partitioners #3095

Merged: 5 commits, merged on Mar 12, 2024


adam-narozniak (Member) commented on Mar 11, 2024

Issue

The number of partitions is a piece of information associated with a partitioner. However, the current partitioner abstractions provide no way to access it. This makes it impossible to create all partitions, which is needed to use concatenate_divisions and to create plots.
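To illustrate why the partition count matters: once a partitioner exposes num_partitions, creating every partition reduces to a loop over the partition IDs. The sketch below uses a hypothetical, simplified `Partitioner` base class, a toy `RoundRobinPartitioner`, and a helper `load_all_partitions`; none of these are the Flower Datasets API, they only show the pattern under those assumptions.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List

Row = Dict[str, Any]


class Partitioner(ABC):
    """Hypothetical, simplified stand-in for a partitioner interface."""

    @property
    @abstractmethod
    def num_partitions(self) -> int:
        """Total number of partitions this partitioner produces."""

    @abstractmethod
    def load_partition(self, partition_id: int) -> List[Row]:
        """Return the rows that belong to a single partition."""


class RoundRobinPartitioner(Partitioner):
    """Toy partitioner (hypothetical): row i goes to partition i % num_partitions."""

    def __init__(self, dataset: List[Row], num_partitions: int) -> None:
        self._dataset = dataset
        self._num_partitions = num_partitions

    @property
    def num_partitions(self) -> int:
        return self._num_partitions

    def load_partition(self, partition_id: int) -> List[Row]:
        return [
            row
            for i, row in enumerate(self._dataset)
            if i % self._num_partitions == partition_id
        ]


def load_all_partitions(partitioner: Partitioner) -> List[List[Row]]:
    """Create every partition; only possible because num_partitions is known."""
    return [partitioner.load_partition(i) for i in range(partitioner.num_partitions)]


data = [{"x": i} for i in range(10)]
parts = load_all_partitions(RoundRobinPartitioner(data, num_partitions=3))
print([len(p) for p in parts])  # [4, 3, 3]
```

With all partitions in hand, a downstream step (such as concatenating divisions or plotting per-partition statistics) can iterate over them without knowing which concrete partitioner produced them.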

Description

NaturalIdPartitioner does not store the number of partitions; it does not need to, since it equals the number of unique IDs in the column specified by the user.

Other partitioners take either num_partitions or partition_sizes, so the way the partition count is specified varies from one partitioner to another.
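Despite these different specifications, each style determines the partition count. A minimal sketch, assuming a dataset represented as a list of dicts; the helper names `num_partitions_natural_id` and `num_partitions_from_sizes` and the `partition_by` argument are illustrative assumptions, not functions from the library:

```python
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]


def num_partitions_natural_id(dataset: List[Row], partition_by: str) -> int:
    # Natural-ID style (hypothetical helper): one partition per unique value
    # in the user-specified ID column.
    return len({row[partition_by] for row in dataset})


def num_partitions_from_sizes(partition_sizes: Sequence[int]) -> int:
    # Size-list style (hypothetical helper): one partition per entry.
    return len(partition_sizes)


data = [{"client": c} for c in ["a", "b", "a", "c", "b"]]
print(num_partitions_natural_id(data, "client"))  # 3
print(num_partitions_from_sizes([50, 30, 20]))  # 3
```

Note that the natural-ID count depends on the data, so it can only be computed once a dataset is available, whereas the other two styles know it from their constructor arguments.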

Related issues/PRs

To be created (concatenate_divisions and plotting); this PR is a prerequisite for them.

Proposal

Add an abstract property num_partitions to the Partitioner. Users are expected to trigger partitioning (plus the prior correctness checks) to ensure that num_partitions is correct.
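Declaring num_partitions as an abstract property means every concrete partitioner must provide it, and Python refuses to instantiate a subclass that forgets. A minimal sketch, assuming a simplified base class; `Partitioner`, `IncompletePartitioner`, and `FixedPartitioner` here are hypothetical illustrations, not the library's classes:

```python
from abc import ABC, abstractmethod


class Partitioner(ABC):
    """Hypothetical base class: num_partitions is an abstract property."""

    @property
    @abstractmethod
    def num_partitions(self) -> int:
        """Total number of partitions."""


class IncompletePartitioner(Partitioner):
    """Does not implement num_partitions, so it cannot be instantiated."""


class FixedPartitioner(Partitioner):
    """Implements num_partitions from a constructor argument."""

    def __init__(self, num_partitions: int) -> None:
        self._num_partitions = num_partitions

    @property
    def num_partitions(self) -> int:
        return self._num_partitions


try:
    IncompletePartitioner()
except TypeError as err:
    print("rejected:", type(err).__name__)  # rejected: TypeError

print(FixedPartitioner(4).num_partitions)  # 4
```

The check happens at instantiation time, so a partitioner missing the property fails immediately rather than when some downstream code first asks for the partition count.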

Explanation

This is the most flexible solution I see right now. It doesn't require additional attributes in each partitioner, and, thanks to our lazy partitioning, it allows partitioning to be triggered manually.
Part of the correctness check has to happen just before partitioning (it can't be done, e.g., in __init__) because it's only possible once the dataset is assigned.
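The interplay between lazy partitioning and the dataset-dependent check can be sketched as follows. This is one possible arrangement, assuming that accessing num_partitions itself triggers the (cached) partitioning and that the dataset is assigned as an attribute after construction; `SizePartitioner` and its private methods are hypothetical, not the actual implementation.

```python
from typing import Any, Dict, List, Optional, Sequence

Row = Dict[str, Any]


class SizePartitioner:
    """Hypothetical partitioner with lazy partitioning.

    The dataset is assigned after construction, so checks that depend on it
    cannot run in __init__; they run right before partitioning instead.
    """

    def __init__(self, partition_sizes: Sequence[int]) -> None:
        self._partition_sizes = list(partition_sizes)
        self.dataset: Optional[List[Row]] = None
        self._partition_indices: Optional[List[List[int]]] = None  # filled lazily

    def _check_before_partitioning(self) -> None:
        # Both checks need the dataset, which is why they can't live in __init__.
        if self.dataset is None:
            raise ValueError("The dataset must be assigned before partitioning.")
        if sum(self._partition_sizes) > len(self.dataset):
            raise ValueError("partition_sizes exceed the dataset length.")

    def _determine_partitions_if_needed(self) -> None:
        if self._partition_indices is not None:
            return  # already partitioned; reuse the cached result
        self._check_before_partitioning()
        indices: List[List[int]] = []
        start = 0
        for size in self._partition_sizes:
            indices.append(list(range(start, start + size)))
            start += size
        self._partition_indices = indices

    @property
    def num_partitions(self) -> int:
        # Trigger the lazy partitioning so the returned value has been validated.
        self._determine_partitions_if_needed()
        return len(self._partition_indices)


partitioner = SizePartitioner([3, 2])
try:
    partitioner.num_partitions  # dataset not assigned yet
except ValueError as err:
    print(err)
partitioner.dataset = [{"x": i} for i in range(6)]
print(partitioner.num_partitions)  # 2
```

A failed access leaves the cache empty, so the same object can be partitioned successfully once a valid dataset has been assigned.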

Changelog entry

adam-narozniak changed the title "Fds add num partitions" to "Fds add num_partitions property to partitioners" on Mar 11, 2024.
adam-narozniak enabled auto-merge (squash) on March 12, 2024, 21:22.
adam-narozniak merged commit 1057001 into main on Mar 12, 2024 (34 checks passed).
adam-narozniak deleted the fds-add-num-partitions branch on March 12, 2024, 21:36.