propose alternative chunk shape algorithm #996
base: dev
return None


def array_with_desired_product(
Since the intent is for this to be used to determine chunk shape, I think naming the function in a way that describes what it is used for may be more intuitive for users.
I actually prefer it this way: separating the math from the application. What we want here is a vector that has a product that is near to a target and has minimal sum, given constraints on the vector. This function is then called in an effort to determine the shape of a chunk.
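As a rough illustration of the math described above (not the code in this PR), here is a minimal sketch. It assumes the constraints arrive as a list in which `None` marks a free dimension and that each dimension has an upper bound; the name `sketch_array_with_desired_product` and the parameters `constraints` and `maxima` are hypothetical.

```python
import math


def sketch_array_with_desired_product(target, constraints, maxima):
    """Hypothetical sketch: fill the None entries of `constraints` so that
    the product of the result is near `target` and its sum stays small.

    constraints: list of int or None (None marks a free dimension)
    maxima: list of int, an upper bound for each dimension
    """
    result = list(constraints)
    free = [i for i, v in enumerate(result) if v is None]
    # Product that still has to be spread over the free dimensions.
    remaining = target / math.prod(v for v in result if v is not None)
    # Visit the most tightly bounded dimensions first, so any shortfall
    # they cause is passed on to dimensions that can still absorb it.
    for count, i in enumerate(sorted(free, key=lambda i: maxima[i])):
        n_left = len(free) - count
        # For a fixed product, equal factors give the smallest sum (AM-GM).
        ideal = remaining ** (1.0 / n_left)
        value = max(1, min(maxima[i], round(ideal)))
        result[i] = value
        remaining /= value
    return result


print(sketch_array_with_desired_product(1000, [None, 10, None], [5, 10, 1000]))
```

Keeping this as a pure function over a vector, as argued above, means the chunk-shape caller only has to translate dataset shape and dtype size into `target` and `maxima`.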
@@ -12,6 +12,92 @@
from .utils import docval, getargs, popargs, docval_macro, get_data_shape


def find_nth_none(lst, n):
Is this function intended for external use? If it will only be used by array_with_desired_product, then I would suggest making it private.
-def find_nth_none(lst, n):
+def __find_nth_none(lst, n):
You're right, both of these should be private, though @CodyCBakerPhD's solution makes this function unnecessary.
In general, I think this is reasonable. @CodyCBakerPhD can you also take a look?
Another approach could be to work with the prime factorization of the residual product of the size. That would get us closer to the desired chunk size, but we could end up with very uneven shapes.
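To make the prime-factorization idea concrete, here is a minimal sketch under stated assumptions: the residual product is a positive integer, and its prime factors are handed out largest first to whichever dimension is currently smallest. The helper names `prime_factors` and `split_residual` are hypothetical and are not part of this PR.

```python
def prime_factors(n):
    """Return the prime factors of a positive integer n in ascending order."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)  # whatever is left is itself prime
    return factors


def split_residual(residual, n_dims):
    """Hypothetical sketch: spread the prime factors of `residual` over
    `n_dims` dimensions, giving each factor to the smallest dimension."""
    dims = [1] * n_dims
    for p in sorted(prime_factors(residual), reverse=True):
        smallest = dims.index(min(dims))
        dims[smallest] *= p
    return dims


print(split_residual(1000, 3))      # factors balance out evenly
print(split_residual(2 * 997, 2))   # a large prime factor forces an uneven split
```

The product of the result always equals `residual` exactly, which is the appeal of this approach; the second call shows the cost, since a large prime factor cannot be divided between dimensions.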
Uneven shape is not necessarily bad, but I think you probably want the ratio to be informed by the overall size of the dataset. I.e., for
Motivation
Fix #995
How to test the behavior?
Checklist
ruff from the source directory.