FEAT-#7368: Add a new environment variable for using dynamic partitioning #7369
…amic partitioning Signed-off-by: Kirill Suvorov <[email protected]>
7ba554f to 3de5359
the combined tasks carries more overhead than assigning them separately.

Unfortunately, the use of Dynamic-partitioning depends on various factors such as data size, number of CPUs, operations performed,
and it is up to the user to determine whether Dynamic-partitioning will give a boost in his case or not.
If we plan to look for a heuristic that can switch implementations automatically, then we could add a few words about this (and a link to the issue).
Co-authored-by: Anatoly Myachev <[email protected]>
@@ -675,7 +676,7 @@ def map_partitions(
         NumPy array
             An array of partitions
         """
-        if np.prod(partitions.shape) <= 1.5 * CpuCount.get():
+        if not DynamicPartitioning.get():
It would be good to keep the previous default behavior.
-        if not DynamicPartitioning.get():
+        if np.prod(partitions.shape) <= 1.5 * CpuCount.get() and not DynamicPartitioning.get():
Do you think we should do that? Shouldn't the user be given more freedom to decide when to activate this option?
The user can activate this locally, only for the required operations.
The user can now force the other code branch whenever they choose, which is already more flexibility than before. Since the previous condition worked quite well, and slowdowns are possible with this new variable, I would be more careful about changing the default behavior.
I confused this with similar code in the rebalance_partitions function. OK, leave it as is.
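For readers following the thread, here is a minimal sketch of how the three conditions discussed above differ. It does not import Modin; the function names, the `cpu_count` argument, and the `dynamic_partitioning` flag are hypothetical stand-ins for `CpuCount.get()` and `DynamicPartitioning.get()`, and the sketch only models which branch of the `if` is taken, not what either branch does.

```python
import math


def takes_if_branch_before_pr(partition_shape, cpu_count, dynamic_partitioning):
    """Original condition: only the partition-count heuristic decides."""
    return math.prod(partition_shape) <= 1.5 * cpu_count


def takes_if_branch_in_pr(partition_shape, cpu_count, dynamic_partitioning):
    """Condition as changed in this PR: only the flag decides."""
    return not dynamic_partitioning


def takes_if_branch_suggested(partition_shape, cpu_count, dynamic_partitioning):
    """Condition from the review suggestion: heuristic AND flag turned off."""
    return (
        math.prod(partition_shape) <= 1.5 * cpu_count
        and not dynamic_partitioning
    )


if __name__ == "__main__":
    cpu_count = 16  # stand-in for CpuCount.get()
    for shape in [(4, 4), (32, 32)]:  # 16 and 1024 partitions
        for flag in (False, True):
            print(
                shape,
                flag,
                takes_if_branch_before_pr(shape, cpu_count, flag),
                takes_if_branch_in_pr(shape, cpu_count, flag),
                takes_if_branch_suggested(shape, cpu_count, flag),
            )
```

The case where the variants disagree is a large partition grid with the flag off: the PR's condition takes the `if` branch, while the original and suggested conditions fall through to the `else` branch.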
The use of Dynamic-partitioning depends on various factors such as data size, number of CPUs, operations performed,
and it is up to the user to determine whether Dynamic-partitioning will give a boost in his case or not.
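Since the decision is left to the user, one way to make it is to time the same operation with the setting off and on. The sketch below is a generic timing helper, not part of Modin; `median_runtime` and `compare` are hypothetical names, and the two callables passed in are assumed to run the same workload under each setting.

```python
import statistics
import time


def median_runtime(fn, repeats=5):
    """Return the median wall-clock time (seconds) of calling fn() `repeats` times."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def compare(baseline, candidate, repeats=5):
    """Time both callables and report which one had the lower median."""
    base = median_runtime(baseline, repeats)
    cand = median_runtime(candidate, repeats)
    return {
        "baseline_s": base,
        "candidate_s": cand,
        "candidate_faster": cand < base,
    }
```

In practice `baseline` and `candidate` would wrap the same operation (for example `abs` on the same frame) with dynamic partitioning disabled and enabled. Results will vary with data size and CPU count, so the comparison should be run on data representative of the real workload.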
Performance results for abs (32 CPUs and 112 CPUs): [figures not preserved]
flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py
black --check modin/ asv_bench/benchmarks scripts/doc_checker.py
git commit -s
docs/development/architecture.rst is up-to-date