FEAT-#7202: Use custom resources for Ray #7205
Conversation
Signed-off-by: Igoshev, Iaroslav <[email protected]>
Signed-off-by: Igoshev, Iaroslav <[email protected]>
@YarShev do we need to adjust the procedure for determining the total available number of cores depending on these custom resources?
Custom resources have nothing to do with num_cpus, so no adjustment is needed.
Isn't it possible to use these resources to limit the number of nodes on which computations are launched? If so, we would end up splitting into far more partitions than can actually be executed in parallel.
Your thoughts pointed me to a problem in the current setup. The issue is that if the user sets custom resources, we would also need to derive a per-task requirement from them, roughly along these lines:

    resources_per_task = {}
    for k, v in RayCustomResources.get().items():
        resources_per_task[k] = v / num_cpus
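For illustration only, here is what that per-task requirement works out to with hypothetical numbers (a node advertising 1.0 of "special_hardware" and num_cpus = 8; both values are made up):

    # Hypothetical values, only to show the effect of dividing by num_cpus.
    num_cpus = 8
    node_resources = {"special_hardware": 1.0}
    resources_per_task = {k: v / num_cpus for k, v in node_resources.items()}
    print(resources_per_task)  # {'special_hardware': 0.125}
    # Each task then needs 0.125 of "special_hardware", so at most
    # 1.0 / 0.125 == 8 such tasks can run concurrently on that node.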
Signed-off-by: Igoshev, Iaroslav <[email protected]>
Signed-off-by: Igoshev, Iaroslav <[email protected]>
@@ -126,6 +127,7 @@ def initialize_ray(
    "object_store_memory": object_store_memory,
    "_redis_password": redis_password,
    "_memory": object_store_memory,
    "resources": RayInitCustomResources.get(),
Can we add a test for this case?
What exactly would you like to test with this?
We do not test the situation when this config is different from None.
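For illustration, a test along these lines could cover the non-None case (a sketch only: the initialize_ray import path and the ray.init mock target are assumptions):

    import unittest.mock as mock

    import modin.config as cfg

    def test_ray_init_custom_resources():
        # Assumed module path; adjust to wherever initialize_ray actually lives.
        from modin.core.execution.ray.common import initialize_ray

        with cfg.context(RayInitCustomResources={"special_hardware": 1.0}):
            with mock.patch("ray.init") as ray_init:
                initialize_ray()
                _, kwargs = ray_init.call_args
                assert kwargs["resources"] == {"special_hardware": 1.0}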
>>> with context(RayTaskCustomResources={"special_hardware": 0.001}):
...     df.<op>
It's good that there is now an option to limit concurrency, but this only works for Ray. Let's create an issue for the rest of the engines.
This config is not generic but specific to Ray. I am not sure there is a way to limit concurrency for other engines. If we want something similar for them, we can open an issue and explore the options then. Do you still think we should create an issue now?
Limiting concurrency via context looks like a good feature for advanced users. We can create a low-priority issue, but not now.
Signed-off-by: Igoshev, Iaroslav <[email protected]>
Signed-off-by: Igoshev, Iaroslav <[email protected]>
    self.func, self.data, *self.args, **self.kwargs
)
result, length, width, ip = remote_exec_func.options(
    resources=RayTaskCustomResources.get()
I wonder if we should just call it RayTaskResources (and the same for RayInitResources), since this config is used to pass values to resources.
I would prefer to be explicit here, as Ray itself calls these custom resources - https://docs.ray.io/en/latest/ray-core/scheduling/resources.html#custom-resources.
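For reference, this is roughly how plain Ray spells the same thing (a sketch based on the linked docs, not code from this PR):

    import ray

    # Advertise a custom resource on the local node at startup...
    ray.init(resources={"special_hardware": 1.0})

    # ...and require a fraction of it per task; this both pins the task to nodes
    # that have the resource and caps how many such tasks run at once.
    @ray.remote(resources={"special_hardware": 0.001})
    def work(x):
        return x * 2

    print(ray.get(work.remote(21)))  # 42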
What do these changes do?
- flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py
- black --check modin/ asv_bench/benchmarks scripts/doc_checker.py
- git commit -s
- docs/development/architecture.rst is up-to-date