This repository has been archived by the owner on Oct 23, 2023. It is now read-only.

Add fields to specify ray head/worker resources + additional config map #429

Open
guozhen-la wants to merge 11 commits into master from add-ray-head-worker-resources

Conversation

@guozhen-la commented Aug 4, 2023

Adding these fields in support of the following features (a hypothetical usage sketch follows the list):

  • enable configuration of different resources for Ray head/worker pods
  • enable specifying the target namespace for remote Ray cluster creation
  • enable specifying the service account for remote Ray cluster execution
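
For illustration, here is a hypothetical flytekit-level sketch of how these fields might surface to users. The requests/limits on the head/worker node configs and the namespace/k8s_service_account parameters are assumptions made for this example, not something defined by this PR itself:

# Hypothetical usage sketch only -- the requests/limits, namespace, and
# k8s_service_account parameters below are assumptions, not part of this PR.
from flytekit import Resources, task
from flytekitplugins.ray import HeadNodeConfig, RayJobConfig, WorkerNodeConfig

@task(
    task_config=RayJobConfig(
        # assumed: per-group resource requests for the head and worker pods
        head_node_config=HeadNodeConfig(requests=Resources(cpu="2", mem="4Gi")),
        worker_node_config=[
            WorkerNodeConfig(group_name="workers", replicas=2,
                             requests=Resources(cpu="4", mem="8Gi")),
        ],
        namespace="ray-remote",            # assumed: target namespace for the remote cluster
        k8s_service_account="ray-runner",  # assumed: service account used by the cluster
    )
)
def ray_task() -> None:
    ...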

Type

  • Bug Fix
  • Feature
  • Plugin

Are all requirements met?

  • Code completed
  • Smoke tested
  • Unit tests added
  • Code documentation added
  • Any pending items have an associated Issue

Complete description

How did you fix the bug, make the feature, etc.? Link to any design docs, etc.

Tracking Issue

Remove the 'fixes' keyword if there will be multiple PRs to fix the linked issue

fixes https://github.com/flyteorg/flyte/issues/

Follow-up issue

NA
OR
https://github.com/flyteorg/flyte/issues/

welcome bot commented Aug 4, 2023

Thank you for opening this pull request! 🙌

These tips will help get your PR across the finish line:

  • Most of the repos have a PR template; if so, fill it out to the best of your knowledge.
  • Sign off your commits (Reference: DCO Guide).

Signed-off-by: Guozhen La <[email protected]>
Signed-off-by: Guozhen La <[email protected]>
@guozhen-la force-pushed the add-ray-head-worker-resources branch from 5efa9fe to bae1d8b on August 4, 2023 19:56
codecov bot commented Aug 4, 2023

Codecov Report

Merging #429 (8cfebcb) into master (0dabef7) will increase coverage by 2.55%.
The diff coverage is n/a.

❗ Current head 8cfebcb differs from pull request most recent head 12cd0c8. Consider uploading reports for the commit 12cd0c8 to get more accurate results.

@@            Coverage Diff             @@
##           master     #429      +/-   ##
==========================================
+ Coverage   75.92%   78.48%   +2.55%     
==========================================
  Files          18       18              
  Lines        1458     1250     -208     
==========================================
- Hits         1107      981     -126     
+ Misses        294      212      -82     
  Partials       57       57              
Flag        Coverage Δ
unittests   ?

Flags with carried forward coverage won't be shown.

see 18 files with indirect coverage changes

Signed-off-by: Guozhen La <[email protected]>
@guozhen-la force-pushed the add-ray-head-worker-resources branch 2 times, most recently from c33fb09 to 8d77c78 on August 8, 2023 17:22
Signed-off-by: Guozhen La <[email protected]>
@guozhen-la force-pushed the add-ray-head-worker-resources branch from 8d77c78 to 5561ee9 on August 8, 2023 17:23
Signed-off-by: Guozhen La <[email protected]>
@guozhen-la force-pushed the add-ray-head-worker-resources branch from 41b461a to 8af47ad on August 8, 2023 18:43
@@ -19,13 +21,19 @@ message RayCluster {
HeadGroupSpec head_group_spec = 1;
// WorkerGroupSpecs are the specs for the worker pods
repeated WorkerGroupSpec worker_group_spec = 2;
// Namespace used to create the ray cluster
string namespace = 3;
Member

To make it more generic, could we add it to the task metadata here?

// Namespace used to create the ray cluster
string namespace = 3;
// Kubernetes service account used by the ray cluster
string k8s_service_account = 4;
Member

I just remembered that we can override the service account by using a pod template. Given that, we can remove k8s_service_account from the IDL.
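
For context, a minimal sketch of the pod-template route, assuming flytekit's pod_template support, flytekitplugins-ray, and the kubernetes Python client; the service account name and worker group values are placeholders, and how the template would propagate to the Ray head/worker pods is not covered here:

# Sketch: overriding the service account through a pod template rather than an IDL field.
# Assumes flytekit's pod_template support and the kubernetes client; values are placeholders.
from flytekit import PodTemplate, task
from flytekitplugins.ray import RayJobConfig, WorkerNodeConfig
from kubernetes.client import V1PodSpec

@task(
    task_config=RayJobConfig(
        worker_node_config=[WorkerNodeConfig(group_name="workers", replicas=2)],
    ),
    pod_template=PodTemplate(
        pod_spec=V1PodSpec(
            containers=[],                      # no container-level overrides in this sketch
            service_account_name="ray-runner",  # placeholder service account
        )
    ),
)
def ray_task() -> None:
    ...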


@pingsutw what would it look like in the flytekit UX? We added k8s_service_account to the RayJobConfig. Would we not need it anymore, and instead expect the user to override it using a pod template?


I think it may not be a nice user experience if the user has to specify some attributes to override in RayJobConfig and others through a pod template.

Member

We want to make it general because others may want to override the service account for Spark, Dask, TensorFlow tasks, etc. How about adding a service_account arg to the task decorator?

@task(task_config=RayJobConfig(...), service_account="...")
def ray_task():
    ...


Are you suggesting just keeping the Ray-specific parameters in the RayJobConfig? Are we then going to create a parameter in the task decorator for each non-Ray configuration we want to override?

Member

Namespace is not a common config that people will override, so we can keep it in the RayJob config.

> Are you suggesting just keeping the Ray-specific parameters in the RayJobConfig? Are we then going to create a parameter in the task decorator for each non-Ray configuration we want to override?

Yes, ideally. People may want to override the service account for a Spark job, for example. Are there any other configs you want to override?


@pingsutw sorry, we were away for a work event last week. So if I understand correctly, you're suggesting moving k8s_service_account out of RayJobConfig and allowing users to configure it at the decorator level?

@task(task_config=RayJobConfig(...), service_account="...")
def ray_task():
    ...


I understand the concern around generalizability (other tasks also configuring service_account). Is it still not a better user experience to configure everything in the task config object rather than through a decorator-level argument?

@task(task_config=RayJobConfig(namespace="...", k8s_service_account="..."))
def ray_task():
    ...

@task(task_config=SparkJobConfig(k8s_service_account="..."))
def spark_task():
    ...
