simplify function calls and add option for custom resources #531

Merged


KPostOffice
Collaborator

Issue link

What changes have been made

I changed some of the larger function calls, which were getting out of hand, to take a reference to the cluster object instead of a long list of individual arguments. It might be worthwhile to turn these into class methods at a future date.

I have also added the ability for the user to specify custom resource requirements for Ray.
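
As a rough illustration of the refactor described above, the sketch below shows a helper that takes a single cluster reference rather than many loose arguments, and merges user-supplied custom resources into the request. All names here (`Cluster`, `ClusterConfig`, `worker_resources`, and the config fields) are hypothetical stand-ins chosen for the example, not the SDK's actual code.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ClusterConfig:
    # Hypothetical, trimmed-down configuration; field names are assumptions.
    name: str
    worker_cpu: int = 1
    worker_memory_gb: int = 2
    # Custom resources keyed by resource name, e.g. {"nvidia.com/gpu": 1}.
    worker_extended_resource_requests: Dict[str, int] = field(default_factory=dict)


@dataclass
class Cluster:
    config: ClusterConfig


def worker_resources(cluster: Cluster) -> Dict[str, str]:
    """Build a resource-request mapping from the cluster object alone."""
    cfg = cluster.config
    resources = {"cpu": str(cfg.worker_cpu), "memory": f"{cfg.worker_memory_gb}G"}
    # Custom resources are merged in alongside CPU and memory.
    for resource_name, count in cfg.worker_extended_resource_requests.items():
        resources[resource_name] = str(count)
    return resources


cluster = Cluster(
    ClusterConfig(name="demo", worker_extended_resource_requests={"nvidia.com/gpu": 1})
)
print(worker_resources(cluster))
```

Passing the whole object keeps the call site short and means a new configuration field does not change every function signature along the way.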

Verification steps

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • Testing is not required for this change

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 9, 2024
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 9, 2024
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 9, 2024
@KPostOffice KPostOffice force-pushed the rhoaieng-3753 branch 6 times, most recently from c1ee4a1 to 52ffe63 Compare May 15, 2024 20:52
@KPostOffice KPostOffice marked this pull request as ready for review May 15, 2024 20:52
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 15, 2024
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 24, 2024
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 14, 2024
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 18, 2024
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 18, 2024
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 19, 2024
@KPostOffice KPostOffice force-pushed the rhoaieng-3753 branch 2 times, most recently from 49bb0f1 to f071d20 Compare June 20, 2024 16:55
@KPostOffice KPostOffice force-pushed the rhoaieng-3753 branch 3 times, most recently from 0ced913 to 8cc3f20 Compare June 21, 2024 19:43
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 30, 2024
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 8, 2024
Contributor

@Bobbins228 Bobbins228 left a comment

The work surrounding `num_worker_gpus` and `num_head_gpus` should be removed in favour of `head_extended_resource_requests` and `worker_extended_resource_requests`.
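
To make the suggestion concrete, here is a minimal sketch of how a plain GPU count could be expressed as an extended resource request. It assumes the two new fields are dictionaries keyed by Kubernetes resource name; the helper `gpu_count_to_extended_resources` and the default `nvidia.com/gpu` key are illustrative choices for this example, not something taken from the PR.

```python
from typing import Dict


def gpu_count_to_extended_resources(
    num_gpus: int, resource_name: str = "nvidia.com/gpu"
) -> Dict[str, int]:
    """Translate a GPU count into a {resource_name: count} mapping.

    Hypothetical helper: the dictionary shape is an assumption about how
    the extended resource request fields are structured.
    """
    if num_gpus < 0:
        raise ValueError("num_gpus must be non-negative")
    # A count of zero means no extended resource is requested at all.
    return {resource_name: num_gpus} if num_gpus else {}


# Equivalent of the old num_head_gpus=0, num_worker_gpus=2 under this assumption.
head_extended_resource_requests = gpu_count_to_extended_resources(0)
worker_extended_resource_requests = gpu_count_to_extended_resources(2)
print(head_extended_resource_requests, worker_extended_resource_requests)
```

A mapping like this is not tied to one vendor, so the same field could carry any other resource name the cluster exposes.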

src/codeflare_sdk/cluster/config.py (outdated, resolved)
Collaborator

@ChristianZaccaria ChristianZaccaria left a comment

Tested; the job completed successfully with GPUs.

Something to note: for some reason the Ray dashboard believes we are using Tesla T4 GPUs. Does anyone know why?
[image]

I also noticed that while the pods are initialising, the status is SUSPENDED even though the pods are actually coming up, so the cluster moves from SUSPENDED straight to READY. Should I open an issue for both of these?

src/codeflare_sdk/cluster/config.py (resolved)
src/codeflare_sdk/utils/generate_yaml.py (outdated, resolved)
Collaborator

@ChristianZaccaria ChristianZaccaria left a comment

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jul 9, 2024
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Jul 9, 2024
Contributor

@Bobbins228 Bobbins228 left a comment

/approve

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jul 9, 2024
Contributor

openshift-ci bot commented Jul 9, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Bobbins228

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 9, 2024
@KPostOffice KPostOffice dismissed ChristianZaccaria’s stale review July 9, 2024 18:01

Requested changes resolved

@openshift-merge-bot openshift-merge-bot bot merged commit 5ce0b2c into project-codeflare:main Jul 9, 2024
4 checks passed
Labels
approved: Indicates a PR has been approved by an approver from all required OWNERS files.
lgtm: Indicates that a PR is ready to be merged.
6 participants