Correctly handle resource overrides in KF plugins #4467
Conversation
Force-pushed from 55063aa to cfc2f24
Codecov Report

@@            Coverage Diff             @@
##           master    #4467      +/-   ##
==========================================
- Coverage   59.63%   59.52%   -0.12%
==========================================
  Files         638      636       -2
  Lines       53995    53838     -157
==========================================
- Hits        32201    32045     -156
- Misses      19262    19265       +3
+ Partials     2532     2528       -4
    return nil, err
}
launcherReplica.RestartPolicy = common.ParseRestartPolicy(launcherReplicaSpec.GetRestartPolicy())
launcherReplicaSpec, err = common.ToReplicaSpecWithOverrides(ctx, taskCtx, kfMPITaskExtraArgs.GetLauncherReplicas(), kubeflowv1.MPIJobDefaultContainerName, true)
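For context, common.ParseRestartPolicy translates the restart-policy enum from the task config into the kubeflow operator's value. A minimal self-contained sketch of that kind of mapping, using stand-in types rather than the real flyteidl/kubeflow definitions:

```go
package main

import "fmt"

// Stand-ins for the flyteidl plugin enum and the kubeflow operator type;
// the real definitions live in flyteidl and kubeflow/common respectively.
type idlRestartPolicy int

const (
	restartPolicyNever idlRestartPolicy = iota
	restartPolicyOnFailure
	restartPolicyAlways
)

type kubeflowRestartPolicy string

// parseRestartPolicy mirrors what a helper like common.ParseRestartPolicy
// plausibly does: translate the task-config enum into the operator's
// string constants.
func parseRestartPolicy(p idlRestartPolicy) kubeflowRestartPolicy {
	switch p {
	case restartPolicyOnFailure:
		return "OnFailure"
	case restartPolicyAlways:
		return "Always"
	default:
		return "Never"
	}
}

func main() {
	fmt.Println(parseRestartPolicy(restartPolicyOnFailure)) // OnFailure
}
```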
Need to confirm whether launcher pods should be treated as special and made non-interruptible.
No clear evidence to suggest one way or the other. This pod is responsible for launching the jobs on the MPI workers (somewhat akin to a Ray head node), so we'll give it special treatment for now.
taskCtxOptions := []flytek8s.PluginTaskExecutionContextOption{}
// Master should always run as non-interruptible
if isMaster {
    taskCtxOptions = append(taskCtxOptions, flytek8s.WithInterruptible(false))
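The taskCtxOptions slice here is Go's functional-options pattern. A self-contained sketch of the idea, with names modeled on the diff but not the verbatim flytek8s source:

```go
package main

import "fmt"

// pluginTaskExecutionContext is a stand-in for the wrapper this PR adds
// around the core TaskExecutionContext; it carries optional overrides.
type pluginTaskExecutionContext struct {
	interruptible *bool // nil means "no override; defer to the parent context"
}

// PluginTaskExecutionContextOption mutates the wrapper at construction time.
type PluginTaskExecutionContextOption func(*pluginTaskExecutionContext)

// WithInterruptible returns an option that pins interruptibility to a value.
func WithInterruptible(v bool) PluginTaskExecutionContextOption {
	return func(p *pluginTaskExecutionContext) {
		p.interruptible = &v
	}
}

// NewPluginTaskExecutionContext applies all options to a fresh wrapper.
func NewPluginTaskExecutionContext(opts ...PluginTaskExecutionContextOption) *pluginTaskExecutionContext {
	p := &pluginTaskExecutionContext{}
	for _, opt := range opts {
		opt(p)
	}
	return p
}

func main() {
	tCtx := NewPluginTaskExecutionContext(WithInterruptible(false))
	fmt.Println(*tCtx.interruptible) // false
}
```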
Note that "master" pods will be tagged as non-interruptible. Currently this is set for PyTorch masters and MPI launchers.
This looks great, thank you!
Re: whether kfoperators master pods should be non-interruptible, why do we have to consider those as special? Is it just for consistency with the other k8s plugins (dask and spark have their driver/master pods as non-interruptible)?
If it would be detrimental to lose the master, we should run them as non-interruptible. Same logic as dask and spark, yes.
Force-pushed from 3253966 to 1d29e02
    return nil, flyteerr.Errorf(flyteerr.BadTaskSpecification, "Unable to create replica spec: [%v]", err.Error())
}
launcherReplicaSpec = replicaSpec.DeepCopy()
// TODO (jeev): Is this even a valid configuration. Can there be more than 1
This maintains parity for now, but we will need to follow up with a subsequent investigation.
lgtm. would feel more confident with a few more eyes here. @fg91 @yubofredwang mind taking a quick look?
I don't agree with the change that a pytorchjob master replica should always be non-interruptible. The master replica doesn't start the worker replicas or restart them in case they are preempted. If one of the worker replicas is preempted, all other pods including the master replica have to stop/be restarted. So there is no value in preventing the master replica from being preempted. Or am I overlooking something here?
I think differentiating between master and worker replicas doesn't make that much sense for pytorchjobs in the first place, and in the newer elastic pytorchjobs there are no more master replicas at all.
Could you please revert this change (unless I'm overlooking something)?
I appreciate that you tagged me @hamersaw @jeevb since we rely on pytorch jobs a lot :)
I ran a flytepropeller with your changes in our staging cluster and apart from the non-interruptible master replica everything works as expected.
However, for pytorchjobs everything worked as expected before this change as well. Do I understand correctly that the large diff in pytorch.go comes from the fact that the overriding of resources and resource/interruptible tolerations now happens more centrally, instead of being handled separately in every single plugin?
I went ahead and reverted forcing non-interruptibility on KF plugin masters. This keeps parity with the current implementation for now. @fg91: The big change here was to apply resource overrides from plugin-specific config before the base pod spec is constructed in ToK8sPodSpec.
Parity also for MPI? Asking because I only know for pytorch that the master shouldn't always be non-interruptible - I don't know for MPI.
Sounds great! Out of curiosity, does this mean that resource tolerations are automatically handled correctly if another plugin is added, because this is handled before the plugin builds the resource?
Yes, we're keeping parity for MPI as well for now, since no one has been able to provide a convincing argument one way or the other. I don't know MPI well enough. Let me know if you have thoughts, @fg91.
This is still handled when the plugin is building the resource. The overrides in this case are derived from parsing the plugin configuration. These changes basically give plugins a simple mechanism to inject overrides and benefit from the shared helpers in flytek8s.
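To make that flow concrete, here is a small self-contained sketch (stand-in types only, not the actual flyteplugins code): overrides parsed from plugin-specific config are injected into the context first, so the shared pod-spec builder can derive resource-specific tolerations in one place:

```go
package main

import "fmt"

// taskCtx is a stand-in for TaskExecutionContext; it exposes the effective
// resource request that the pod-spec builder will see.
type taskCtx struct {
	gpus int
}

type option func(*taskCtx)

// withGPUOverride mimics an override parsed from plugin-specific config
// (e.g. a worker replica spec in a kubeflow task).
func withGPUOverride(gpus int) option {
	return func(t *taskCtx) { t.gpus = gpus }
}

// wrap applies overrides to a copy of the base context before it is
// handed to the builder.
func wrap(base taskCtx, opts ...option) taskCtx {
	for _, opt := range opts {
		opt(&base)
	}
	return base
}

// buildPodSpec stands in for the shared ToK8sPodSpec helper: because the
// overrides were applied to the context first, GPU tolerations are derived
// centrally here rather than re-implemented in every plugin.
func buildPodSpec(t taskCtx) (tolerations []string) {
	if t.gpus > 0 {
		tolerations = append(tolerations, "nvidia.com/gpu:NoSchedule")
	}
	return tolerations
}

func main() {
	base := taskCtx{gpus: 0}
	worker := wrap(base, withGPUOverride(2))
	fmt.Println(buildPodSpec(worker)) // [nvidia.com/gpu:NoSchedule]
}
```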
I don't know MPI well enough either unfortunately 🤷 But I'm sure about pytorch jobs ... and as long as we keep parity here, I think this is good to merge.
Nice. I always disliked that the plugins had to reinvent the wheel here instead of using shared helpers.
Tracking issue
Fixes: #4422
Describe your changes
This PR:
- Introduces a simple mechanism for plugins to inject overrides into the TaskExecutionContext before passing it to ToK8sPodSpec, where the base pod spec is constructed and hydrated with defaults and additional configuration (e.g. affinity, tolerations, etc.).
- Refactors the dask and spark plugins to use this new helper for overriding interruptibility of the driver pods.
- Refactors the KF plugins (mpi, pytorch and tensorflow) to correctly handle resource overrides specified in the respective task configs. This allows for correctly applying resource-specific tolerations (e.g. GPU) to the pod spec.
Config:
Task decorator:
Launcher tolerations:
Worker tolerations:
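As a purely illustrative sketch of the outcome described above (not taken from the PR; the key and effect are assumptions, since the actual values come from the cluster's resource-tolerations configuration), a GPU toleration on a launcher or worker pod spec looks roughly like:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

func main() {
	// Hypothetical GPU toleration that the shared pod-spec builder would
	// attach once resource overrides from the task config are honored.
	tol := corev1.Toleration{
		Key:      "nvidia.com/gpu",
		Operator: corev1.TolerationOpExists,
		Effect:   corev1.TaintEffectNoSchedule,
	}
	fmt.Printf("%+v\n", tol)
}
```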
Check all the applicable boxes
Setup Process
Screenshots
Note to reviewers
Related PRs