Add support for raw containers in map tasks #329
base: master
Conversation
Signed-off-by: Kevin Su <[email protected]>
for sidecarIndex, container := range pod.Spec.Containers {
	if container.Name == config.GetK8sPluginConfig().CoPilot.NamePrefix+flytek8s.Sidecar {
		for i, arg := range pod.Spec.Containers[sidecarIndex].Args {
cc @hamersaw should we pass the env var FlyteK8sArrayIndex to copilot, and construct the final output prefix in copilot?
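A minimal sketch of that suggestion, using stand-in types for the corev1 `Container`/`EnvVar` types; the env var name `FLYTE_K8S_ARRAY_INDEX` is an assumption borrowed from the map-task subtask environment, not confirmed by this PR:

```go
package main

import "fmt"

// EnvVar and Container are minimal stand-ins for the corev1 types.
type EnvVar struct{ Name, Value string }
type Container struct {
	Name string
	Env  []EnvVar
}

// addArrayIndexEnv sketches the suggestion above: instead of rewriting the
// copilot sidecar's args, pass the subtask index as an environment variable
// and let copilot construct the final output prefix itself.
func addArrayIndexEnv(c *Container, arrayIndex int) {
	c.Env = append(c.Env, EnvVar{Name: "FLYTE_K8S_ARRAY_INDEX", Value: fmt.Sprint(arrayIndex)})
}

func main() {
	sidecar := Container{Name: "flyte-copilot-sidecar"}
	addArrayIndexEnv(&sidecar, 7)
	fmt.Println(sidecar.Env[0].Name, sidecar.Env[0].Value) // FLYTE_K8S_ARRAY_INDEX 7
}
```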
I don't have strong feelings here. How are we passing the array index to the inputs downloader? Because in flytekit we pass the input data ref and a subtask index, IIUC it reads the full list of inputs and only uses the value at the subtask index. We need to do the same thing here right?
We pass the array index to the primary container instead of the downloader, and the raw container task will read the value at the subtask index. Here is an example: flyteorg/flytekit#1547.
The problem is that in a regular map task we construct the final output prefix in flytekit (output_prefix + array index), but the raw container doesn't know the output prefix; it writes to a local shared dir instead. The uploader will read the data in the shared dir and upload it to S3.
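The prefix construction described above can be sketched like this; `subtaskOutputPrefix` is a hypothetical helper illustrating the "output_prefix + array index" scheme, not the actual flytekit/copilot implementation:

```go
package main

import (
	"fmt"
	"strconv"
)

// subtaskOutputPrefix illustrates the scheme described above: the final
// output prefix for a map subtask is the task's output prefix with the
// array index appended as a path segment. Hypothetical helper, not the
// real flytekit code.
func subtaskOutputPrefix(outputPrefix string, arrayIndex int) string {
	return outputPrefix + "/" + strconv.Itoa(arrayIndex)
}

func main() {
	fmt.Println(subtaskOutputPrefix("s3://bucket/outputs", 3)) // s3://bucket/outputs/3
}
```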
Signed-off-by: Kevin Su <[email protected]>
Codecov Report
@@ Coverage Diff @@
## master #329 +/- ##
==========================================
+ Coverage 62.65% 64.05% +1.39%
==========================================
Files 146 146
Lines 12220 9929 -2291
==========================================
- Hits 7657 6360 -1297
+ Misses 3981 2985 -996
- Partials 582 584 +2
... and 120 files with indirect coverage changes
Signed-off-by: Kevin Su <[email protected]>
I think this is looking very good - thanks for diving into it! My general feedback is that I think we can leave the pod plugin code as is. If we can encapsulate the code for map tasks in the k8s_array package, then when we implement ArrayNode and ultimately refactor this all out, it won't leave nasty one-off code to support legacy functionality. What are your thoughts on this? Very open for discussion.
go/tasks/plugins/k8s/pod/plugin.go
// When the copilot is running, we should wait until the data is uploaded by the copilot.
copilotContainerName, exists := r.GetAnnotations()[flytek8s.FlyteCopilotName]
if exists {
	copilotContainerPhase := flytek8s.DetermineContainerPhase(copilotContainerName, pod.Status.ContainerStatuses, &info)
	if copilotContainerPhase.Phase() == pluginsCore.PhaseRunning && len(info.Logs) > 0 {
		return pluginsCore.PhaseInfoRunning(pluginsCore.DefaultPhaseVersion+1, copilotContainerPhase.Info()), nil
	}
}
So right now this is done in ContainerTasks (not using map task) by just not setting a primaryContainerName on the Pod; then this code waits for the entire Pod to complete. It seems like this is what we should do for subtasks as well.
I think the issue is that here we always add a PrimaryContainerName to the pod annotation. Maybe it makes sense to update the code in subtask.go so that this annotation is only added if necessary; then we shouldn't need to add the flytek8s.FlyteCopilotName annotation above either.
The issue is that propeller will only pass the array index to the primary container, so we have to set the raw container as primary. However, if we set it as primary, propeller won't wait for the uploader to complete, so I added the flytek8s.FlyteCopilotName annotation, and we wait for copilot first if we find the uploader container in the pod here.
Ok, so we should probably update this logic to support ContainerTask (i.e. add the array index to non-copilot containers) so that it doesn't rely on the primaryContainerName annotation.
This means we could keep the logic in PodPlugin so that if the primaryContainerName annotation exists, it waits for that container to complete; if it doesn't, it waits for the Pod to complete. It helps if this logic is simple because we have a few perf ideas to layer on top of it.
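The decision described above could look roughly like this; `Pod` is a stand-in for `v1.Pod` and the annotation key is an assumed value, since the real constant lives in the flytek8s package:

```go
package main

import "fmt"

// Pod is a minimal stand-in for v1.Pod; only annotations matter here.
type Pod struct {
	Annotations map[string]string
}

// primaryContainerKey is an assumed annotation key for illustration.
const primaryContainerKey = "primary_container_name"

// waitTarget sketches the proposal above: if the primaryContainerName
// annotation exists, watch that container for completion; otherwise wait
// for the entire Pod to complete.
func waitTarget(pod Pod) string {
	if name, ok := pod.Annotations[primaryContainerKey]; ok {
		return "container:" + name
	}
	return "pod"
}

func main() {
	withPrimary := Pod{Annotations: map[string]string{primaryContainerKey: "raw"}}
	fmt.Println(waitTarget(withPrimary)) // container:raw
	fmt.Println(waitTarget(Pod{}))       // pod
}
```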
Signed-off-by: Kevin Su <[email protected]>
@@ -330,6 +346,10 @@ func getTaskContainerIndex(pod *v1.Pod) (int, error) {
	if len(pod.Spec.Containers) == 1 {
		return 0, nil
	}
	// Copilot is always the second container if it is enabled.
	if len(pod.Spec.Containers) == 2 && pod.Spec.Containers[1].Name == config.GetK8sPluginConfig().CoPilot.NamePrefix+flytek8s.Sidecar {
I don't think it is reasonable to assume this. Could there be a scenario where the second container is not copilot?
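One way to avoid the positional assumption the reviewer is questioning is to look the copilot container up by name instead; `copilotName` below stands in for `config.GetK8sPluginConfig().CoPilot.NamePrefix + flytek8s.Sidecar`, and its literal value is an assumption for this sketch:

```go
package main

import "fmt"

// copilotName stands in for config.GetK8sPluginConfig().CoPilot.NamePrefix +
// flytek8s.Sidecar; the literal value here is an assumption.
const copilotName = "flyte-copilot-sidecar"

// findCopilotIndex scans container names for the copilot sidecar rather than
// assuming it is always the second container; it returns -1 if absent.
func findCopilotIndex(containerNames []string) int {
	for i, name := range containerNames {
		if name == copilotName {
			return i
		}
	}
	return -1
}

func main() {
	fmt.Println(findCopilotIndex([]string{"primary", "flyte-copilot-sidecar"})) // 1
	fmt.Println(findCopilotIndex([]string{"primary"}))                          // -1
}
```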
TL;DR
When using a regular task in a map task, the entrypoint is pyflyte-map-execute. It does two things before running the task. However, when using a raw container, the entrypoint will not be pyflyte-map-execute. Therefore, we should update the output prefix for copilot, and support uploading collections in copilot.
Type
Are all requirements met?
Complete description
Tracking Issue
https://flyte-org.slack.com/archives/CP2HDHKE1/p1678230956906899
Follow-up issue
NA