Step in UI was completed, but didn't run at all #10643

asaff1 · 2024-03-30T14:47:59Z

Environment
AWS EKS
kubeflow 1.0.4

Steps to reproduce
Sometimes Kubeflow shows a pipeline step as completed when in fact it never ran. Retrying does not help in that case.
See here:

image
The step that did not run ("train") is the one above the failed "reports" step. That is why "reports" failed: it relies on the outputs of the "train" step.
It is also odd that for this "train" step Kubeflow doesn't show "results were taken from cache", unlike the other steps that did run successfully.
The other steps, which did run, show "results taken from cache". I have no idea why, since these steps ran in full. What cache was used here? Can you explain how I can debug this? I don't want any caching for my runs.
This happens randomly, and I don't know exactly why. If I clone the entire run, the clone succeeds (so it seems random). Where can I check why the step didn't run?
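
On the "I don't want any cache" point, here is a minimal sketch, assuming the KFP v1 SDK, where caching can be turned off per step by setting the task's `max_cache_staleness` to `"P0D"`. The helper name `disable_caching` is hypothetical, and `SimpleNamespace` only stands in for a real `ContainerOp` so the snippet runs without `kfp` installed.

```python
from types import SimpleNamespace


def disable_caching(task):
    """Set a zero-duration staleness so cached results are never reused.

    Assumes `task` exposes execution_options.caching_strategy, as a
    KFP v1 ContainerOp does.
    """
    task.execution_options.caching_strategy.max_cache_staleness = "P0D"
    return task


# Stand-in for a ContainerOp (assumption: used only to make this runnable).
fake_task = SimpleNamespace(
    execution_options=SimpleNamespace(
        caching_strategy=SimpleNamespace(max_cache_staleness=None)
    )
)
disable_caching(fake_task)
print(fake_task.execution_options.caching_strategy.max_cache_staleness)
```

Inside a real pipeline function, the same helper would be applied to each step's task object right after it is created.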

rimolive · 2024-04-01T11:34:51Z

You are probably facing the same issue as described in #10634, but this is a very old Kubeflow release, so I'm not sure. We recommend trying a recent version so we can work on a reproducer and fix it if it's a bug.

4 replies

asaff1 Apr 1, 2024
Author

Which k8s service is responsible for scheduling pipeline steps? I thought I could check its logs to see why some steps seem to be 'cached' and why the 'non-cached' step did not run at all.

rimolive Apr 1, 2024

If you deployed from raw manifests, then Argo Workflows is the pipeline engine.
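
A rough sketch of where one might look, assuming a default install in the `kubeflow` namespace with the Argo controller deployed as `workflow-controller` (both are assumptions; adjust to your cluster). The helper `debug_commands` is hypothetical and only builds `kubectl` command lines; it does not run them.

```python
import shlex


def debug_commands(workflow_name, namespace="kubeflow",
                   controller="workflow-controller"):
    """Build kubectl commands for inspecting an Argo workflow run.

    namespace and controller are assumed defaults, not verified values.
    """
    return [
        # Full Workflow object, including the per-node status Argo recorded.
        ["kubectl", "get", "workflow", workflow_name, "-n", namespace, "-o", "yaml"],
        # Pods created for this workflow (Argo labels them with the workflow name).
        ["kubectl", "get", "pods", "-n", namespace,
         "-l", f"workflows.argoproj.io/workflow={workflow_name}"],
        # Controller logs, where scheduling decisions and errors are reported.
        ["kubectl", "logs", f"deployment/{controller}", "-n", namespace],
    ]


for cmd in debug_commands("my-run-abc12"):  # hypothetical workflow name
    print(shlex.join(cmd))
```

If the pod listing shows no pod for the step in question, that would suggest the step was never scheduled, though pods may also have been garbage-collected after the run.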

asaff1 Apr 7, 2024
Author

@rimolive I don't think it is the same issue. Also, I use the KFP v1 API (ContainerOp). I'm still clueless about this issue.
Upgrading Kubeflow is an expensive process, so it is a bit difficult for me to do right now.

rimolive Apr 24, 2024

We might need to test this same behavior in KFPv2. We are committed to feature parity between KFPv2 and KFPv1, and we need to know whether this still happens in v2.
