You are probably hitting the same problem described in #10634, but this is a very old Kubeflow release, so I'm not sure that is the cause. Please try a recent version so we can work on a reproducer and fix it if it's a bug.
-
Environment
AWS EKS
Kubeflow 1.0.4
Steps to reproduce
Sometimes Kubeflow shows that a pipeline step ran when in fact it did not. Retrying does not help in that case.
See here:
image
The step that did not run is the "train" step, directly above the failed "reports" step. (That is why the "reports" step failed: it relies on the outputs of the "train" step.)
It is also odd that for this "train" step Kubeflow does not show "results were taken from cache", unlike the other steps, which did run successfully.
Those other steps show "results were taken from cache", and I have no idea why, since they were fully run. What cache was used here? Can you explain how I can debug this? I don't want any caching for my runs.
This happens randomly, and I don't know exactly why. If I clone the entire run, it then succeeds (also seemingly at random). Where can I check why the step didn't run?
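One possible place to look, sketched here under assumptions rather than as a confirmed diagnosis: Kubeflow Pipelines of this generation executes each run as an Argo Workflow, and the workflow object records a phase, message, and timestamps for every step. Assuming the workflow has been exported to JSON with something like `kubectl get workflow <workflow-name> -n kubeflow -o json` (the `kubeflow` namespace is an assumption), the hypothetical helper `summarize_step_nodes` below lists what Argo recorded for each pod step, so a step that is marked as run can be compared against its recorded timestamps and message.

```python
from typing import Any


def summarize_step_nodes(workflow: dict[str, Any]) -> list[dict[str, str]]:
    """Summarize the Pod-type nodes of an Argo Workflow object.

    `workflow` is the parsed JSON of a Workflow resource. The helper name
    and the output keys are illustrative, not part of any Kubeflow API.
    """
    nodes = workflow.get("status", {}).get("nodes") or {}
    rows = []
    for node in nodes.values():
        # Only Pod nodes correspond to pipeline steps; DAG/Steps nodes
        # are grouping nodes and are skipped.
        if node.get("type") != "Pod":
            continue
        rows.append({
            "step": node.get("displayName") or node.get("name", ""),
            "phase": node.get("phase", ""),
            "message": node.get("message", ""),
            "started": node.get("startedAt") or "",
            "finished": node.get("finishedAt") or "",
        })
    # ISO 8601 timestamps in the same format sort correctly as plain strings.
    return sorted(rows, key=lambda row: row["started"])
```

A step whose recorded phase is `Succeeded` but which has no start time, or an unexpected message, may be worth checking against the pod list and pod logs for that run.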