Experiment step lifecycle is not respecting defined behaviour around requiredForCompletion #4016
Looking at https://argoproj.github.io/argo-rollouts/features/experiment/#experiment-lifecycle
Want to check if this is a bug or if I'm misunderstanding the described behaviour
Given this setup:
And with the analysis run metric in the experiment step set to count 20 and interval 10s, the analysis run should take about 200 seconds.
What I am observing is that the experiment step doesn't wait for the analysis run to finish before tearing down the experiment pod, even though I can see the analysis run is still running. The experiment pod is torn down after a short delay: once the duration set in the experiment has passed, I can see a delay such as
│ ├──⧉ basic-rollout-exp-steps-765fc64d58-17-0-canary-exp ReplicaSet ✔ Healthy 61s delay:29s
show up in the output from
kubectl-argo-rollouts get rollout basic-rollout-exp-steps
after which the ReplicaSet gets scaled down.
My expectation is that the experiment pod is not torn down, respecting this documented behaviour:
If one or more of the referenced AnalysisTemplates is marked with requiredForCompletion: true, the Experiment will not complete until those AnalysisRuns have completed, even if it exceeds the Experiment duration.
Another behaviour I am observing is that when
requiredForCompletion: true
is set on an analysis run for an experiment step, and the analysis run completes BEFORE the duration set for the experiment, the experiment ends prematurely. So if the experiment duration is 10 minutes, but the experiment has a Kubernetes Job analysis run that takes 1 minute with requiredForCompletion: true, then the experiment ends after 1 minute, not after 10 minutes as expected.
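To make the expectation concrete, here is a minimal sketch of how I read the documented lifecycle. It is not how Argo Rollouts is implemented. It only encodes the assumption that an experiment should last at least its duration and be extended until every requiredForCompletion analysis run has finished. The function names and the 60-second duration in the first case are hypothetical, chosen for illustration only.

```python
def analysis_run_seconds(count: int, interval_s: int) -> int:
    """Rough run time of a metric, estimated as count * interval (as in this report)."""
    return count * interval_s


def expected_experiment_seconds(duration_s: int, required_analysis_s: int) -> int:
    """Expected lifetime under my reading of the docs (an assumption, not the
    controller's actual logic): never shorter than the experiment duration,
    and extended until the required analysis run completes."""
    return max(duration_s, required_analysis_s)


# Case 1: analysis run (count 20, interval 10s) outlasts the experiment duration.
analysis_s = analysis_run_seconds(count=20, interval_s=10)
HYPOTHETICAL_DURATION_S = 60  # illustrative value, not taken from the real setup
case_1 = expected_experiment_seconds(HYPOTHETICAL_DURATION_S, analysis_s)

# Case 2: a 1-minute Job analysis run finishes before a 10-minute experiment duration.
case_2 = expected_experiment_seconds(duration_s=10 * 60, required_analysis_s=60)

print(f"case 1: expected {case_1}s, case 2: expected {case_2}s")
```

In both cases the observed behaviour differs from this model: in the first the experiment pod is scaled down shortly after the duration elapses, and in the second the experiment ends after about 1 minute.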
Argo Rollouts version: 1.7.2