feat(controller): retry strategy support on daemon containers, fixes #13705 #13738

Open
wants to merge 51 commits into
base: main

@MenD32 MenD32 commented Oct 10, 2024

Addresses #13705 and #2963.

Motivation

Add retryStrategy support to daemon container templates.

Some use cases require Argo Workflows features that resource templates do not support, for example:

  • pod logs, statuses, and failures in the UI
  • passing input artifacts into the daemon container

Modifications

Execution functions now treat "succeeded" daemoned nodes as failed.

When a daemoned container completes execution, it is considered failed; if it has a retry strategy, it is retried.

The IP change in the node cascades down to future executions.
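The behavior described above can be sketched roughly as follows. This is a minimal illustration, not the controller code from this PR: the `Node`, `Phase`, and `RetryStrategy` types and the `effectivePhase`, `shouldRetry`, and `currentIP` helpers are hypothetical, simplified stand-ins, and the assumption that later steps read the IP of the most recent attempt is mine.

```go
package main

import "fmt"

// Phase is a simplified, hypothetical stand-in for a workflow node phase.
type Phase string

const (
	PhaseRunning   Phase = "Running"
	PhaseSucceeded Phase = "Succeeded"
	PhaseFailed    Phase = "Failed"
)

// Node is a hypothetical, reduced view of a workflow node.
type Node struct {
	Phase    Phase
	Daemoned bool
	PodIP    string
}

// RetryStrategy is hypothetical; Limit is the maximum number of retries.
type RetryStrategy struct {
	Limit int
}

// effectivePhase treats a daemoned node that completed as failed,
// because a daemon is expected to keep running.
func effectivePhase(n Node) Phase {
	if n.Daemoned && n.Phase == PhaseSucceeded {
		return PhaseFailed
	}
	return n.Phase
}

// shouldRetry reports whether the node should be retried, given the
// number of retries already performed. Without a strategy, never retry.
func shouldRetry(n Node, rs *RetryStrategy, retriesSoFar int) bool {
	if rs == nil {
		return false
	}
	return effectivePhase(n) == PhaseFailed && retriesSoFar < rs.Limit
}

// currentIP returns the pod IP of the most recent attempt, which is the
// address later executions would need after a retry (assumption).
func currentIP(attempts []Node) string {
	if len(attempts) == 0 {
		return ""
	}
	return attempts[len(attempts)-1].PodIP
}

func main() {
	first := Node{Phase: PhaseSucceeded, Daemoned: true, PodIP: "10.0.0.5"}
	fmt.Println(effectivePhase(first))
	fmt.Println(shouldRetry(first, &RetryStrategy{Limit: 2}, 0))

	// After a retry, the daemon runs in a new pod with a new IP.
	retry := Node{Phase: PhaseRunning, Daemoned: true, PodIP: "10.0.0.9"}
	fmt.Println(currentIP([]Node{first, retry}))
}
```

In this sketch a non-daemoned node keeps its real phase, so ordinary succeeded steps are unaffected.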

Verification

e2e tests

For manual verification, I also added examples to test and observe the behavior locally:

  • examples/dag-daemon-retry-strategy.yaml
  • examples/steps-daemon-retry-strategy.yaml

I simulated daemon failures by manually deleting the daemon pod.

@MenD32 MenD32 changed the title Feat/daemon retry strategy Feat: retry strategy support on daemon containers Oct 12, 2024
@MenD32 MenD32 changed the title Feat: retry strategy support on daemon containers feat: retry strategy support on daemon containers Oct 12, 2024
@MenD32 MenD32 marked this pull request as ready for review October 14, 2024 15:00
@MenD32 MenD32 marked this pull request as draft October 14, 2024 15:25
MenD32 added 16 commits October 18, 2024 21:07

MenD32 commented Oct 19, 2024

I added an E2E test, but I can't get it to work on CI (even though make test-functional works locally). Any clue why?

The controller behaves as if it's built from main instead of from this branch.

Edit: never mind, got it to work.
