fix: parse exit code when Outputs is not populated #13228

Closed
wants to merge 1 commit into from


tooptoop4
Contributor

@agilgur5
Member

Not sure why you filed a duplicate of #13180... It is once again missing DCO as well

@agilgur5 added the solution/workaround (There's a workaround, might not be great, but exists) and problem/more information needed (Not enough information has been provided to diagnose this issue.) labels Jun 20, 2024
@tooptoop4
Contributor Author

tooptoop4 commented Jun 25, 2024

So after running this for a while, it seems to never extract an exit code, but the extra debug logs are handy. Whenever the exit code is -1, which is most of the time, the message is empty 98% of the time; the other 2% of the time it is just something like "lastRetry.duration:57.383139809 lastRetry.exitCode:-1 lastRetry.message:child 'redact-1335299960' failed lastRetry.status:Failed retries:1". There seems to be some broader issue with why it is not finding the right message/exit code.
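
For readers following along, the idea of extracting an exit code from the message can be sketched in a few lines of Python. This is not the PR's Go change, only an illustration under the assumption that leaf-node messages look like `Error (exit code 1)`, as they do in the debug logs in this thread; `exit_code_from_message` is a hypothetical helper name.

```python
import re
from typing import Optional

# Assumed message shape, taken from the debug logs in this thread:
#   "Error (exit code 1)"
EXIT_CODE_RE = re.compile(r"\(exit code (-?\d+)\)")


def exit_code_from_message(message: str) -> Optional[int]:
    """Return the exit code embedded in a node message, or None if absent."""
    match = EXIT_CODE_RE.search(message)
    return int(match.group(1)) if match else None


print(exit_code_from_message("Error (exit code 1)"))               # 1
print(exit_code_from_message("child 'redact-1335299960' failed"))  # None
print(exit_code_from_message(""))                                  # None
```

An empty message or a `child '...' failed` message carries no exit code at all, which would be consistent with the parsing never finding one for those nodes.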

templateDefaults:
  retryStrategy:
    limit: 1
    retryPolicy: "Always"
    expression: 'lastRetry.status == "Error" or (lastRetry.status == "Failed" and (asInt(lastRetry.exitCode) in [255,137,143] or (asInt(lastRetry.exitCode) in [-1] and int(float(lastRetry.duration)) < 301)))'
    backoff:
      duration: "75"
      factor: 1
      maxDuration: "300"
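
To see what this expression decides for a given scope, here is a rough Python mirror of it. It is a sketch only: Argo evaluates the expression itself, `should_retry` is a hypothetical name, and the two sample scopes are copied from the debug logs in this comment.

```python
def should_retry(scope: dict) -> bool:
    """Approximate the retryStrategy expression for one localScope map."""
    status = scope["lastRetry.status"]
    if status == "Error":
        return True
    if status != "Failed":
        return False
    exit_code = int(scope["lastRetry.exitCode"])
    duration = int(float(scope["lastRetry.duration"]))
    return exit_code in (255, 137, 143) or (exit_code == -1 and duration < 301)


# Leaf step that really exited with code 1: not retried.
leaf = {
    "lastRetry.status": "Failed",
    "lastRetry.exitCode": "1",
    "lastRetry.duration": "47",
}
# Parent node that only reports -1 after about 57 seconds: retried.
parent = {
    "lastRetry.status": "Failed",
    "lastRetry.exitCode": "-1",
    "lastRetry.duration": "56.928275945",
}

print(should_retry(leaf))    # False
print(should_retry(parent))  # True
```

Under this reading, any failed node that reports exit code -1 within 300 seconds is retried, regardless of what its child actually exited with.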

Here is one that failed the first time with exit code 1 and retried (which is bad); this wf uses steps:

time="2024-06-25T05:18:36.694Z" level=debug msg="retryStrategy localScope: map[lastRetry.duration:0 lastRetry.exitCode:-1 lastRetry.message: lastRetry.status:Running retries:0] for node: wf(0)"
time="2024-06-25T05:18:36.696Z" level=debug msg="retryStrategy localScope: map[lastRetry.duration:0 lastRetry.exitCode:-1 lastRetry.message: lastRetry.status:Running retries:0] for node: wf(0)[0].mystep(0)"
time="2024-06-25T05:19:22.924Z" level=debug msg="retryStrategy localScope: map[lastRetry.duration:0 lastRetry.exitCode:-1 lastRetry.message: lastRetry.status:Running retries:0] for node: wf(0)"
time="2024-06-25T05:19:22.927Z" level=debug msg="retryStrategy localScope: map[lastRetry.duration:47 lastRetry.exitCode:1 lastRetry.message:Error (exit code 1) lastRetry.status:Failed retries:0] for node: wf(0)[0].mystep(0)"
time="2024-06-25T05:19:22.929Z" level=debug msg="retryStrategy localScope: map[lastRetry.duration:56.928275945 lastRetry.exitCode:-1 lastRetry.message:child 'wf-845098651' failed lastRetry.status:Failed retries:0] for node: wf(0)"
time="2024-06-25T05:20:37.934Z" level=debug msg="retryStrategy localScope: map[lastRetry.duration:56 lastRetry.exitCode:-1 lastRetry.message:child 'wf-845098651' failed lastRetry.status:Failed retries:0] for node: wf(0)"
time="2024-06-25T05:20:47.994Z" level=debug msg="retryStrategy localScope: map[lastRetry.duration:0 lastRetry.exitCode:-1 lastRetry.message: lastRetry.status:Running retries:1] for node: wf(1)"
time="2024-06-25T05:20:47.996Z" level=debug msg="retryStrategy localScope: map[lastRetry.duration:0 lastRetry.exitCode:-1 lastRetry.message: lastRetry.status:Running retries:0] for node: wf(1)[0].mystep(0)"
time="2024-06-25T05:21:34.380Z" level=debug msg="retryStrategy localScope: map[lastRetry.duration:0 lastRetry.exitCode:-1 lastRetry.message: lastRetry.status:Running retries:1] for node: wf(1)"
time="2024-06-25T05:21:34.382Z" level=debug msg="retryStrategy localScope: map[lastRetry.duration:47 lastRetry.exitCode:1 lastRetry.message:Error (exit code 1) lastRetry.status:Failed retries:0] for node: wf(1)[0].mystep(0)"
time="2024-06-25T05:21:34.383Z" level=debug msg="retryStrategy localScope: map[lastRetry.duration:57.383139809 lastRetry.exitCode:-1 lastRetry.message:child 'wf-1335299960' failed lastRetry.status:Failed retries:1] for node: wf(1)"
time="2024-06-25T05:21:44.433Z" level=debug msg="retryStrategy localScope: map[lastRetry.duration:0 lastRetry.exitCode:-1 lastRetry.message: lastRetry.status:Running retries:0] for node: wf.onExit(0)"
time="2024-06-25T05:21:44.436Z" level=debug msg="retryStrategy localScope: map[lastRetry.duration:0 lastRetry.exitCode:-1 lastRetry.message: lastRetry.status:Running retries:0] for node: wf.onExit(0)[0].notifyError(0)"
time="2024-06-25T05:21:54.479Z" level=debug msg="retryStrategy localScope: map[lastRetry.duration:0 lastRetry.exitCode:-1 lastRetry.message: lastRetry.status:Running retries:0] for node: wf.onExit(0)"
time="2024-06-25T05:21:54.480Z" level=debug msg="retryStrategy localScope: map[lastRetry.duration:10 lastRetry.exitCode:0 lastRetry.message: lastRetry.status:Succeeded retries:0] for node: wf.onExit(0)[0].notifyError(0)"
time="2024-06-25T05:21:54.481Z" level=debug msg="retryStrategy localScope: map[lastRetry.duration:20.480856919 lastRetry.exitCode:-1 lastRetry.message: lastRetry.status:Succeeded retries:0] for node: wf.onExit(0)"

Here is one that failed the first time with exit code 1 and did not try to retry (which is good); this wf does not use steps:

time="2024-06-25T05:18:29.838Z" level=debug msg="retryStrategy localScope: map[lastRetry.duration:0 lastRetry.exitCode:-1 lastRetry.message: lastRetry.status:Running retries:0] for node: otherwf(0)"
time="2024-06-25T05:18:42.711Z" level=debug msg="retryStrategy localScope: map[lastRetry.duration:11 lastRetry.exitCode:1 lastRetry.message:Error (exit code 1) lastRetry.status:Failed retries:0] for node: otherwf(0)"
time="2024-06-25T05:18:52.843Z" level=debug msg="retryStrategy localScope: map[lastRetry.duration:0 lastRetry.exitCode:-1 lastRetry.message: lastRetry.status:Running retries:0] for node: otherwf.onExit(0)"
time="2024-06-25T05:18:52.845Z" level=debug msg="retryStrategy localScope: map[lastRetry.duration:0 lastRetry.exitCode:0 lastRetry.message: lastRetry.status:Running retries:0] for node: otherwf.onExit(0)[0].notifyError(0)"
time="2024-06-25T05:19:03.959Z" level=debug msg="retryStrategy localScope: map[lastRetry.duration:0 lastRetry.exitCode:-1 lastRetry.message: lastRetry.status:Running retries:0] for node: otherwf.onExit(0)"
time="2024-06-25T05:19:03.961Z" level=debug msg="retryStrategy localScope: map[lastRetry.duration:10 lastRetry.exitCode:0 lastRetry.message: lastRetry.status:Succeeded retries:0] for node: otherwf.onExit(0)[0].notifyError(0)"
time="2024-06-25T05:19:03.963Z" level=debug msg="retryStrategy localScope: map[lastRetry.duration:21.961948212 lastRetry.exitCode:-1 lastRetry.message: lastRetry.status:Succeeded retries:0] for node: otherwf.onExit(0)"
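
When comparing runs like the two above, it can help to pull the `localScope` maps out of the controller log programmatically. A minimal sketch, assuming the line format is exactly as shown in these logs; the regexes and the `parse_scope` name are illustrative and not part of Argo.

```python
import re
from typing import Dict, Optional, Tuple

# Matches the map body and the node name of one debug line.
LINE_RE = re.compile(r'localScope: map\[(?P<body>.*)\] for node: (?P<node>[^"]+)')
# Keys seen in these logs; values may contain spaces, so split on the keys.
KEY_RE = re.compile(r"(lastRetry\.\w+|retries):")


def parse_scope(line: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return (node name, scope map) for a retryStrategy debug line, else None."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    # re.split with a capture group yields ['', key1, value1, key2, value2, ...]
    parts = KEY_RE.split(match.group("body"))
    scope = {key: value.strip() for key, value in zip(parts[1::2], parts[2::2])}
    return match.group("node"), scope


sample = r'''time="2024-06-25T05:19:22.929Z" level=debug msg="retryStrategy localScope: map[lastRetry.duration:56.928275945 lastRetry.exitCode:-1 lastRetry.message:child 'wf-845098651' failed lastRetry.status:Failed retries:0] for node: wf(0)"'''

node, scope = parse_scope(sample)
print(node)                         # wf(0)
print(scope["lastRetry.exitCode"])  # -1
print(scope["lastRetry.message"])   # child 'wf-845098651' failed
```

Grouping the parsed scopes by node name makes it easy to spot that the leaf node reports exit code 1 while its parent reports -1.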


OK, now I'm certain the bug is related to how it handles steps that reference a template defined in the same YAML, i.e.:

      templates:
        - name: flow
          steps:
            - - name: inlinebelow
                template: inlinebelow
        - name: inlinebelow
        ....................stuff below

When I removed those steps, it started getting the exit code I expect.
Maybe related to #13230.

any idea @jswxstw @tczhao ?

@jswxstw
Member

jswxstw commented Jun 26, 2024

I don't understand why we need to extract the exit code from the message.
Under what circumstances would the exit code be nil but appear in the message?

@tooptoop4
Contributor Author

From the tests above, it is useless, but these debug logs did help uncover that steps referencing a template defined in the same YAML are a reliable reproduction of the exit code not being set. Maybe I should open an issue for that.

@tooptoop4 marked this pull request as draft June 27, 2024 10:03
@agilgur5 added the area/retryStrategy (Template-level retryStrategy) label Jul 2, 2024
@tooptoop4
Contributor Author

this PR did not solve any issue but it helped to investigate exactly what the root cause is, which is now being tracked in #13297

@tooptoop4 closed this Jul 3, 2024
@agilgur5 added the solution/invalid (This is incorrect. Also can be used for spam) and solution/superseded (This PR or issue has been superseded by another one, slightly different from a duplicate) labels and removed the problem/more information needed (Not enough information has been provided to diagnose this issue.) label Jul 3, 2024
3 participants