Dropping of k8sapi executor makes upgrade from 3.2.6 to 3.4.8 not feasible with private images and `script` templates #11159

vatine · 2023-06-01T08:21:36Z

Pre-requisites

I have double-checked my configuration
I can confirm the issues exists when I tested with :latest
I'd like to contribute the fix myself (see contributing guide)

What happened/what you expected to happen?

I expected an ugrade from 3.2.6 to 3.4.8 to be (mostly) "specify new images". It also required a small amount of tweaking RBAC roles.

I did not expect it to require reconfiguring every workflow (many of our workflows use custom, private, images; using the scriptfunctionality). With the upgrade primarily being motivated by "we want the fresh ssh_known_host file" (as opposed to having to use insecureIgnoreHostKey), the amount of work needed to switch from the k8sapi to the emissary executor was completely unexpected and disappointing, as for the script-type nodes, it really should be possible for Argo/emissary to figure out what the right command is (and bordering on hard for a human to do).

We do not want the Argo workflow controller to have access to our private registries (it should not need that access, the kubernetes cluster has the required image pull capabilities), but it is not obvious what the auto-generated command for a "script" will be (so the entrypoint cannot be specified neither as a command nor statically in a configuration file).

Version

v3.4.8

Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflows that uses private images.

n/a (this is all private images)

Logs from the workflow controller

time="2023-06-01T07:18:47.310Z" level=warning msg="Non-transient error: failed to look-up entrypoint/cmd for image \"ELIDED\", you must either explicitly specify the command, or list the image's command in the index: https://argoproj.github.io/argo-workflows/workflow-executors/#emissary-emissary: GET https://ELIDED: UNAUTHORIZED: You don't have the needed permissions to perform this operation, and you may have invalid credentials. To authenticate your request, follow the steps in: ELIDED"

Logs from in your workflow's wait container

n/a, the pod is never created

The text was updated successfully, but these errors were encountered:

vatine · 2023-09-14T06:41:00Z

One possible solution here is that for "script" actions, the emissary executor synthesises its own "command" setting. That would probably solve 98% of our incompatibility issues, and for sure the specific thing being executed is 100% within the control of the executor, so it should be relatively straight-forward.

agilgur5 · 2024-04-19T14:59:48Z

We do not want the Argo workflow controller to have access to our private registries (it should not need that access, the kubernetes cluster has the required image pull capabilities)

From #8345, this line should be using your existing imagePullSecrets

as for the script-type nodes, it really should be possible for Argo/emissary to figure out what the right command is (and bordering on hard for a human to do).

One possible solution here is that for "script" actions, the emissary executor synthesises its own "command" setting

To clarify, this issue is entirely with script templates, right?

Based on your analysis in #12787 (comment), it seemed lke your suggested solution could be possible; we might be able to skip the lookup for script templates.

Upon further inspection though, it does seem to append the Docker ENTRYPOINT to the Command if it exists, and appends Args if they exist as well. You can actually specify both in a script template, which the docs mention as well.

So I'm not so sure this is a bug with emissary; you could specify a Command for a script template, but it sounds like in your case you don't, which is why it does an entrypoint lookup.
My guess is that in your case you might be able to workaround this with bash or exec as your command in nearly all cases.

vatine · 2024-04-19T15:22:21Z

I guess it's more a "missing feature" than a bug. Even so, with your typical script action, it seems as if the source gets dropped into a shell script and that ends up being set as the entrypoint (further experimentation lead to #12787 for fixing an issue with script permissions, once a static configuration has been made, the in-house build images have now been stable for long enough that I felt it was OK to do that).

agilgur5 · 2024-04-19T15:37:14Z

it seems as if the source gets dropped into a shell script and that ends up being set as the entrypoint

it's in args actually, as your output in #12787 (comment) showed. And the source code for that is this line that I mentioned above.

As I wrote above though, the image's ENTRYPOINT and CMD get appended (when command and args are not specified) to Emissary's execution. Effectively, Emissary is a parent process that runs your (or your image's) commands as a subprocess (I only did a deep dive into the Executor in the past month or so to understand that well enough).

As such, It needs to know the image's ENTRYPOINT in order to be able to append it.

Emissary is a bit hacky, but it's the least hacky and most secure of the executors so far, as I understand it.

tooptoop4 · 2024-09-30T20:47:02Z

maybe this could be closed as EOL

agilgur5 · 2024-10-01T14:29:40Z

It's logged against 3.4, which is still supported and is when all other executors were dropped. It was also opened over a year ago, pre-dating 3.5.0-rc1.
There also seems to be a lingering bug with script templates per #12787 (comment); I'm not really sure why the image's cmd isn't being picked up.

Since #12787 was merged though, the script issue has a workaround now, but the bug should ideally be fixed.

agilgur5 · 2024-10-01T15:15:42Z

There also seems to be a lingering bug with script templates per #12787 (comment); I'm not really sure why the image's cmd isn't being picked up.

Ok I root caused this in #12787 (comment) and there is indeed a bug, but the fix might very well be a breaking change that might not be worthwhile 😕

vatine added type/bug type/regression Regression from previous behavior (a specific type of bug) labels Jun 1, 2023

sarabala1979 added the P3 Low priority label Jun 1, 2023

This comment was marked as resolved.

Sign in to view

stale bot added the problem/stale This has not had a response in some time label Jun 18, 2023

agilgur5 added the area/executor label Sep 8, 2023

stale bot removed the problem/stale This has not had a response in some time label Sep 8, 2023

vatine mentioned this issue Mar 13, 2024

fix(executor): add executable permission to staged script #12787

Merged

agilgur5 added the area/templates/script label Mar 13, 2024

agilgur5 added type/support User support issue - likely not a bug and removed type/bug type/regression Regression from previous behavior (a specific type of bug) P3 Low priority labels Apr 19, 2024

agilgur5 self-assigned this Sep 3, 2024

agilgur5 changed the title ~~Dropping of k8sapi executor makes upgrade from 3.2.6 to 3.4.8 not feasible.~~ Dropping of k8sapi executor makes upgrade from 3.2.6 to 3.4.8 not feasible with private images and script templates Oct 1, 2024

agilgur5 added type/bug type/regression Regression from previous behavior (a specific type of bug) solution/workaround There's a workaround, might not be great, but exists labels Oct 1, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Dropping of k8sapi executor makes upgrade from 3.2.6 to 3.4.8 not feasible with private images and `script` templates #11159

Dropping of k8sapi executor makes upgrade from 3.2.6 to 3.4.8 not feasible with private images and `script` templates #11159

vatine commented Jun 1, 2023

This comment was marked as resolved.

vatine commented Sep 14, 2023

agilgur5 commented Apr 19, 2024 •

edited

Loading

vatine commented Apr 19, 2024

agilgur5 commented Apr 19, 2024 •

edited

Loading

tooptoop4 commented Sep 30, 2024

agilgur5 commented Oct 1, 2024 •

edited

Loading

agilgur5 commented Oct 1, 2024

Dropping of k8sapi executor makes upgrade from 3.2.6 to 3.4.8 not feasible with private images and script templates #11159

Dropping of k8sapi executor makes upgrade from 3.2.6 to 3.4.8 not feasible with private images and script templates #11159

Comments

vatine commented Jun 1, 2023

Pre-requisites

What happened/what you expected to happen?

Version

Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflows that uses private images.

Logs from the workflow controller

Logs from in your workflow's wait container

This comment was marked as resolved.

vatine commented Sep 14, 2023

agilgur5 commented Apr 19, 2024 • edited Loading

vatine commented Apr 19, 2024

agilgur5 commented Apr 19, 2024 • edited Loading

tooptoop4 commented Sep 30, 2024

agilgur5 commented Oct 1, 2024 • edited Loading

agilgur5 commented Oct 1, 2024

Dropping of k8sapi executor makes upgrade from 3.2.6 to 3.4.8 not feasible with private images and `script` templates #11159

Dropping of k8sapi executor makes upgrade from 3.2.6 to 3.4.8 not feasible with private images and `script` templates #11159

agilgur5 commented Apr 19, 2024 •

edited

Loading

agilgur5 commented Apr 19, 2024 •

edited

Loading

agilgur5 commented Oct 1, 2024 •

edited

Loading