Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dropping of k8sapi executor makes upgrade from 3.2.6 to 3.4.8 not feasible with private images and script templates #11159

Open
2 of 3 tasks
vatine opened this issue Jun 1, 2023 · 8 comments
Assignees
Labels
area/executor area/templates/script solution/workaround There's a workaround, might not be great, but exists type/bug type/regression Regression from previous behavior (a specific type of bug) type/support User support issue - likely not a bug

Comments

@vatine
Copy link
Contributor

vatine commented Jun 1, 2023

Pre-requisites

  • I have double-checked my configuration
  • I can confirm the issues exists when I tested with :latest
  • I'd like to contribute the fix myself (see contributing guide)

What happened/what you expected to happen?

I expected an ugrade from 3.2.6 to 3.4.8 to be (mostly) "specify new images". It also required a small amount of tweaking RBAC roles.

I did not expect it to require reconfiguring every workflow (many of our workflows use custom, private, images; using the scriptfunctionality). With the upgrade primarily being motivated by "we want the fresh ssh_known_host file" (as opposed to having to use insecureIgnoreHostKey), the amount of work needed to switch from the k8sapi to the emissary executor was completely unexpected and disappointing, as for the script-type nodes, it really should be possible for Argo/emissary to figure out what the right command is (and bordering on hard for a human to do).

We do not want the Argo workflow controller to have access to our private registries (it should not need that access, the kubernetes cluster has the required image pull capabilities), but it is not obvious what the auto-generated command for a "script" will be (so the entrypoint cannot be specified neither as a command nor statically in a configuration file).

Version

v3.4.8

Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflows that uses private images.

n/a (this is all private images)

Logs from the workflow controller

time="2023-06-01T07:18:47.310Z" level=warning msg="Non-transient error: failed to look-up entrypoint/cmd for image \"ELIDED\", you must either explicitly specify the command, or list the image's command in the index: https://argoproj.github.io/argo-workflows/workflow-executors/#emissary-emissary: GET https://ELIDED: UNAUTHORIZED: You don't have the needed permissions to perform this operation, and you may have invalid credentials. To authenticate your request, follow the steps in: ELIDED"

Logs from in your workflow's wait container

n/a, the pod is never created
@vatine vatine added type/bug type/regression Regression from previous behavior (a specific type of bug) labels Jun 1, 2023
@sarabala1979 sarabala1979 added the P3 Low priority label Jun 1, 2023
@stale

This comment was marked as resolved.

@stale stale bot added the problem/stale This has not had a response in some time label Jun 18, 2023
@stale stale bot removed the problem/stale This has not had a response in some time label Sep 8, 2023
@vatine
Copy link
Contributor Author

vatine commented Sep 14, 2023

One possible solution here is that for "script" actions, the emissary executor synthesises its own "command" setting. That would probably solve 98% of our incompatibility issues, and for sure the specific thing being executed is 100% within the control of the executor, so it should be relatively straight-forward.

@agilgur5
Copy link
Member

agilgur5 commented Apr 19, 2024

We do not want the Argo workflow controller to have access to our private registries (it should not need that access, the kubernetes cluster has the required image pull capabilities)

From #8345, this line should be using your existing imagePullSecrets

as for the script-type nodes, it really should be possible for Argo/emissary to figure out what the right command is (and bordering on hard for a human to do).

One possible solution here is that for "script" actions, the emissary executor synthesises its own "command" setting

To clarify, this issue is entirely with script templates, right?

Based on your analysis in #12787 (comment), it seemed lke your suggested solution could be possible; we might be able to skip the lookup for script templates.

Upon further inspection though, it does seem to append the Docker ENTRYPOINT to the Command if it exists, and appends Args if they exist as well. You can actually specify both in a script template, which the docs mention as well.

So I'm not so sure this is a bug with emissary; you could specify a Command for a script template, but it sounds like in your case you don't, which is why it does an entrypoint lookup.
My guess is that in your case you might be able to workaround this with bash or exec as your command in nearly all cases.

@agilgur5 agilgur5 added type/support User support issue - likely not a bug and removed type/bug type/regression Regression from previous behavior (a specific type of bug) P3 Low priority labels Apr 19, 2024
@vatine
Copy link
Contributor Author

vatine commented Apr 19, 2024

I guess it's more a "missing feature" than a bug. Even so, with your typical script action, it seems as if the source gets dropped into a shell script and that ends up being set as the entrypoint (further experimentation lead to #12787 for fixing an issue with script permissions, once a static configuration has been made, the in-house build images have now been stable for long enough that I felt it was OK to do that).

@agilgur5
Copy link
Member

agilgur5 commented Apr 19, 2024

it seems as if the source gets dropped into a shell script and that ends up being set as the entrypoint

it's in args actually, as your output in #12787 (comment) showed. And the source code for that is this line that I mentioned above.

As I wrote above though, the image's ENTRYPOINT and CMD get appended (when command and args are not specified) to Emissary's execution. Effectively, Emissary is a parent process that runs your (or your image's) commands as a subprocess (I only did a deep dive into the Executor in the past month or so to understand that well enough).

As such, It needs to know the image's ENTRYPOINT in order to be able to append it.

Emissary is a bit hacky, but it's the least hacky and most secure of the executors so far, as I understand it.

@agilgur5 agilgur5 self-assigned this Sep 3, 2024
@tooptoop4
Copy link
Contributor

maybe this could be closed as EOL

@agilgur5
Copy link
Member

agilgur5 commented Oct 1, 2024

It's logged against 3.4, which is still supported and is when all other executors were dropped. It was also opened over a year ago, pre-dating 3.5.0-rc1.
There also seems to be a lingering bug with script templates per #12787 (comment); I'm not really sure why the image's cmd isn't being picked up.

Since #12787 was merged though, the script issue has a workaround now, but the bug should ideally be fixed.

@agilgur5 agilgur5 changed the title Dropping of k8sapi executor makes upgrade from 3.2.6 to 3.4.8 not feasible. Dropping of k8sapi executor makes upgrade from 3.2.6 to 3.4.8 not feasible with private images and script templates Oct 1, 2024
@agilgur5 agilgur5 added type/bug type/regression Regression from previous behavior (a specific type of bug) solution/workaround There's a workaround, might not be great, but exists labels Oct 1, 2024
@agilgur5
Copy link
Member

agilgur5 commented Oct 1, 2024

There also seems to be a lingering bug with script templates per #12787 (comment); I'm not really sure why the image's cmd isn't being picked up.

Ok I root caused this in #12787 (comment) and there is indeed a bug, but the fix might very well be a breaking change that might not be worthwhile 😕

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/executor area/templates/script solution/workaround There's a workaround, might not be great, but exists type/bug type/regression Regression from previous behavior (a specific type of bug) type/support User support issue - likely not a bug
Projects
None yet
Development

No branches or pull requests

4 participants