Provide alternate to downwardAPI for podmetadata volume/data #4616

Closed
ryedin opened this issue Nov 30, 2020 · 11 comments
Labels
type/feature Feature request

Comments

ryedin commented Nov 30, 2020

Summary

Currently, the use of the downwardAPI to project the podmetadata volume blocks the use of Argo Workflows on nodes that don't support downwardAPI projections. This is the case for Azure's Virtual Nodes feature for AKS. I have worked through all the other issues (i.e. using the k8sapi executor to get around the docker-sock problem), but this one is currently blocking.

It's obvious why downwardAPI is being used (super convenient), but I am wondering whether it would be feasible to propagate this data via some other mechanism when that API isn't available on a particular node/node pool.
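
For context, the projection in question is a downwardAPI volume of roughly this shape, sketched here with the corev1 Go types (not copied from Argo's source; the name and paths are illustrative):

```go
package example

import corev1 "k8s.io/api/core/v1"

// PodMetadataVolume sketches the kind of downwardAPI projection being
// discussed: the pod's own annotations written to a file that the
// executor containers can read. Nodes without downwardAPI support
// (e.g. ACI virtual nodes) refuse to schedule pods that mount this.
var PodMetadataVolume = corev1.Volume{
	Name: "podmetadata", // illustrative name
	VolumeSource: corev1.VolumeSource{
		DownwardAPI: &corev1.DownwardAPIVolumeSource{
			Items: []corev1.DownwardAPIVolumeFile{{
				Path:     "annotations",
				FieldRef: &corev1.ObjectFieldSelector{FieldPath: "metadata.annotations"},
			}},
		},
	},
}
```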

If feasible, this would then be exposed via config (e.g. disableDownwardAPI, assumed false when not present in order to stay backward compatible).

EDIT: of equal import: virtual nodes in AKS also do not support initContainers. I know these limitations would ideally be solved at the AKS/virtual-kubelet level (and I've opened a ticket over there as well), but I am hoping Argo could be adapted faster than MSFT is likely to move on any of this. I can probably take a stab at a PR if someone with more knowledge than I have could propose a viable alternative.

The only thing I can think of is to have the init/wait/main containers make actual API calls to get the data they need, when needed, but I'm not sure whether that makes sense.

Use Cases

When deploying workflows to so-called "serverless" node solutions that don't support downwardAPI projections (e.g. AKS virtual nodes).


Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.

ryedin added the type/feature (Feature request) label on Nov 30, 2020
alexec commented Nov 30, 2020

Some of the code related to podmeta can be found in executor.go:

func (we *WorkflowExecutor) monitorAnnotations(ctx context.Context) <-chan struct{} {

I believe this is used to see whether the pod has been annotated with "execution control", in which case the main container must be terminated.

This could be replaced by a pod get on the deadline timer.
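
For illustration, a "pod get on the deadline timer" could look roughly like this with client-go; the function name, polling interval, and the handling of the annotation value are hypothetical, and the annotation key is shown as a literal:

```go
package example

import (
	"context"
	"log"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// pollExecutionControl periodically GETs the pod and inspects its annotations,
// instead of watching a downwardAPI-projected annotations file.
func pollExecutionControl(ctx context.Context, clientset kubernetes.Interface, namespace, podName string, interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			pod, err := clientset.CoreV1().Pods(namespace).Get(ctx, podName, metav1.GetOptions{})
			if err != nil {
				log.Printf("failed to get pod: %v", err)
				continue
			}
			if v, ok := pod.Annotations["workflows.argoproj.io/execution"]; ok {
				// Parse the deadline here and terminate the main container if it has passed.
				log.Printf("execution control annotation present: %s", v)
			}
		}
	}
}
```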

Would you like to submit a PR?

alexec commented Nov 30, 2020

@simster7 you know this code well - anything to add?

alexec commented Nov 30, 2020

After a bit of investigation: this is non-trivial. Each workflow pod using the k8sapi executor opens a connection to the Kubernetes API to watch for changes to the pod, so a workflow with N pods holds N connections. This approach does not scale and, as a result, will not work for the pns, docker, or kubelet executors.

We could move the responsibility of watching for annotation changes to the executor (not unreasonable), and they could then choose which method to use (k8sapi is already watching).

However, I do wonder if there is a much simpler solution. If the downward API is not available, then we just log the error and do not monitor the execution control annotation.
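
A minimal sketch of that fallback, assuming the executor simply checks whether the projected annotations file exists before starting the monitor (the path constant and the helper are illustrative):

```go
package example

import (
	"context"
	"log"
	"os"
)

// annotationsPath is where the downwardAPI projection is normally mounted; illustrative here.
const annotationsPath = "/argo/podmetadata/annotations"

// maybeMonitorAnnotations skips annotation monitoring entirely when the
// downwardAPI file is absent (e.g. on nodes without downwardAPI support),
// logging the condition instead of failing the pod.
func maybeMonitorAnnotations(ctx context.Context, monitor func(context.Context)) {
	if _, err := os.Stat(annotationsPath); err != nil {
		log.Printf("downward API not available (%v); not monitoring the execution control annotation", err)
		return
	}
	monitor(ctx)
}
```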

ryedin commented Nov 30, 2020

> We could move the responsibility of watching for annotation changes to the executor (not unreasonable), and they could then choose which method to use (k8sapi is already watching).

Coming in with low context, this does seem like potentially the optimal solution. E.g. with k8sapi set as the executor (and already watching), would the podmetadata volume just be moot?

> However, I do wonder if there is a much simpler solution. If the downward API is not available, then we just log the error and do not monitor the execution control annotation.

I am slightly confused by this, in that I don't understand the relationship between monitoring the execution control annotation and the workflow actually working. I thought the annotations were somehow critical to the correct execution of the flow. Also, the error we receive is a low-level k8s error about not being able to schedule work to any matching nodes because of the unsupported volume type, so we would somehow have to not attempt mounting that volume at all... unless I misunderstood what you meant by "just log the error".

alexec commented Nov 30, 2020

The annotations are used for terminating the pod, i.e. argo stop. If you can go without that capability, there is a simpler solution.

ryedin commented Nov 30, 2020

Ah ok, I understand. I do occasionally manually re-submit a flow, and then potentially delete it (from the UI), but those cases are typically when I'm debugging flows/configs.

We can honestly live without those functions for the stable production flows if it means we can run on instantly scalable infra.

alexec commented Nov 30, 2020

We've just spoken about the lack of init containers. We must have a way to delay the start of the main container, and init containers are how we do that.

Can you please link to the Azure issue?

ryedin commented Nov 30, 2020

The Azure docs on virtual nodes (quick link to the known-limitations section): https://docs.microsoft.com/en-us/azure/aks/virtual-nodes#known-limitations

The issue in the azure-aci project that I created today asking about potential support for downwardAPI volumes: virtual-kubelet/azure-aci#91 (no movement yet).

And ah... I was not thinking about the init container. Interestingly, isn't it actually the wait container that delays the main one? In my flows I only see a couple of workflow pods that have an init container, and that seems to happen only in script-based task definitions that have outputs (not sure if/how that's related, but it is true for our flows). The wait container isn't an initContainer, and the vast majority of our task pods come up with only wait and main containers.

(I'm likely just confused by the name wait as far as the purpose of that container goes.) I understand that init, as a true initContainer, is likely needed for script tasks to mount the supplied script in as a file to run against the command.

Either way, yeah I don't know what to do about the actual init container then.

ryedin commented Nov 30, 2020

For our needs, I can re-implement those script tasks as container/command tasks and completely get rid of any task pods that create the init container (the rest of our pods are already container/command tasks).
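
For illustration, that conversion essentially moves the script source into the container's command, e.g. (a sketch using Argo's Go template types; the import path varies by Argo version, and in practice the same change is made in the workflow YAML):

```go
package example

import (
	wfv1 "github.com/argoproj/argo-workflows/v3/pkg/apis/workflow/v1alpha1"
	corev1 "k8s.io/api/core/v1"
)

// A script template, where Argo has to deliver the inline source to the container...
var scriptTemplate = wfv1.Template{
	Name: "hello-script",
	Script: &wfv1.ScriptTemplate{
		Container: corev1.Container{Image: "python:3.9", Command: []string{"python"}},
		Source:    "print('hello')",
	},
}

// ...and an equivalent container template that passes the code directly on the
// command line (or bakes it into the image), so there is no script to deliver.
var containerTemplate = wfv1.Template{
	Name: "hello-container",
	Container: &corev1.Container{
		Image:   "python:3.9",
		Command: []string{"python", "-c", "print('hello')"},
	},
}
```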

ryedin commented Dec 1, 2020

So, I know we don't have a conceptual, generic solution for the init container beyond my willingness to re-implement our flows to avoid it completely, but as for the other main blocker, you mentioned there being a possible simpler solution (e.g. having the option to not monitor the annotations).

I'm still slightly confused, just because when I look at the annotations in the cluster they are extensive and include input params and whatnot. I'm not following how all that data being added to the annotations is just for terminating pods prematurely, but I'll obviously take your word for it :)

But at any rate, how "simple" is it actually to do that? I don't have a Go dev setup currently and am a little rusty, but I am motivated to try to find a path to get our flows running on virtual nodes.

alexec commented Feb 7, 2022

Fixed.

alexec closed this as completed on Feb 7, 2022