Provide alternate to downwardAPI for podmetadata volume/data #4616

Closed
ryedin opened this issue Nov 30, 2020 · 11 comments
Labels
type/feature Feature request

Comments

ryedin commented Nov 30, 2020

Summary

Currently, the use of the downwardAPI to project the podmetadata volume blocks the use of Argo Workflows on nodes that don't support downwardAPI projections. This is the case for Azure's Virtual Nodes feature for AKS. I have worked through all the other issues (i.e. using the k8sapi executor to get around the docker-sock problem), but this one is currently blocking.

It's obvious why downwardAPI is being used (super convenient), but I am wondering whether it would be feasible to propagate this data via some other mechanism when that API isn't available on a particular node/node pool.
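
For context, the projection in question is a downwardAPI volume of roughly this shape, sketched here with the corev1 Go types (not copied from Argo's source; the name and paths are illustrative):

```go
package example

import corev1 "k8s.io/api/core/v1"

// PodMetadataVolume sketches the kind of downwardAPI projection being
// discussed: the pod's own annotations written to a file that the
// executor containers can read. Nodes without downwardAPI support
// (e.g. ACI virtual nodes) refuse to schedule pods that mount this.
var PodMetadataVolume = corev1.Volume{
	Name: "podmetadata", // illustrative name
	VolumeSource: corev1.VolumeSource{
		DownwardAPI: &corev1.DownwardAPIVolumeSource{
			Items: []corev1.DownwardAPIVolumeFile{{
				Path:     "annotations",
				FieldRef: &corev1.ObjectFieldSelector{FieldPath: "metadata.annotations"},
			}},
		},
	},
}
```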

If feasible, this would then be exposed via config (e.g. disableDownwardAPI, assumed false when not present in order to stay backward compatible).

EDIT: of equal import: virtual nodes in AKS also do not support initContainers. I know these limitations would ideally be solved at the AKS/virtual-kubelet level (and I've opened a ticket over there as well), but I am hoping Argo could be adapted faster than MSFT is likely to move on any of this. I can probably take a stab at a PR if someone with more knowledge than I have could propose a viable alternative.

The only thing I can think of is to have the init/wait/main containers make actual API calls to get the data they need, when needed, but I'm not sure whether that makes sense.

Use Cases

When deploying workflows to so-called "serverless" node solutions that don't support downwardAPI projections (e.g. AKS virtual nodes).


Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.

ryedin added the type/feature (Feature request) label on Nov 30, 2020
alexec commented Nov 30, 2020

Some of the code related to podmeta can be found in executor.go:

func (we *WorkflowExecutor) monitorAnnotations(ctx context.Context) <-chan struct{} {

I believe this is used to see whether the pod has been annotated with "execution control", in which case the main container must be terminated.

This could be replaced by a pod get on the deadline timer.
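
For illustration, a "pod get on the deadline timer" could look roughly like this with client-go; the function name, polling interval, and the handling of the annotation value are hypothetical, and the annotation key is shown as a literal:

```go
package example

import (
	"context"
	"log"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// pollExecutionControl periodically GETs the pod and inspects its annotations,
// instead of watching a downwardAPI-projected annotations file.
func pollExecutionControl(ctx context.Context, clientset kubernetes.Interface, namespace, podName string, interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			pod, err := clientset.CoreV1().Pods(namespace).Get(ctx, podName, metav1.GetOptions{})
			if err != nil {
				log.Printf("failed to get pod: %v", err)
				continue
			}
			if v, ok := pod.Annotations["workflows.argoproj.io/execution"]; ok {
				// Parse the deadline here and terminate the main container if it has passed.
				log.Printf("execution control annotation present: %s", v)
			}
		}
	}
}
```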

Would you like to submit a PR?

alexec commented Nov 30, 2020

@simster7 you know this code well - anything to add?

alexec commented Nov 30, 2020

After a bit of investigation: this is non-trivial. Each workflow pod using the k8sapi executor opens a connection to the Kubernetes API to watch for changes to the pod, so a workflow with N pods holds N connections. This approach does not scale and, as a result, will not work for the pns, docker, or kubelet executors.

We could move the responsibility of watching for annotation changes to the executor (not unreasonable), and they could then choose which method to use (k8sapi is already watching).

However, I do wonder if there is a much simpler solution. If the downward API is not available, then we just log the error and do not monitor the execution control annotation.
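
A minimal sketch of that fallback, assuming the executor simply checks whether the projected annotations file exists before starting the monitor (the path constant and the helper are illustrative):

```go
package example

import (
	"context"
	"log"
	"os"
)

// annotationsPath is where the downwardAPI projection is normally mounted; illustrative here.
const annotationsPath = "/argo/podmetadata/annotations"

// maybeMonitorAnnotations skips annotation monitoring entirely when the
// downwardAPI file is absent (e.g. on nodes without downwardAPI support),
// logging the condition instead of failing the pod.
func maybeMonitorAnnotations(ctx context.Context, monitor func(context.Context)) {
	if _, err := os.Stat(annotationsPath); err != nil {
		log.Printf("downward API not available (%v); not monitoring the execution control annotation", err)
		return
	}
	monitor(ctx)
}
```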

ryedin commented Nov 30, 2020

> We could move the responsibility of watching for annotation changes to the executor (not unreasonable), and they could then choose which method to use (k8sapi is already watching).

Coming in with low context, this does seem like potentially the optimal solution. E.g. with k8sapi set as the executor (and already watching), would the podmetadata volume just be moot?

> However, I do wonder if there is a much simpler solution. If the downward API is not available, then we just log the error and do not monitor the execution control annotation.

I am slightly confused by this, in that I don't understand the relationship between monitoring the execution control annotation and the workflow actually working. I thought the annotations were somehow critical to the correct execution of the flow. Also, the error we receive is a low-level k8s error about not being able to schedule work to any matching nodes because of the unsupported volume type, so we would somehow have to not attempt mounting that volume at all... unless I misunderstood what you meant by "just log the error".

alexec commented Nov 30, 2020

The annotations are used for terminating the pod, i.e. argo stop. If you can go without that capability, there is a simpler solution.

ryedin commented Nov 30, 2020

Ah ok, I understand. I do occasionally manually re-submit a flow, and then potentially delete it (from the UI), but those cases are typically when I'm debugging flows/configs.

We can honestly live without those functions for the stable production flows if it means we can run on instantly scalable infra.

alexec commented Nov 30, 2020

We've just spoken about the lack of init containers. We must have a way to delay the start of the main container, and init containers are how we do that.

Can you please link to the Azure issue?

ryedin commented Nov 30, 2020

The Azure docs on virtual nodes (quick link to the known-limitations section): https://docs.microsoft.com/en-us/azure/aks/virtual-nodes#known-limitations

The issue in the azure-aci project that I created today asking about potential support for downwardAPI volumes: virtual-kubelet/azure-aci#91 (no movement yet).

And ah... I was not thinking about the init container. Interestingly, isn't it actually the wait container that delays the main one? In my flows I only see a couple of workflow pods that have an init container, and that seems to happen only in script-based task definitions that have outputs (not sure if/how that's related, but it is true for our flows). The wait container isn't an initContainer, and the vast majority of our task pods come up with only wait and main containers.

(I'm likely just confused by the name wait as far as the purpose of that container goes.) I understand that init, as a true initContainer, is likely needed for script tasks to mount the supplied script in as a file to run against the command.

Either way, yeah I don't know what to do about the actual init container then.

ryedin commented Nov 30, 2020

For our needs, I can re-implement those script tasks as container/command tasks and completely get rid of any task pods that create the init container (the rest of our pods are already container/command tasks).
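
For illustration, that conversion essentially moves the script source into the container's command, e.g. (a sketch using Argo's Go template types; the import path varies by Argo version, and in practice the same change is made in the workflow YAML):

```go
package example

import (
	wfv1 "github.com/argoproj/argo-workflows/v3/pkg/apis/workflow/v1alpha1"
	corev1 "k8s.io/api/core/v1"
)

// A script template, where Argo has to deliver the inline source to the container...
var scriptTemplate = wfv1.Template{
	Name: "hello-script",
	Script: &wfv1.ScriptTemplate{
		Container: corev1.Container{Image: "python:3.9", Command: []string{"python"}},
		Source:    "print('hello')",
	},
}

// ...and an equivalent container template that passes the code directly on the
// command line (or bakes it into the image), so there is no script to deliver.
var containerTemplate = wfv1.Template{
	Name: "hello-container",
	Container: &corev1.Container{
		Image:   "python:3.9",
		Command: []string{"python", "-c", "print('hello')"},
	},
}
```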

ryedin commented Dec 1, 2020

So, I know we don't have a conceptual, generic solution for the init container beyond my willingness to re-implement our flows to avoid it completely, but as for the other main blocker, you mentioned there being a possible simpler solution (e.g. having the option to not monitor the annotations).

I'm still slightly confused, just because when I look at the annotations in the cluster they are extensive and include input params and whatnot. I'm not following how all that data being added to the annotations is just for terminating pods prematurely, but I'll obviously take your word for it :)

But at any rate, how "simple" is it actually to do that? I don't have a Go dev setup currently and am a little rusty, but I am motivated to try to find a path to get our flows running on virtual nodes.

alexec commented Feb 7, 2022

Fixed.

alexec closed this as completed on Feb 7, 2022