Remove dependency on downwardAPI volumes #4937

hWorblehat · 2022-06-03T15:20:16Z

I'm trying to use tekton on my AKS cluster and would like to run task pods on ACI virtual kubelets. Unfortunately, the ACI virtual kubelet provider doesn't currently support downwardAPI volumes which are used by tekton task pods. There is an issue in the ACI provider project (virtual-kubelet/azure-aci#91) requesting support, but I doubt there will be any progress on it any time soon, given that there's no support for a volume type in the underlying Azure service that might be adapted for this purpose (see the left-hand menu options here for supported volume types).

Removing (or providing an optional alternative to) the single use of a downwardAPI volume in tekton would allow it to be used with ACI virtual kubelets.

The following comment gives me hope that this is something that's already being planned:

pipeline/pkg/pod/entrypoint.go

Lines 80 to 82 in 9b4d3fc

    
           // TODO(#1605): Signal sidecar readiness by injecting entrypoint, 
        
           // remove dependency on Downward API. 
        
           downwardVolume = corev1.Volume{

However, the referenced issue #1605 appears to be much broader, doesn't specifically talk about the removal of downwardAPI and looks on the verge of being declared 'done'.

As I understand it, this volume is used solely for signalling that all sidecars are running before allowing the first step to proceed. I might be happy to offer a contribution here, but I don't currently understand what exactly is meant by the comment:

Signal sidecar readiness by injecting entrypoint

Perhaps @imjasonh or someone else could provide some illumination?

An alternative quick and dirty fix for the cases where sidecars aren't used would be to simply remove the wait arguments to the first entrypoint (and the volume itself) based on similar logic to the shouldAddReadyAnnotationOnPodCreate function:

pipeline/pkg/pod/entrypoint.go

Lines 121 to 124 in 9b4d3fc

    
           argsForEntrypoint = []string{ 
        
           	// First step waits for the Downward volume file. 
        
           	"-wait_file", filepath.Join(downwardMountPoint, downwardMountReadyFile), 
        
           	"-wait_file_content", // Wait for file contents, not just an empty file.

pipeline/pkg/pod/pod.go

Lines 374 to 378 in 3a305a0

    
           // shouldAddReadyAnnotationOnPodCreate returns a bool indicating whether the 
        
           // controller should add the `Ready` annotation when creating the Pod. We cannot 
        
           // add the annotation if Tekton is running in a cluster with injected sidecars 
        
           // or if the Task specifies any sidecars. 
        
           func shouldAddReadyAnnotationOnPodCreate(ctx context.Context, sidecars []v1beta1.Sidecar) bool {

The text was updated successfully, but these errors were encountered:

imjasonh · 2022-06-03T15:30:05Z

Some background on why we use the Downward API today:

When there are sidecars, we need some way for the controller to send a signal to the first step's injected entrypoint binary that it's okay to start. Today, the controller does this by observing the Ready state of any sidecars, and updating an annotation which is projected via downward API where the first step can observe the file contents change and know to start.

Today, sidecar containers don't have the entrypoint binary injected, and just start, become ready, run, and are stopped, just like regular containers in a Pod. Because they're just regular containers in a Pod.

It works today, mostly, but it's a bit brittle, and it doesn't work in scenarios like yours where the downward API is not available. 😕

The general idea behind the "Signal sidecar readiness by injecting entrypoint" comment was that, instead of having sidecars be "normal" containers, to inject the entrypoint binary into them too, like we do to order steps. This gives us more control over their lifecycle, and more visibility. We've talked about using this to more gracefully stop the sidecars, but it could also give us a way to detect if the sidecar container is ready, and inject logic to write a file in a shared volume to start the first step. If there are multiple sidecars, each would write an "I'm ready" file, and the first step would be configured to wait for all of them.

This would remove the need for the downward API, with the small added complexity of injecting the entrypoint binary on sidecars (we do it for all steps today, and it mostly works, so that's not that bad), and having the first step wait for multiple signals (I believe the code supports it today, but it isn't used in practice today AFAIK), and of needing to have some way for the injected entrypoint binary to detect the sidecar container's readiness (this is the big one IMO). The entrypoint can tell if it's started, but that's not the same as being Ready, and we'd have to think a bit about how to detect that reliably, without adding any more dependencies on brittle things like downward API.

If anybody has any ideas I'm all for them, the downward API has always been kind of a hack, and our usage of it especially. I'd be happy to meet and walk through code or brainstorm ideas if you ping me on Slack.

hWorblehat · 2022-06-03T15:43:23Z

Thanks for the very quick response!

instead of having sidecars be "normal" containers, to inject the entrypoint binary into them too, like we do to order steps

Won't this present a problem for injected sidecars (which you currently also check the 'readiness' of)?

I'd be happy to meet and walk through code or brainstorm ideas if you ping me on Slack.

Thank you for the kind offer! I may take you up on it, but it won't be for a while (I'm afraid I'm a bit busy right now). Now that you've explained it, I can see that this is a rather more involved problem than I'd hoped. 😞

Would a temporary solution of having a feature flag disable-await-sidecar-readiness be palatable? After all, the normal pod contract is that containers must do their own checking if they need to wait for their sidecars to be in a particular state.

imjasonh · 2022-06-03T16:33:08Z

Won't this present a problem for injected sidecars (which you currently also check the 'readiness' of)?

Yes! Excellent point. Let's not forget injected sidecars (the bane of my existence).

This reminds me there's a feature flag to disable behavior to account for injected sidecars, running-in-environment-with-injected-sidecars: false, in case that's useful at all.

Would a temporary solution of having a feature flag disable-await-sidecar-readiness be palatable? After all, the normal pod contract is that containers must do their own checking if they need to wait for their sidecars to be in a particular state.

This sounds fine to me personally, you'd probably want to get feedback from others (@vdemeester @jerop @afrittoli @pritidesai) in case they can think of some reason not to do it.

vdemeester · 2022-06-08T18:10:12Z

This sounds fine to me personally, you'd probably want to get feedback from others (@vdemeester @jerop @afrittoli @pritidesai) in case they can think of some reason not to do it.

That does sounds fine to me as well.

hWorblehat · 2022-06-13T08:45:11Z

I've raised #4953 to hopefully provide a temporary solution for this. Let me know what you think.

hWorblehat · 2022-07-12T10:37:51Z

#4953 is now merged, but it's only a partial solution. It's good enough for me, for the moment, but the Downward API volume is still needed if one requires waiting on sidecars.

I'll leave it up to the Tekton team to decide whether to close this issue or leave it open.

tekton-robot · 2022-10-10T11:03:32Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale with a justification.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

tekton-robot · 2022-11-09T11:20:56Z

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

tekton-robot · 2022-12-09T11:48:23Z

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen with a justification.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/close

Send feedback to tektoncd/plumbing.

tekton-robot · 2022-12-09T11:48:26Z

@tekton-robot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen with a justification.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/close

Send feedback to tektoncd/plumbing.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

hWorblehat mentioned this issue Jun 3, 2022

Report downwardAPI volumes are not allowed to be used in a shared cluster #2307

Closed

hWorblehat mentioned this issue Jun 9, 2022

Only create & mount Downward API volume when necessary #4953

Merged

7 tasks

xchapter7x added this to Tekton Community Roadmap Sep 20, 2022

xchapter7x moved this to Todo in Tekton Community Roadmap Sep 20, 2022

tekton-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 10, 2022

tekton-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Nov 9, 2022

tekton-robot closed this as completed Dec 9, 2022

Repository owner moved this from Todo to Done in Tekton Community Roadmap Dec 9, 2022

lbernick mentioned this issue Aug 22, 2023

feat/Cancel taskrun using entrypoint binary #6511

Merged

7 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove dependency on downwardAPI volumes #4937

Remove dependency on downwardAPI volumes #4937

hWorblehat commented Jun 3, 2022

imjasonh commented Jun 3, 2022

hWorblehat commented Jun 3, 2022

imjasonh commented Jun 3, 2022

vdemeester commented Jun 8, 2022

hWorblehat commented Jun 13, 2022

hWorblehat commented Jul 12, 2022

tekton-robot commented Oct 10, 2022

tekton-robot commented Nov 9, 2022

tekton-robot commented Dec 9, 2022

tekton-robot commented Dec 9, 2022

Remove dependency on downwardAPI volumes #4937

Remove dependency on downwardAPI volumes #4937

Comments

hWorblehat commented Jun 3, 2022

imjasonh commented Jun 3, 2022

hWorblehat commented Jun 3, 2022

imjasonh commented Jun 3, 2022

vdemeester commented Jun 8, 2022

hWorblehat commented Jun 13, 2022

hWorblehat commented Jul 12, 2022

tekton-robot commented Oct 10, 2022

tekton-robot commented Nov 9, 2022

tekton-robot commented Dec 9, 2022

tekton-robot commented Dec 9, 2022