Automated backport of #2824: Wait for node readiness before starting route-agent #2825

@skitt commented Sep 22, 2023

Backport of #2824 on release-0.16.

#2824: Wait for node readiness before starting route-agent

For details on the backport process, see the backport requests page.

Depends on submariner-io/submariner#2722

Route-agent startup is subject to races with OVN components: it mounts
the sockets from the host, and the default host path mount behaviour
is to create a directory if the path is missing. So if route-agent pod
initialisation happens before OVN has opened its sockets, the socket
paths get created as directories. This blocks socket creation, and OVN
fails to start.
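
A minimal sketch of the race, using only the Go standard library. It is
not Submariner or kubelet code: the hypothetical `mountHostPath` stands
in for the default host path behaviour (create a directory if nothing
exists at the path), the hypothetical `startOVN` stands in for OVN
opening its socket, and the socket name is illustrative. When the mount
happens first, binding the Unix socket fails because a directory
already occupies the path.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"path/filepath"
)

// mountHostPath is a hypothetical stand-in for the default host path
// behaviour: if nothing exists at path yet, a directory is created there.
func mountHostPath(path string) error {
	if _, err := os.Stat(path); os.IsNotExist(err) {
		return os.MkdirAll(path, 0o755)
	} else if err != nil {
		return err
	}
	return nil
}

// startOVN is a hypothetical stand-in for OVN opening its Unix socket.
func startOVN(path string) (net.Listener, error) {
	return net.Listen("unix", path)
}

// simulate runs both steps in the chosen order and returns OVN's result.
func simulate(routeAgentFirst bool) error {
	dir, err := os.MkdirTemp("", "ovn")
	if err != nil {
		return err
	}
	defer os.RemoveAll(dir)
	sock := filepath.Join(dir, "ovnnb_db.sock") // illustrative name

	if routeAgentFirst {
		// The mount creates a directory where the socket should go.
		if err := mountHostPath(sock); err != nil {
			return err
		}
	}
	l, err := startOVN(sock) // fails if a directory is already there
	if err != nil {
		return err
	}
	defer l.Close()
	if !routeAgentFirst {
		// The socket already exists, so no directory is created.
		return mountHostPath(sock)
	}
	return nil
}

func main() {
	fmt.Println("OVN first:", simulate(false))
	fmt.Println("route-agent first:", simulate(true))
}
```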

This wouldn't be a problem for most pods, because they don't tolerate
the taints applied to nodes which aren't ready yet (including when the
CNI isn't ready), so they only start once the node is ready. The
route-agent, however, tolerates all taints, to ensure it runs
everywhere; this means it can start as soon as it is scheduled on a
node, even before that node is ready. There is no way to set up
tolerations for all taints "except" a specific one, so the route-agent
can't be specified in such a way that it will tolerate any taint except
those indicating that the node isn't ready or the network isn't
available.
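
To illustrate why tolerations can't express "everything except
readiness", here is a simplified, hypothetical re-implementation of the
Kubernetes taint/toleration matching rules (mirroring
`Toleration.ToleratesTaint` in `k8s.io/api/core/v1`; the types below are
trimmed-down local copies, not the real API types). A toleration with an
empty key and the `Exists` operator matches every taint, including
`node.kubernetes.io/not-ready` and `node.kubernetes.io/network-unavailable`,
and a pod tolerates a taint if any of its tolerations match; no field
negates a match.

```go
package main

import "fmt"

// Taint and Toleration are hypothetical, simplified mirrors of the
// Kubernetes core/v1 types, keeping only the fields used for matching.
type Taint struct {
	Key, Value, Effect string
}

type Toleration struct {
	Key, Operator, Value, Effect string
}

// tolerates follows the Kubernetes matching rules: an empty effect matches
// every effect, an empty key matches every key, "Exists" ignores the value,
// and "Equal" (the default) requires the values to match.
func tolerates(t Toleration, taint Taint) bool {
	if t.Effect != "" && t.Effect != taint.Effect {
		return false
	}
	if t.Key != "" && t.Key != taint.Key {
		return false
	}
	switch t.Operator {
	case "Exists":
		return true
	case "", "Equal":
		return t.Value == taint.Value
	default:
		return false
	}
}

func main() {
	// "Tolerate everything", as the route-agent does.
	tolerateAll := Toleration{Operator: "Exists"}
	notReady := Taint{Key: "node.kubernetes.io/not-ready", Effect: "NoSchedule"}
	netDown := Taint{Key: "node.kubernetes.io/network-unavailable", Effect: "NoSchedule"}
	fmt.Println(tolerates(tolerateAll, notReady), tolerates(tolerateAll, netDown))
}
```

Because matching only ever widens what a pod accepts, the only way to
exclude the readiness taints would be to enumerate every other taint
explicitly, which defeats the purpose of running everywhere.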

It also isn't possible to handle this by specifying the socket host
path type: that prevents the pod from starting until the socket is
available. The route-agent needs to be able to mount a number of
different socket paths, to handle different setups, and there is never
a configuration where all socket paths are available; so enforcing a
socket type prevents the route-agent from starting at all.

To handle this, an init container is set up for the route-agent, and
waits until the node is ready before allowing the route-agent setup to
continue. This init container does not specify the host path volumes
used by the main container, so the corresponding paths aren't touched
on the host. As a result, the route-agent is only set up once the node
is fully ready, including OVN sockets, so the appropriate sockets are
mounted correctly. (Directories are still created for missing socket
mounts, but that doesn't matter, because once this stage is reached
the missing socket mounts correspond to paths which aren't used by OVS
or OVN.)

Signed-off-by: Stephen Kitt <[email protected]>
@submariner-bot

🤖 Created branch: z_pr2825/skitt/automated-backport-of-#2824-origin-release-0.16
🚀 Full E2E won't run until the "ready-to-test" label is applied. I will add it automatically once the PR has 2 approvals, or you can add it manually.

@tpantelis tpantelis enabled auto-merge (rebase) September 22, 2023 14:30
@submariner-bot submariner-bot added the ready-to-test When a PR is ready for full E2E testing label Sep 22, 2023
@github-actions

This PR/issue depends on:

@tpantelis tpantelis merged commit 8d03416 into submariner-io:release-0.16 Sep 22, 2023
@submariner-bot

🤖 Closed branches: [z_pr2825/skitt/automated-backport-of-#2824-origin-release-0.16]

@dfarrell07 dfarrell07 added the release-note-needed Should be mentioned in the release notes label Sep 26, 2023
@skitt skitt deleted the automated-backport-of-#2824-origin-release-0.16 branch October 17, 2023 13:30
Labels
automated-backport, ready-to-test (When a PR is ready for full E2E testing), release-note-handled, release-note-needed (Should be mentioned in the release notes)

5 participants