Wait for node readiness before starting route-agent
Route-agent startup is subject to races with OVN components: it mounts
the sockets from the host, and since the default host path mount
behaviour is to create a directory if the path is missing, if
route-agent pod initialisation happens before OVN has opened the
sockets, they get created as directories. This blocks socket creation
and OVN fails to start.
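
The failure mode can be reproduced outside Kubernetes. The sketch below is only an illustration, not OVN's code: it assumes a Linux-like system, uses a scratch directory, and uses a hypothetical socket name (`db.sock`). It creates a directory where the socket should live, as a host path mount does for a missing path, and then tries to bind a Unix socket there.

```go
package main

import (
	"net"
	"os"
	"path/filepath"
)

// bindOverDirectory creates a directory at a socket path and then tries to
// listen on that path. It returns the error from the bind attempt (expected to
// be non-nil) and, separately, any error from preparing the scratch area.
func bindOverDirectory() (bindErr, setupErr error) {
	scratch, err := os.MkdirTemp("", "socket-race")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(scratch)

	// Hypothetical socket name, for illustration only.
	sockPath := filepath.Join(scratch, "db.sock")

	// Stand-in for the host path mount creating the missing path as a directory.
	if err := os.Mkdir(sockPath, 0o755); err != nil {
		return nil, err
	}

	// Stand-in for the daemon that should own the socket: the bind fails
	// because the path already exists.
	listener, err := net.Listen("unix", sockPath)
	if err != nil {
		return err, nil
	}
	listener.Close()

	return nil, nil
}
```

Calling `bindOverDirectory()` should return a non-nil first value: the listener cannot bind while the directory occupies the path.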

This wouldn't be a problem for most pods, because they don't tolerate
the taints applied to nodes which aren't ready (including CNI
readiness), so they only start once the node is ready. The route-agent
however tolerates all taints, to ensure it runs everywhere; this means
it can start before the node is ready.
There is no way to set up tolerations "except" a specific taint, so
the route-agent can't be specified in such a way that it will start
with any taint except node readiness or network availability.
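
A simplified model of toleration matching shows why there is no "except". The types below are hypothetical stand-ins, not the Kubernetes API types, and the assumption that "tolerates all taints" is expressed as a single toleration with an empty key and the `Exists` operator is mine, not something this commit shows. A toleration can only widen what is accepted; nothing in the matching rules subtracts a taint.

```go
package main

// Taint and Toleration are simplified stand-ins for the Kubernetes types.
type Taint struct {
	Key, Value, Effect string
}

type Toleration struct {
	Key, Operator, Value, Effect string
}

// tolerates reports whether a single toleration matches a taint: an empty
// effect matches every effect, an empty key matches every key, and the
// "Exists" operator ignores the value.
func tolerates(tol Toleration, taint Taint) bool {
	if tol.Effect != "" && tol.Effect != taint.Effect {
		return false
	}
	if tol.Key != "" && tol.Key != taint.Key {
		return false
	}
	switch tol.Operator {
	case "Exists":
		return true
	case "", "Equal":
		return tol.Value == taint.Value
	default:
		return false
	}
}

// toleratesAll reports whether every taint is matched by at least one toleration.
func toleratesAll(tols []Toleration, taints []Taint) bool {
	for _, taint := range taints {
		matched := false
		for _, tol := range tols {
			if tolerates(tol, taint) {
				matched = true
				break
			}
		}
		if !matched {
			return false
		}
	}
	return true
}
```

With `Toleration{Operator: "Exists"}`, taints such as `node.kubernetes.io/not-ready` and `node.kubernetes.io/network-unavailable` are matched along with everything else, which is why the readiness wait has to happen somewhere other than the tolerations.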

It also isn't possible to handle this by specifying the socket host
path type; this causes the kubelet to wait for the socket to be
available before starting the pod. The route-agent needs to be able to
mount a number of different socket paths, to handle different setups,
and there is never a configuration where all socket paths are
available; so enforcing a socket type prevents the route-agent from
starting at all.

To handle this, an init container is set up for the route-agent, and
waits until the node is ready before allowing the route-agent setup to
continue. This init container does not specify the host path volumes
used by the main container, so the corresponding paths aren't touched
on the host. As a result, the route-agent is only set up once the node
is fully ready, including OVN sockets, so the appropriate sockets are
mounted correctly. (Directories are still created for missing socket
mounts, but that doesn't matter, because once this stage is reached
the missing socket mounts correspond to paths which aren't used by OVS
or OVN.)

Signed-off-by: Stephen Kitt <[email protected]>
skitt authored and tpantelis committed Sep 22, 2023
1 parent b00f53e commit 8d03416
Showing 1 changed file with 18 additions and 0 deletions.
18 changes: 18 additions & 0 deletions controllers/submariner/route_agent_resources.go
@@ -92,6 +92,24 @@ func newRouteAgentDaemonSet(cr *v1alpha1.Submariner, name string) *appsv1.Daemon
        Path: "/var/run/ovn-ic/ovnnb_db.sock",
    }}},
},
// The route agent needs to wait for the node to be ready before starting,
// to avoid racing with the CNI for socket setup; this init container takes care of that
InitContainers: []corev1.Container{
    {
        Name: name + "-init",
        Image: getImagePath(cr, opnames.RouteAgentImage, names.RouteAgentComponent),
        ImagePullPolicy: images.GetPullPolicy(cr.Spec.Version, cr.Spec.ImageOverrides[names.RouteAgentComponent]),
        Command: []string{"submariner-route-agent.sh"},
        Env: []corev1.EnvVar{
            {Name: "SUBMARINER_WAITFORNODE", Value: "true"},
            {Name: "NODE_NAME", ValueFrom: &corev1.EnvVarSource{
                FieldRef: &corev1.ObjectFieldSelector{
                    FieldPath: "spec.nodeName",
                },
            }},
        },
    },
},
Containers: []corev1.Container{
    {
        Name: name,