Don't ignore failure to create cgroup after timeout
Before this commit, creating a cgroup would silently ignore timeouts and
carry on. Concretely, this caused cases where a cgroup failed to create,
but the caller didn't realize it and went looking for files that should
exist (e.g. cgroup.controllers), only to find they didn't. This case,
where NewSystemd succeeds but the group doesn't exist, is very difficult
for a caller to handle.

The origins of this code seem to trace back to an initial implementation
written 5+ years ago:
5efa14e#diff-3331981e4ac06a8d9b06e91842b7f2759c7af3b65287e489a88385948d311ebdR672

runc added roughly the same logic here to deal with the same issue:
opencontainers/runc#3782

Now, containerd will also error if a cgroup cannot be created within the
timeout window.

Signed-off-by: Josh Chorlton <[email protected]>
jchorl committed Sep 25, 2024
1 parent 190de3b commit 2e25118
Showing 1 changed file with 4 additions and 2 deletions.
6 changes: 4 additions & 2 deletions cgroup2/manager.go
@@ -952,14 +952,16 @@ func startUnit(conn *systemdDbus.Conn, group string, properties []systemdDbus.Pr
 		}
 	}
 
+	systemdStartUnitTimeout := 30 * time.Second
 	select {
 	case s := <-statusChan:
 		if s != "done" {
 			attemptFailedUnitReset(conn, group)
 			return fmt.Errorf("error creating systemd unit `%s`: got `%s`", group, s)
 		}
-	case <-time.After(30 * time.Second):
-		log.G(ctx).Warnf("Timed out while waiting for StartTransientUnit(%s) completion signal from dbus. Continuing...", group)
+	case <-time.After(systemdStartUnitTimeout):
+		attemptFailedUnitReset(conn, group)
+		return fmt.Errorf("timed out while waiting for StartTransientUnit(%s) completion signal from dbus after %v", group, systemdStartUnitTimeout)
 	}
 
 	return nil
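
For readers who want to try the behavior in isolation, here is a minimal, self-contained sketch of the select-with-timeout pattern this change adopts. It is not the containerd code: `waitForUnit`, `errStartTimeout`, `isTimeout`, and `demoTimeout` are hypothetical names introduced for illustration, the dbus connection and the `attemptFailedUnitReset` cleanup are left out, and the timeout is shortened from 30 seconds so the example finishes quickly.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errStartTimeout is a hypothetical sentinel error for this sketch;
// the actual commit returns a plain fmt.Errorf value instead.
var errStartTimeout = errors.New("timed out waiting for completion signal")

// demoTimeout is deliberately short; the commit uses 30 * time.Second.
const demoTimeout = 50 * time.Millisecond

// waitForUnit waits for a job status on statusChan. Any status other
// than "done" is an error, and so is hearing nothing before the timeout
// elapses (the old code only logged a warning and returned nil here).
func waitForUnit(statusChan <-chan string, group string, timeout time.Duration) error {
	select {
	case s := <-statusChan:
		if s != "done" {
			return fmt.Errorf("error creating systemd unit `%s`: got `%s`", group, s)
		}
		return nil
	case <-time.After(timeout):
		return fmt.Errorf("unit %s: %w after %v", group, errStartTimeout, timeout)
	}
}

// isTimeout reports whether err was caused by the timeout branch.
func isTimeout(err error) bool {
	return errors.Is(err, errStartTimeout)
}

func main() {
	// A status of "done" arrives in time: no error.
	ok := make(chan string, 1)
	ok <- "done"
	fmt.Println(waitForUnit(ok, "demo.slice", demoTimeout)) // <nil>

	// Nothing ever arrives: the caller now sees an error.
	silent := make(chan string)
	err := waitForUnit(silent, "demo.slice", demoTimeout)
	fmt.Println(isTimeout(err)) // true
}
```

Wrapping a sentinel with `%w` is a choice made for this sketch so that callers can tell a timeout apart from a failed job; the commit itself only guarantees that a non-nil error is returned.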