Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix a regression when killing and deleting a container with shared(host) pid namespace #4048

Closed
wants to merge 5 commits into from

Conversation

lifubang
Copy link
Member

@lifubang lifubang commented Oct 2, 2023

If we create a container with a shared or host PID namespace, after the init process has dead, the container's state becomes Stopped, but after we have merged #3825, we can't send any signal to a stopped container, because we removed the logic to check this condition in f8ad20f . We should find them back.

Fix #4047
This also fixes #4040 .

@lifubang lifubang force-pushed the fix-NoPIDNsKill branch 2 times, most recently from f5e1a7d to e17f88c Compare October 2, 2023 09:35
@lifubang lifubang added this to the 1.2.0 milestone Oct 2, 2023
@lifubang lifubang force-pushed the fix-NoPIDNsKill branch 3 times, most recently from ee898ab to 6b04d85 Compare October 2, 2023 10:04
@lifubang lifubang requested review from kolyshkin and cyphar October 2, 2023 10:34
@@ -371,6 +371,10 @@ func (c *Container) Signal(s os.Signal) error {
// To avoid a PID reuse attack, don't kill non-running container.
switch status {
case Running, Created, Paused:
case Stopped:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if the pid namespace is private and all the processes should receive the kill signal for stopped, I think we can kill if cgroup exists.

Copy link
Member Author

@lifubang lifubang Oct 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the pid namespace is private, all processes in this pid ns will be killed by kernel automatically after the init process dead.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. For the logic, we can simplify it with checking the cgroup only.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t think so, with private pid ns and without systemd driver, the cgroup path is always exit even if there is no process, so this will cause an unnecessary function call.

@lifubang lifubang changed the title Fix a regression when killing and deleting a container with shared(hosted) pid namespace Fix a regression when killing and deleting a container with shared(host) pid namespace Oct 2, 2023
if err := signalAllProcesses(c.cgroupManager, unix.SIGKILL); err != nil {
return err
}
}
err := c.cgroupManager.Destroy()
Copy link
Member Author

@lifubang lifubang Oct 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Furthermore, I think if we can’t destroy the cgroup, we should not go to the next step. It will cause the state directory has been deleted but processes in the old container are still running, we can’t kill them from runc anymore. WDYT @kolyshkin

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error has been ignored silently.

Comment on lines 38 to 42
if !c.config.Namespaces.IsPrivate(configs.NEWPID) && c.cgroupManager.Exists() {
if err := signalAllProcesses(c.cgroupManager, unix.SIGKILL); err != nil {
return err
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is wrong -- it should only be done if runc delete --force is used. So, I guess, this logic should be in delete.go.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is right, we can go here because the container has been in ‘Stopped’ state, delete without ’-f’ should work will be better.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit unsure if runc delete should really kill any processes. That is such a mess 😩

OTOH runtime-spec says

Attempting to delete a container that is not stopped MUST have no effect on the container and MUST generate an error. Deleting a container MUST delete the resources that were created during the create step. Note that resources associated with the container, but not created by this container, MUST NOT be deleted.

and we treat a running container without init process as stopped (which btw may be wrong, too). If that is so, should we treat those leftover processes as "resources that were created during the create step"? Hmm.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we treat a running container without init process as stopped (which btw may be wrong, too).

What if we treat such a container as running? See
#4040 (comment)

@@ -371,6 +371,10 @@ func (c *Container) Signal(s os.Signal) error {
// To avoid a PID reuse attack, don't kill non-running container.
switch status {
case Running, Created, Paused:
case Stopped:
if c.config.Namespaces.IsPrivate(configs.NEWPID) || !c.cgroupManager.Exists() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem here is we now have two different places in which we check IsPrivate. Also note that these two checks are slightly different (one also checks for cgroup existence, the other checks if s is SIGKILL. This is probably wrong.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another (minor) problem is with commit 9583b3d is it now calls Thaw even when we use signalAllProcesses, which is unnecessary.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the other checks if s is SIGKILL. This is probably wrong.

This is because you we remove ’-a’ option, so we also need to check SIGKILL here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I meant it, there should be only one single check, deciding if we should use the usual logic (kill init) or the special case logic (kill all processes).

Something like

        if s == unix.SIGKILL && !c.config.Namespaces.IsPrivate(configs.NEWPID) {
                if err := signalAllProcesses(c.cgroupManager, unix.SIGKILL); err != nil {
                        return fmt.Errorf("unable to kill all processes: %w", err)
                }
                return nil
        }

// ... normal kill code goes here

plus

--- a/libcontainer/init_linux.go
+++ b/libcontainer/init_linux.go
@@ -633,6 +633,9 @@ func setupRlimits(limits []configs.Rlimit, pid int) error {
 // signalAllProcesses freezes then iterates over all the processes inside the
 // manager's cgroups sending the signal s to them.
 func signalAllProcesses(m cgroups.Manager, s unix.Signal) error {
+       if !m.Exists() {
+               return ErrNotRunning
+       }
        // Use cgroup.kill, if available.
        if s == unix.SIGKILL {
                if p := m.Path(""); p != "" { // Either cgroup v2 or hybrid.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kolyshkin I think maybe we should find back runc kill -a?
It is reasonable to remove -a with the signal KILL, but runc kill can pass other signals besides KILL, if we remove -a for runc-kill, it will have no way to send a signal to all the processes in the container.

@lifubang lifubang force-pushed the fix-NoPIDNsKill branch 2 times, most recently from 575bba9 to 48ff07c Compare October 10, 2023 04:53
@lifubang
Copy link
Member Author

Although I want to find back runc kill -a, but maybe it's not easy to push it forward in opencontainers/runtime-spec#1234 .
But the bug in the main branch should be fixed, and some of these commits should be backported to release 1.1.

@@ -873,6 +873,26 @@ func (c *Container) newInitConfig(process *Process) *initConfig {
func (c *Container) Destroy() error {
c.m.Lock()
defer c.m.Unlock()
if !c.config.Namespaces.IsPrivate(configs.NEWPID) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am very much against doing this. Kill should do all the killing, and Destroy should remove the files. This was all mixed up before, but I got it untangled in #3825 (alas, with a couple of regressions which you have reported in #4047).

The alternative to doing this, without mixing the kill and destroy again, is this: 7de61c4 (PTAL 🙏🏻)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I think we can't force users to use runc kill before runc delete if the container is in stopped state.

@lifubang
Copy link
Member Author

As #4102 has merged, close this one.

@lifubang lifubang closed this Nov 28, 2023
@lifubang lifubang deleted the fix-NoPIDNsKill branch October 15, 2024 05:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants