-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix a regression when killing and deleting a container with shared(host) pid namespace #4048
Conversation
f5e1a7d
to
e17f88c
Compare
ee898ab
to
6b04d85
Compare
libcontainer/container_linux.go
Outdated
@@ -371,6 +371,10 @@ func (c *Container) Signal(s os.Signal) error { | |||
// To avoid a PID reuse attack, don't kill non-running container. | |||
switch status { | |||
case Running, Created, Paused: | |||
case Stopped: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even if the pid namespace is private and all the processes should receive the kill signal for stopped, I think we can kill if cgroup exists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the pid namespace is private, all processes in this pid ns will be killed by kernel automatically after the init process dead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. For the logic, we can simplify it with checking the cgroup only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don’t think so, with private pid ns and without systemd driver, the cgroup path is always exit even if there is no process, so this will cause an unnecessary function call.
if err := signalAllProcesses(c.cgroupManager, unix.SIGKILL); err != nil { | ||
return err | ||
} | ||
} | ||
err := c.cgroupManager.Destroy() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Furthermore, I think if we can’t destroy the cgroup, we should not go to the next step. It will cause the state directory has been deleted but processes in the old container are still running, we can’t kill them from runc anymore. WDYT @kolyshkin
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This error has been ignored silently.
libcontainer/state_linux.go
Outdated
if !c.config.Namespaces.IsPrivate(configs.NEWPID) && c.cgroupManager.Exists() { | ||
if err := signalAllProcesses(c.cgroupManager, unix.SIGKILL); err != nil { | ||
return err | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is wrong -- it should only be done if runc delete --force
is used. So, I guess, this logic should be in delete.go
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is right, we can go here because the container has been in ‘Stopped’ state, delete without ’-f’ should work will be better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit unsure if runc delete
should really kill any processes. That is such a mess 😩
OTOH runtime-spec says
Attempting to delete a container that is not stopped MUST have no effect on the container and MUST generate an error. Deleting a container MUST delete the resources that were created during the create step. Note that resources associated with the container, but not created by this container, MUST NOT be deleted.
and we treat a running container without init process as stopped (which btw may be wrong, too). If that is so, should we treat those leftover processes as "resources that were created during the create step"? Hmm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we treat a running container without init process as stopped (which btw may be wrong, too).
What if we treat such a container as running? See
#4040 (comment)
libcontainer/container_linux.go
Outdated
@@ -371,6 +371,10 @@ func (c *Container) Signal(s os.Signal) error { | |||
// To avoid a PID reuse attack, don't kill non-running container. | |||
switch status { | |||
case Running, Created, Paused: | |||
case Stopped: | |||
if c.config.Namespaces.IsPrivate(configs.NEWPID) || !c.cgroupManager.Exists() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem here is we now have two different places in which we check IsPrivate
. Also note that these two checks are slightly different (one also checks for cgroup existence, the other checks if s
is SIGKILL. This is probably wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another (minor) problem is with commit 9583b3d is it now calls Thaw even when we use signalAllProcesses, which is unnecessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the other checks if
s
is SIGKILL. This is probably wrong.
This is because you we remove ’-a’ option, so we also need to check SIGKILL here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I meant it, there should be only one single check, deciding if we should use the usual logic (kill init) or the special case logic (kill all processes).
Something like
if s == unix.SIGKILL && !c.config.Namespaces.IsPrivate(configs.NEWPID) {
if err := signalAllProcesses(c.cgroupManager, unix.SIGKILL); err != nil {
return fmt.Errorf("unable to kill all processes: %w", err)
}
return nil
}
// ... normal kill code goes here
plus
--- a/libcontainer/init_linux.go
+++ b/libcontainer/init_linux.go
@@ -633,6 +633,9 @@ func setupRlimits(limits []configs.Rlimit, pid int) error {
// signalAllProcesses freezes then iterates over all the processes inside the
// manager's cgroups sending the signal s to them.
func signalAllProcesses(m cgroups.Manager, s unix.Signal) error {
+ if !m.Exists() {
+ return ErrNotRunning
+ }
// Use cgroup.kill, if available.
if s == unix.SIGKILL {
if p := m.Path(""); p != "" { // Either cgroup v2 or hybrid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kolyshkin I think maybe we should find back runc kill -a
?
It is reasonable to remove -a
with the signal KILL
, but runc kill
can pass other signals besides KILL
, if we remove -a
for runc-kill, it will have no way to send a signal to all the processes in the container.
9b7b013
to
7c20ac3
Compare
Signed-off-by: lifubang <[email protected]>
Signed-off-by: lifubang <[email protected]>
Signed-off-by: lifubang <[email protected]>
Signed-off-by: lifubang <[email protected]>
575bba9
to
48ff07c
Compare
Although I want to find back |
@@ -873,6 +873,26 @@ func (c *Container) newInitConfig(process *Process) *initConfig { | |||
func (c *Container) Destroy() error { | |||
c.m.Lock() | |||
defer c.m.Unlock() | |||
if !c.config.Namespaces.IsPrivate(configs.NEWPID) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am very much against doing this. Kill
should do all the killing, and Destroy
should remove the files. This was all mixed up before, but I got it untangled in #3825 (alas, with a couple of regressions which you have reported in #4047).
The alternative to doing this, without mixing the kill and destroy again, is this: 7de61c4 (PTAL 🙏🏻)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I think we can't force users to use runc kill
before runc delete
if the container is in stopped
state.
Signed-off-by: lifubang <[email protected]>
cc9addd
to
9225eaa
Compare
As #4102 has merged, close this one. |
If we create a container with a shared or host
PID
namespace, after the init process has dead, the container's state becomesStopped
, but after we have merged #3825, we can't send any signal to a stopped container, because we removed the logic to check this condition in f8ad20f . We should find them back.Fix #4047
This also fixes #4040 .