Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

std.process.Child.run fails to kill child on error condition #22433

Open
adrsimon opened this issue Jan 6, 2025 · 7 comments
Open

std.process.Child.run fails to kill child on error condition #22433

adrsimon opened this issue Jan 6, 2025 · 7 comments
Labels
bug Observed behavior contradicts documented or intended behavior standard library This issue involves writing Zig code for the standard library.
Milestone

Comments

@adrsimon
Copy link

adrsimon commented Jan 6, 2025

Zig Version

0.14.0-dev.2596+e6879e99e

Steps to Reproduce and Observed Behavior

Hey, I've been tinkering around with Zig and found a strange bug while solving Ziglings. Basically my CPU went to 89°C for no clear reason. Here are the steps to reproduce.

  • Clone the ziglings repo
  • Open problem 049_quiz6.zig on your favorite editor
  • Implement this solution (that doesn't work), modifications are where the //HERE comments are.
// ...

const Elephant = struct {
    letter: u8,
    tail: ?*Elephant = null,
    trunk: ?*Elephant = null,
    visited: bool = false,

    // Elephant tail methods!
    pub fn getTail(self: *Elephant) *Elephant {
        return self.tail.?; // Remember, this means "orelse unreachable"
    }

    pub fn hasTail(self: *Elephant) bool {
        return (self.tail != null);
    }

    // Your Elephant trunk methods go here!
    // --------------------------------------------------- 
    //HERE

    pub fn getTrunk(self: *Elephant) *Elephant {
        return self.trunk.?;
    }

    pub fn hasTrunk(self: *Elephant) bool {
        return (self.trunk != null);
    }

    // ---------------------------------------------------

    pub fn visit(self: *Elephant) void {
        self.visited = true;
    }

    pub fn print(self: *Elephant) void {
        // Prints elephant letter and [v]isited
        const v: u8 = if (self.visited) 'v' else ' ';
        std.debug.print("{u}{u} ", .{ self.letter, v });
    }
};

pub fn main() void {
    var elephantA = Elephant{ .letter = 'A' };
    var elephantB = Elephant{ .letter = 'B' };
    var elephantC = Elephant{ .letter = 'C' };

    // We link the elephants so that each tail "points" to the next.
    elephantA.tail = &elephantB;
    elephantB.tail = &elephantC;
    elephantC.tail = &elephantA; // HERE

    // And link the elephants so that each trunk "points" to the previous.
    elephantB.trunk = &elephantA;
    elephantC.trunk = &elephantB;
    elephantA.trunk = &elephantC; // HERE

    visitElephants(&elephantA);

    std.debug.print("\n", .{});
}

// ...
  • Run the specific program : zig build -Dn=49
  • Rush to killing the phantom tasks that have been created.

Expected Behavior

Here are three screenshots of the situation.

  1. The error message.
  2. The phantom processes after running the program 4 times.
  3. A btop screenshot after launching the program 4 times in a row.

I basically ran it 4 times in a row after seeing it not working, and it ate half of my CPU cores, while making my CPU heat skyrocket.

Capture d’écran 2025-01-06 à 21 46 43 Capture d’écran 2025-01-06 à 21 49 17 Capture d’écran 2025-01-06 à 21 49 38

I'm not quite sure from where it comes from, either from a Zig compilation problem, or from the ziglings exercise itself. However, I don't have this problem with other Ziglings problems.

I would appreciate any insights on the situation ! Thanks :)

NB:

  • I know that this is not the intended solution for the exercise, however, I expect my program not to let open random processes after it terminates.
@adrsimon adrsimon added the bug Observed behavior contradicts documented or intended behavior label Jan 6, 2025
@mlugg
Copy link
Member

mlugg commented Jan 6, 2025

These aren't "phantom tasks", they're your compiled program running. By linking these 3 values in a loop, you have set up the code so that visitElephants will busy-loop forever, hence pinning a CPU core at ~100%.

@mlugg mlugg closed this as not planned Won't fix, can't repro, duplicate, stale Jan 6, 2025
@mlugg mlugg added question No questions on the issue tracker, please. and removed bug Observed behavior contradicts documented or intended behavior labels Jan 6, 2025
@adrsimon
Copy link
Author

adrsimon commented Jan 6, 2025

@mlugg thanks for your help ! I knew I was doing a loop, and that was kind of intended.

I'm not quite sure I get why doesn't the program stops when it encounters the error to be honest. I might not be enough used to Zig's compilation flow to understand what's going on.

The error kinda says that it was unable to spawn the program and to make it run, but then I still have a program going on ?

@mlugg
Copy link
Member

mlugg commented Jan 6, 2025

Apologies, I didn't read the issue closely enough. It's a bug that the zig build process is terminating.

@mlugg mlugg reopened this Jan 6, 2025
@mlugg mlugg added bug Observed behavior contradicts documented or intended behavior zig build system std.Build, the build runner, `zig build` subcommand, package management and removed question No questions on the issue tracker, please. labels Jan 6, 2025
@mlugg
Copy link
Member

mlugg commented Jan 6, 2025

Ahh, okay, std.Build.Step.Run has a mechanism to give up and kill the child process if it emits too much stdio. That explains the error.

But... somehow the child process isn't being killed? Even though there's an errdefer for it:

zig/lib/std/Build/Step/Run.zig

Lines 1352 to 1362 in 7aa95bc

try child.spawn();
errdefer {
_ = child.kill() catch {};
}
var timer = try std.time.Timer.start();
const result = if (run.stdio == .zig_test)
try evalZigTest(run, &child, prog_node, fuzz_context)
else
try evalGeneric(run, &child);

I'm guessing we're hitting an error condition on std.process.Child.kill?

@mlugg mlugg changed the title Phantom Tasks taking 100% CPU on a basic Ziglings program. std.Build.Step.Run fails to kill child on error condition Jan 6, 2025
@mlugg mlugg added this to the 0.15.0 milestone Jan 6, 2025
@ianprime0509
Copy link
Contributor

Ziglings doesn't use the standard run step, it has its own custom step which uses std.process.Child.run: https://codeberg.org/ziglings/exercises/src/commit/468e3d7eed7cfbb68067afb34593dbc1109396d7/build.zig#L342-L352

However, given there's no way for the user of std.process.Child.run to do anything about the child process when an error is returned (since the child process is created and spawned all within run), maybe the same errdefer should be added after spawning the child process?

try child.spawn();
(with some extra care taken to ensure the errdefer is only scoped up through try child.wait() and not, say, if try stdout.toOwnedSlice() fails after the process has already terminated)

@alexrp
Copy link
Member

alexrp commented Jan 7, 2025

cc @matklad (#21144)

@mlugg
Copy link
Member

mlugg commented Jan 7, 2025

Oh, wow, I didn't realise Ziglings had a custom step for this. That's... interesting.

But yeah, this should be handled by std.process.Child.run.

@mlugg mlugg added standard library This issue involves writing Zig code for the standard library. and removed zig build system std.Build, the build runner, `zig build` subcommand, package management labels Jan 7, 2025
@mlugg mlugg changed the title std.Build.Step.Run fails to kill child on error condition std.process.Child.run fails to kill child on error condition Jan 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Observed behavior contradicts documented or intended behavior standard library This issue involves writing Zig code for the standard library.
Projects
None yet
Development

No branches or pull requests

4 participants