Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

system hangs if file does not exist #351

Open
cretz opened this issue Nov 4, 2024 · 11 comments
Open

system hangs if file does not exist #351

cretz opened this issue Nov 4, 2024 · 11 comments

Comments

@cretz
Copy link

cretz commented Nov 4, 2024

If you run system inside Async with a non-existent executable on Linux with Ruby 3.3 it hangs. This behavior is confirmed in 2.17.0 and 2.18.0 (I didn't check any earlier than that).

This synchronous code works:

puts "Running on: #{RbConfig::CONFIG['host_os']}"
res = system 'echo', 'hello'
puts "Result 1: #{res}"
res = system 'does-not-exist', 'hello'
puts "Result 2: #{res}"

And outputs:

Running on: linux
hello
Result 1: true
Result 2:

But this code does not:

require 'async'

Async do
  puts "Running on: #{RbConfig::CONFIG['host_os']}"
  res = system 'echo', 'hello'
  puts "Result 1: #{res}"
  res = system 'does-not-exist', 'hello'
  puts "Result 2: #{res}"
end

This hangs on the second system call. This was discovered after a deep dive of why Minitest assert_equal was hanging. It turns out, Minitest uses diff on the system when reporting an assert_equal failure, but first checks for gdiff which we don't have, ref: https://github.com/minitest/minitest/blob/e4417c5b13ab917a60c218700041152136bc07c8/lib/minitest/assertions.rb#L33. So we have to Minitest::Assertions.diff = 'diff' to get Async-based tests to not break here.

(sorry if this is already reported, I could not find an issue)

cretz added a commit to cretz/temporal-sdk-ruby that referenced this issue Nov 4, 2024
@ioquatix
Copy link
Member

ioquatix commented Nov 4, 2024

Nice find. I didn't know about this! haha.

image

I'll investigate the issue today.

@ioquatix
Copy link
Member

ioquatix commented Nov 5, 2024

I could not reproduce from the given example.

samuel@aiko ~/D/s/async (main)> ruby --version
ruby 3.3.4 (2024-07-09 revision be1089c8ec) [x86_64-linux]
samuel@aiko ~/D/s/async (main)> cat ./test.rb
#!/usr/bin/env ruby

require "async"

Async do
	puts "Running on: #{RbConfig::CONFIG['host_os']}"
	res = system 'echo', 'hello'
	puts "Result 1: #{res.inspect}"
	res = system 'does-not-exist', 'hello'
	puts "Result 2: #{res.inspect}"
end
samuel@aiko ~/D/s/async (main)> bundle exec ./test.rb
Running on: linux
hello
Result 1: true
Result 2: nil

Can you confirm I've got the correct reproduction script?

@ioquatix
Copy link
Member

ioquatix commented Nov 5, 2024

I tried to reproduce the original error but couldn't:

#!/usr/bin/env ruby

require "minitest/autorun"
require "async"

class Test < Minitest::Test
	BIG_TEXT_A = "The quick brown fox jumps over the lazy dog.\n" * 10
	BIG_TEXT_B = BIG_TEXT_A.dup.sub("quick", "slow")
	
	def test_diff
		Async do |task|
			assert_equal(BIG_TEXT_A, BIG_TEXT_B)
		end
	end
end

I added some instrumentation to the scheduler process_wait hook:

		# Wait for the specified process ID to exit.
		# @parameter pid [Integer] The process ID to wait for.
		# @parameter flags [Integer] A bit-mask of flags suitable for `Process::Status.wait`.
		# @returns [Process::Status] A process status instance.
		# @asynchronous May be non-blocking..
		def process_wait(pid, flags)
			return @selector.process_wait(Fiber.current, pid, flags).tap{|result|
				Fiber.blocking{$stderr.puts "process_wait(#{pid}, #{flags}) -> #{result.inspect}"}
			}
		end

It appears to work okay:

> bundle exec ./test.rb
Run options: --seed 20496

# Running:

process_wait(927854, 0) -> #<Process::Status: pid 927854 exit 127>
process_wait(927855, 0) -> #<Process::Status: pid 927855 exit 0>
process_wait(927856, 0) -> #<Process::Status: pid 927856 exit 1>
F

Finished in 0.002793s, 358.0683 runs/s, 358.0683 assertions/s.

  1) Failure:
Test#test_diff [test.rb:12]:
--- expected
+++ actual
@@ -1,4 +1,4 @@
-"The quick brown fox jumps over the lazy dog.
+"The slow brown fox jumps over the lazy dog.
 The quick brown fox jumps over the lazy dog.
 The quick brown fox jumps over the lazy dog.
 The quick brown fox jumps over the lazy dog.


1 runs, 1 assertions, 1 failures, 0 errors, 0 skips

@ioquatix
Copy link
Member

ioquatix commented Nov 5, 2024

Is it possible something in your test suite is monkey patching Kernel::system?

ioquatix added a commit that referenced this issue Nov 5, 2024
@cretz
Copy link
Author

cretz commented Nov 5, 2024

Interesting, this is breaking for me:

cretz@DESKTOP:~/work/temp$ cat ./test.rb
require 'async'

Async do
  puts "Running on: #{RbConfig::CONFIG['host_os']}"
  res = system 'echo', 'hello'
  puts "Result 1: #{res}"
  res = system 'does-not-exist', 'hello'
  puts "Result 2: #{res}"
end
cretz@DESKTOP:~/work/temp$ bundle exec ruby --version
ruby 3.3.3 (2024-06-12 revision f1c7b6f435) [x86_64-linux]
cretz@DESKTOP:~/work/temp$ bundle exec ruby ./test.rb
Running on: linux
hello
Result 1: true

It hangs right there. I am not doing any monkey patching. This was also hanging for me inside of a GitHub action for Ubuntu and Ruby 3.3 (but not 3.1 I don't think). I will do some more investigation, possibly including a simple repository that replicates the issue with a simple GitHub action, and get back. (may be a bit, I am traveling)

@olleolleolle
Copy link
Contributor

olleolleolle commented Nov 7, 2024

I was also able to make it hang on that.

# issue-351.rb
require 'bundler/inline'

gemfile do
  source 'https://rubygems.org'
  gem 'async'
end

Async do
  puts "Running on: #{RbConfig::CONFIG['host_os']}"
  res = system 'echo', 'hello'
  puts "Result 1: #{res}"
  res = system 'does-not-exist', 'hello'
  puts "Result 2: #{res}"
end

My output was: note where I hit Ctrl-C and it got output, and then the first word Result 2:

$ ruby issue-351.rb
Running on: linux
hello
Result 1: true
^CResult 2:
/home/olle/projects/slask/tmp/devbox/gems/gems/async-2.18.0/lib/async/scheduler.rb:317:in `select': Interrupt
        from /home/olle/projects/slask/tmp/devbox/gems/gems/async-2.18.0/lib/async/scheduler.rb:317:in `run_once!'
        from /home/olle/projects/slask/tmp/devbox/gems/gems/async-2.18.0/lib/async/scheduler.rb:353:in `run_once'
        from /home/olle/projects/slask/tmp/devbox/gems/gems/async-2.18.0/lib/async/scheduler.rb:408:in `block in run'
        from /home/olle/projects/slask/tmp/devbox/gems/gems/async-2.18.0/lib/async/scheduler.rb:386:in `block in run_loop'
        from /home/olle/projects/slask/tmp/devbox/gems/gems/async-2.18.0/lib/async/scheduler.rb:383:in `handle_interrupt'
        from /home/olle/projects/slask/tmp/devbox/gems/gems/async-2.18.0/lib/async/scheduler.rb:383:in `run_loop'
        from /home/olle/projects/slask/tmp/devbox/gems/gems/async-2.18.0/lib/async/scheduler.rb:407:in `run'
        from /home/olle/projects/slask/tmp/devbox/gems/gems/async-2.18.0/lib/kernel/async.rb:34:in `Async'
        from issue-351.rb:8:in `<main>'

See Bundler inline docs.

@ioquatix
Copy link
Member

ioquatix commented Nov 8, 2024

lol, I cannot reproduce, I tried IO_EVENT_SELECTOR=Select too.

Okay, can you please tell me:

  • uname -a
  • ruby -v
  • Your shell and version of shell.

@ioquatix
Copy link
Member

ioquatix commented Nov 8, 2024

For me it's:

samuel@aiko ~/D/tmp> uname -a
Linux aiko 6.11.6-arch1-1 #1 SMP PREEMPT_DYNAMIC Fri, 01 Nov 2024 03:30:41 +0000 x86_64 GNU/Linux
samuel@aiko ~/D/tmp> ruby -v
ruby 3.3.3 (2024-06-12 revision f1c7b6f435) [x86_64-linux]
samuel@aiko ~/D/tmp> $SHELL -v
fish, version 3.7.1

@ioquatix
Copy link
Member

ioquatix commented Nov 8, 2024

In Async v2.19, I've introduced more debug output: https://socketry.github.io/async/releases/index#async::scheduler-debugging

Can one of you run the hanging example with CONSOLE_LEVEL=debug environment variable set, and show me the full output?

If it's not too verbose, it might also be useful to see the output generated by RUBYOPT=-d ....

Thanks.

@ioquatix
Copy link
Member

This appears to be a problem in Ruby process.c which can fork with an active scheduler without cleaning up correctly. I think at best we can fix this in Ruby 3.4 and maybe backport the fix. I'll need to investigate further.

@ioquatix
Copy link
Member

See ruby/ruby#12052 for a possible fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants