EPoll implementation fails on TruffleRuby #2

Closed
ioquatix opened this issue Apr 28, 2021 · 7 comments · Fixed by #5

Comments

@ioquatix
Member

@eregon do you have any time to investigate this failure?

https://github.com/socketry/event/runs/2454091746

@eregon
Contributor

eregon commented Apr 28, 2021

I'm taking a look. It sounds like the main Fiber of the main Thread is trying to acquire the C extension lock while another Fiber is paused but still holds that lock.

@eregon
Contributor

eregon commented Apr 28, 2021

After some work to print all Fiber backtraces, here is what I see:

All Fiber backtraces:
#<Fiber:0x9678 root (resumed)> of #<Thread:0x9688 Ruby-reference-processor sleep>

#<Fiber:0x96a8 root (resumed)> of #<Thread:0x96b8 main run>
<internal:core> core/main.rb:64:in `block in <top (required)>'
<internal:core> core/main.rb:85:in `block (2 levels) in <top (required)>'
/home/eregon/code/event/spec/event/selector_examples.rb:60:in `select'
/home/eregon/code/event/spec/event/selector_examples.rb:60:in `block (3 levels) in <top (required)>'
gems/rspec-core-3.10.1/lib/rspec/core/example.rb:262:in `instance_exec'
gems/rspec-core-3.10.1/lib/rspec/core/example.rb:262:in `block in run'
gems/rspec-core-3.10.1/lib/rspec/core/example.rb:508:in `block in with_around_and_singleton_context_hooks'
gems/rspec-core-3.10.1/lib/rspec/core/example.rb:465:in `block in with_around_example_hooks'
gems/rspec-core-3.10.1/lib/rspec/core/hooks.rb:486:in `block in run'
gems/rspec-core-3.10.1/lib/rspec/core/hooks.rb:624:in `run_around_example_hooks_for'
gems/rspec-core-3.10.1/lib/rspec/core/hooks.rb:486:in `run'
gems/rspec-core-3.10.1/lib/rspec/core/example.rb:465:in `with_around_example_hooks'
gems/rspec-core-3.10.1/lib/rspec/core/example.rb:508:in `with_around_and_singleton_context_hooks'
gems/rspec-core-3.10.1/lib/rspec/core/example.rb:259:in `run'
gems/rspec-core-3.10.1/lib/rspec/core/example_group.rb:644:in `block in run_examples'
gems/rspec-core-3.10.1/lib/rspec/core/example_group.rb:640:in `map'
gems/rspec-core-3.10.1/lib/rspec/core/example_group.rb:640:in `run_examples'
gems/rspec-core-3.10.1/lib/rspec/core/example_group.rb:606:in `run'
gems/rspec-core-3.10.1/lib/rspec/core/example_group.rb:607:in `block in run'
gems/rspec-core-3.10.1/lib/rspec/core/example_group.rb:607:in `map'
gems/rspec-core-3.10.1/lib/rspec/core/example_group.rb:607:in `run'
gems/rspec-core-3.10.1/lib/rspec/core/example_group.rb:607:in `block in run'
gems/rspec-core-3.10.1/lib/rspec/core/example_group.rb:607:in `map'
gems/rspec-core-3.10.1/lib/rspec/core/example_group.rb:607:in `run'
gems/rspec-core-3.10.1/lib/rspec/core/runner.rb:121:in `block (3 levels) in run_specs'
gems/rspec-core-3.10.1/lib/rspec/core/runner.rb:121:in `map'
gems/rspec-core-3.10.1/lib/rspec/core/runner.rb:121:in `block (2 levels) in run_specs'
gems/rspec-core-3.10.1/lib/rspec/core/configuration.rb:2067:in `with_suite_hooks'
gems/rspec-core-3.10.1/lib/rspec/core/runner.rb:116:in `block in run_specs'
gems/rspec-core-3.10.1/lib/rspec/core/reporter.rb:74:in `report'
gems/rspec-core-3.10.1/lib/rspec/core/runner.rb:115:in `run_specs'
gems/rspec-core-3.10.1/lib/rspec/core/runner.rb:89:in `run'
gems/rspec-core-3.10.1/lib/rspec/core/runner.rb:71:in `run'
gems/rspec-core-3.10.1/lib/rspec/core/runner.rb:45:in `invoke'
gems/rspec-core-3.10.1/exe/rspec:4:in `<top (required)>'
<internal:core> core/kernel.rb:400:in `load'
<internal:core> core/kernel.rb:400:in `load'
bin/rspec:23:in `<top (required)>'
<internal:core> core/kernel.rb:400:in `load'
<internal:core> core/kernel.rb:400:in `load'
truffleruby/lib/mri/bundler/cli/exec.rb:63:in `kernel_load'
truffleruby/lib/mri/bundler/cli/exec.rb:28:in `run'
truffleruby/lib/mri/bundler/cli.rb:476:in `exec'
truffleruby/lib/mri/bundler/vendor/thor/lib/thor/command.rb:27:in `run'
truffleruby/lib/mri/bundler/vendor/thor/lib/thor/invocation.rb:127:in `invoke_command'
truffleruby/lib/mri/bundler/vendor/thor/lib/thor.rb:399:in `dispatch'
truffleruby/lib/mri/bundler/cli.rb:30:in `dispatch'
truffleruby/lib/mri/bundler/vendor/thor/lib/thor/base.rb:476:in `start'
truffleruby/lib/mri/bundler/cli.rb:24:in `start'
truffleruby/lib/gems/gems/bundler-2.1.4/libexec/bundle:46:in `block in <top (required)>'
truffleruby/lib/mri/bundler/friendly_errors.rb:123:in `with_friendly_errors'
truffleruby/lib/gems/gems/bundler-2.1.4/libexec/bundle:34:in `<top (required)>'
<internal:core> core/kernel.rb:400:in `load'
<internal:core> core/kernel.rb:400:in `load'
truffleruby/bin/bundle:42:in `<main>'

#<Fiber:0x96d8 /home/eregon/code/event/spec/event/selector_examples.rb:43 (suspended)> of #<Thread:0x96b8 main run>
truffleruby/lib/truffle/truffle/cext.rb:884:in `transfer'
truffleruby/lib/truffle/truffle/cext.rb:884:in `rb_funcall'
/home/eregon/code/event/ext/event/backend/epoll.c:149:in `io_wait_transfer'
truffleruby/lib/truffle/truffle/cext.rb:1521:in `rb_ensure'
exception.c:97:in `rb_ensure'
/home/eregon/code/event/ext/event/backend/epoll.c:193:in `Event_Backend_EPoll_io_wait'
truffleruby/lib/truffle/truffle/cext_ruby.rb:41:in `io_wait'
/home/eregon/code/event/spec/event/selector_examples.rb:47:in `block (4 levels) in <top (required)>'

So the last Fiber (suspended inside Fiber#transfer) still holds the C extension lock, and the root Fiber of the main Thread is trying to acquire that lock but can't.
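The situation can be modeled roughly in plain Ruby. This is only a sketch: the `Mutex` named `cext_lock` is a hypothetical stand-in for TruffleRuby's internal C extension lock, and `try_lock` is used so the example reports the contention instead of hanging.

```ruby
require "fiber"

# Hypothetical stand-in for the C extension lock (the real one is internal to TruffleRuby).
cext_lock = Mutex.new
main = Fiber.current
observed = []

worker = Fiber.new do
  cext_lock.lock      # enter the "C extension"
  main.transfer       # switch away while still holding the lock
  cext_lock.unlock    # leave the "C extension"
  main.transfer
end

worker.transfer
# The worker is suspended but still owns the lock, so this attempt fails.
observed << cext_lock.try_lock

worker.transfer
# The worker has released the lock, so this attempt succeeds.
observed << cext_lock.try_lock
cext_lock.unlock
```

With a blocking `lock` instead of `try_lock`, the first attempt from the main Fiber could never succeed, which matches the hang seen in CI.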

@eregon
Contributor

eregon commented Apr 28, 2021

What we need here is for Fiber#transfer to be called without holding the C extension lock, i.e., under rb_thread_call_without_gvl().
On CRuby this is probably not needed, because Fibers are coroutines that run on the same native thread from the point of view of the GIL. It probably shouldn't be used there either, since rb_*() functions must not be called inside rb_thread_call_without_gvl(), i.e., without the GIL.

@ioquatix It's unfortunate that there is no rb_fiber_transfer() in the C API yet (there are rb_fiber_resume and rb_fiber_yield, though), but we could define it in TruffleRuby and you could have a default implementation of it in the gem.
I assume the use of transfer instead of resume/yield is intentional?
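For readers less familiar with the distinction, here is a minimal Ruby sketch of the two styles. `resume`/`yield` is asymmetric (the fiber always returns to whoever resumed it), while `transfer` is symmetric (each side names the fiber it hands control to). The variable names are illustrative only and are not taken from the gem.

```ruby
require "fiber"

log = []

# Asymmetric: Fiber.yield always returns control to the caller of #resume.
generator = Fiber.new do
  log << :generator_start
  Fiber.yield
  log << :generator_end
end
generator.resume
log << :main_after_yield
generator.resume

# Symmetric: each side explicitly names the fiber it transfers to.
main = Fiber.current
peer = Fiber.new do
  log << :peer_start
  main.transfer
  log << :peer_end
  main.transfer
end
peer.transfer
log << :main_after_transfer
peer.transfer
```

An event loop typically wants the symmetric form, because the selector and the waiting fibers hand control to each other directly rather than unwinding through a resume chain.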

@eregon
Contributor

eregon commented Apr 28, 2021

It's also possible to disable the C extension lock in TruffleRuby with TRUFFLERUBYOPT="--experimental-options --cexts-lock=false", and we have plans to expose that properly in the C API so it can be done per extension.
That requires the C code to be safe to call from multiple Ruby Threads in parallel, though.

With that we get:

TRUFFLERUBYOPT="--experimental-options --cexts-lock=false" bundle exec rspec                                                     
warning: parser/current is loading parser/ruby27, which recognizes
warning: 2.7.3-compliant syntax, but you are running 2.7.2.
warning: please see https://github.com/whitequark/parser#compatibility-with-ruby-mri.
Script coverage disabled: unknown event: script_compiled
Line coverage disabled: unknown event: call

Event::Debug::Selector
  #io_wait
    cannot have two fibers reading from the same io

Event::Backend::EPoll
  behaves like Event::Selector
    .new
      can create multiple selectors
    #io_wait
      can wait for an io to become readable
      can wait for an io to become writable
      can read and write from two different fibers
      can handle exception during wait (FAILED - 1)

Event::Backend::Select
  behaves like Event::Selector
    .new
      can create multiple selectors
    #io_wait
      can wait for an io to become readable
      can wait for an io to become writable
      can read and write from two different fibers
      can handle exception during wait (FAILED - 2)

Event
  has a version number

Failures:

  1) Event::Backend::EPoll behaves like Event::Selector #io_wait can handle exception during wait
     Failure/Error: fiber.raise(RuntimeError.new("Boom"))
     
     NoMethodError:
       private method `raise' called for #<Fiber:0x11888>
     Shared Example Group: Event::Selector called from ./spec/event/selector_spec.rb:28
     # ./spec/event/selector_examples.rb:150:in `block (3 levels) in <top (required)>'
     # <internal:core> core/kernel.rb:400:in `load'
     # <internal:core> core/kernel.rb:400:in `load'
     # <internal:core> core/kernel.rb:400:in `load'
     # <internal:core> core/kernel.rb:400:in `load'
     # /home/eregon/code/truffleruby-ws/graal/sdk/mxbuild/linux-amd64/GRAALVM_077B8863DC_JAVA11/graalvm-077b8863dc-java11-21.2.0-dev/languages/ruby/lib/mri/bundler/cli/exec.rb:63:in `kernel_load'
     # /home/eregon/code/truffleruby-ws/graal/sdk/mxbuild/linux-amd64/GRAALVM_077B8863DC_JAVA11/graalvm-077b8863dc-java11-21.2.0-dev/languages/ruby/lib/mri/bundler/cli/exec.rb:28:in `run'
     # /home/eregon/code/truffleruby-ws/graal/sdk/mxbuild/linux-amd64/GRAALVM_077B8863DC_JAVA11/graalvm-077b8863dc-java11-21.2.0-dev/languages/ruby/lib/mri/bundler/cli.rb:476:in `exec'
     # /home/eregon/code/truffleruby-ws/graal/sdk/mxbuild/linux-amd64/GRAALVM_077B8863DC_JAVA11/graalvm-077b8863dc-java11-21.2.0-dev/languages/ruby/lib/mri/bundler/vendor/thor/lib/thor/command.rb:27:in `run'
     # /home/eregon/code/truffleruby-ws/graal/sdk/mxbuild/linux-amd64/GRAALVM_077B8863DC_JAVA11/graalvm-077b8863dc-java11-21.2.0-dev/languages/ruby/lib/mri/bundler/vendor/thor/lib/thor/invocation.rb:127:in `invoke_command'
     # /home/eregon/code/truffleruby-ws/graal/sdk/mxbuild/linux-amd64/GRAALVM_077B8863DC_JAVA11/graalvm-077b8863dc-java11-21.2.0-dev/languages/ruby/lib/mri/bundler/vendor/thor/lib/thor.rb:399:in `dispatch'
     # /home/eregon/code/truffleruby-ws/graal/sdk/mxbuild/linux-amd64/GRAALVM_077B8863DC_JAVA11/graalvm-077b8863dc-java11-21.2.0-dev/languages/ruby/lib/mri/bundler/cli.rb:30:in `dispatch'
     # /home/eregon/code/truffleruby-ws/graal/sdk/mxbuild/linux-amd64/GRAALVM_077B8863DC_JAVA11/graalvm-077b8863dc-java11-21.2.0-dev/languages/ruby/lib/mri/bundler/vendor/thor/lib/thor/base.rb:476:in `start'
     # /home/eregon/code/truffleruby-ws/graal/sdk/mxbuild/linux-amd64/GRAALVM_077B8863DC_JAVA11/graalvm-077b8863dc-java11-21.2.0-dev/languages/ruby/lib/mri/bundler/cli.rb:24:in `start'
     # /home/eregon/code/truffleruby-ws/graal/sdk/mxbuild/linux-amd64/GRAALVM_077B8863DC_JAVA11/graalvm-077b8863dc-java11-21.2.0-dev/languages/ruby/lib/gems/gems/bundler-2.1.4/libexec/bundle:46:in `block in <top (required)>'
     # /home/eregon/code/truffleruby-ws/graal/sdk/mxbuild/linux-amd64/GRAALVM_077B8863DC_JAVA11/graalvm-077b8863dc-java11-21.2.0-dev/languages/ruby/lib/mri/bundler/friendly_errors.rb:123:in `with_friendly_errors'
     # /home/eregon/code/truffleruby-ws/graal/sdk/mxbuild/linux-amd64/GRAALVM_077B8863DC_JAVA11/graalvm-077b8863dc-java11-21.2.0-dev/languages/ruby/lib/gems/gems/bundler-2.1.4/libexec/bundle:34:in `<top (required)>'
     # <internal:core> core/kernel.rb:400:in `load'
     # <internal:core> core/kernel.rb:400:in `load'

  2) Event::Backend::Select behaves like Event::Selector #io_wait can handle exception during wait
     Failure/Error: fiber.raise(RuntimeError.new("Boom"))
     
     NoMethodError:
       private method `raise' called for #<Fiber:0x118a8>
     Shared Example Group: Event::Selector called from ./spec/event/selector_spec.rb:28
     # ./spec/event/selector_examples.rb:150:in `block (3 levels) in <top (required)>'
     # <internal:core> core/kernel.rb:400:in `load'
     # <internal:core> core/kernel.rb:400:in `load'
     # <internal:core> core/kernel.rb:400:in `load'
     # <internal:core> core/kernel.rb:400:in `load'
     # /home/eregon/code/truffleruby-ws/graal/sdk/mxbuild/linux-amd64/GRAALVM_077B8863DC_JAVA11/graalvm-077b8863dc-java11-21.2.0-dev/languages/ruby/lib/mri/bundler/cli/exec.rb:63:in `kernel_load'
     # /home/eregon/code/truffleruby-ws/graal/sdk/mxbuild/linux-amd64/GRAALVM_077B8863DC_JAVA11/graalvm-077b8863dc-java11-21.2.0-dev/languages/ruby/lib/mri/bundler/cli/exec.rb:28:in `run'
     # /home/eregon/code/truffleruby-ws/graal/sdk/mxbuild/linux-amd64/GRAALVM_077B8863DC_JAVA11/graalvm-077b8863dc-java11-21.2.0-dev/languages/ruby/lib/mri/bundler/cli.rb:476:in `exec'
     # /home/eregon/code/truffleruby-ws/graal/sdk/mxbuild/linux-amd64/GRAALVM_077B8863DC_JAVA11/graalvm-077b8863dc-java11-21.2.0-dev/languages/ruby/lib/mri/bundler/vendor/thor/lib/thor/command.rb:27:in `run'
     # /home/eregon/code/truffleruby-ws/graal/sdk/mxbuild/linux-amd64/GRAALVM_077B8863DC_JAVA11/graalvm-077b8863dc-java11-21.2.0-dev/languages/ruby/lib/mri/bundler/vendor/thor/lib/thor/invocation.rb:127:in `invoke_command'
     # /home/eregon/code/truffleruby-ws/graal/sdk/mxbuild/linux-amd64/GRAALVM_077B8863DC_JAVA11/graalvm-077b8863dc-java11-21.2.0-dev/languages/ruby/lib/mri/bundler/vendor/thor/lib/thor.rb:399:in `dispatch'
     # /home/eregon/code/truffleruby-ws/graal/sdk/mxbuild/linux-amd64/GRAALVM_077B8863DC_JAVA11/graalvm-077b8863dc-java11-21.2.0-dev/languages/ruby/lib/mri/bundler/cli.rb:30:in `dispatch'
     # /home/eregon/code/truffleruby-ws/graal/sdk/mxbuild/linux-amd64/GRAALVM_077B8863DC_JAVA11/graalvm-077b8863dc-java11-21.2.0-dev/languages/ruby/lib/mri/bundler/vendor/thor/lib/thor/base.rb:476:in `start'
     # /home/eregon/code/truffleruby-ws/graal/sdk/mxbuild/linux-amd64/GRAALVM_077B8863DC_JAVA11/graalvm-077b8863dc-java11-21.2.0-dev/languages/ruby/lib/mri/bundler/cli.rb:24:in `start'
     # /home/eregon/code/truffleruby-ws/graal/sdk/mxbuild/linux-amd64/GRAALVM_077B8863DC_JAVA11/graalvm-077b8863dc-java11-21.2.0-dev/languages/ruby/lib/gems/gems/bundler-2.1.4/libexec/bundle:46:in `block in <top (required)>'
     # /home/eregon/code/truffleruby-ws/graal/sdk/mxbuild/linux-amd64/GRAALVM_077B8863DC_JAVA11/graalvm-077b8863dc-java11-21.2.0-dev/languages/ruby/lib/mri/bundler/friendly_errors.rb:123:in `with_friendly_errors'
     # /home/eregon/code/truffleruby-ws/graal/sdk/mxbuild/linux-amd64/GRAALVM_077B8863DC_JAVA11/graalvm-077b8863dc-java11-21.2.0-dev/languages/ruby/lib/gems/gems/bundler-2.1.4/libexec/bundle:34:in `<top (required)>'
     # <internal:core> core/kernel.rb:400:in `load'
     # <internal:core> core/kernel.rb:400:in `load'

Finished in 0.1942 seconds (files took 1.51 seconds to load)
12 examples, 2 failures

Failed examples:

rspec './spec/event/selector_spec.rb[1:1:2:4]' # Event::Backend::EPoll behaves like Event::Selector #io_wait can handle exception during wait
rspec './spec/event/selector_spec.rb[2:1:2:4]' # Event::Backend::Select behaves like Event::Selector #io_wait can handle exception during wait

Which means Fiber#raise is not yet implemented in TruffleRuby (we should add it: oracle/truffleruby#2338).
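The "private method `raise' called" wording appears because, without a public Fiber#raise, the call falls through to the private Kernel#raise. A minimal feature-detection sketch, assuming a spec or library wants to guard on this (the `:unsupported` fallback is purely illustrative, not something the gem does):

```ruby
fiber = Fiber.new do
  begin
    Fiber.yield
  rescue RuntimeError => error
    "rescued #{error.message}"
  end
end

# Run the fiber up to its Fiber.yield so there is a suspension point to raise into.
fiber.resume

result =
  if Fiber.public_method_defined?(:raise)
    # Fiber#raise resumes the fiber and raises the exception at the yield point.
    fiber.raise(RuntimeError.new("Boom"))
  else
    # Illustrative fallback for runtimes without a public Fiber#raise.
    :unsupported
  end
```

On a runtime with Fiber#raise, `result` is the value of the fiber's block after the rescue; elsewhere it is the placeholder symbol.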

@ioquatix
Member Author

Thanks for this detailed analysis. It's really excellent.

Well, I plan to improve the GVL handling of this code, i.e. release it.

I would say we should add rb_fiber_transfer. However, the object provided is not guaranteed to be a Fiber, although we would generally expect it to be.

eregon added a commit to eregon/io-event that referenced this issue May 3, 2021
eregon added a commit to eregon/io-event that referenced this issue May 3, 2021
eregon added a commit to eregon/io-event that referenced this issue May 10, 2021
@eregon
Contributor

eregon commented May 10, 2021

Now the CI passes on #5: https://github.com/eregon/event/runs/2548723638?check_suite_focus=true
I'll try to remove the need for TRUFFLERUBYOPT, though; TruffleRuby can probably release the C extension lock on any rb_funcall().
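The idea of releasing the lock around the call can be sketched in plain Ruby. This is a model under stated assumptions, not TruffleRuby's implementation: `cext_lock` is a hypothetical `Mutex` standing in for the C extension lock, and `call_without_lock` is a hypothetical helper playing the role of an rb_funcall() that drops the lock for the duration of the call.

```ruby
require "fiber"

# Hypothetical helper: release the lock around the block, then re-acquire it.
def call_without_lock(lock)
  lock.unlock
  yield
ensure
  lock.lock
end

# Hypothetical stand-in for the C extension lock.
cext_lock = Mutex.new
main = Fiber.current

worker = Fiber.new do
  cext_lock.lock                                   # enter the "C extension"
  call_without_lock(cext_lock) { main.transfer }   # transfer with the lock released
  cext_lock.unlock                                 # leave the "C extension"
  main.transfer
end

worker.transfer
# The worker is suspended inside the transfer, but the lock is free.
acquired = cext_lock.try_lock
cext_lock.unlock if acquired

# Switch back so the worker can re-acquire the lock and finish.
worker.transfer
```

Here the main Fiber can take the lock while the worker is suspended, which is the behavior the failing spec needs.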

@ioquatix
Member Author

The work you are doing here is amazing, thank you so much. I am looking forward to preparing for Ruby 3.1 when we can officially start rolling this out along with Async 2.x. We should aim to have this fully supported by TruffleRuby if possible.

graalvmbot pushed a commit to oracle/truffleruby that referenced this issue May 28, 2021