Driver Bug (Intel, NVIDIA): segfault when building too many programs at once #110

Open
rubdos opened this issue Jun 3, 2018 · 5 comments

@rubdos

rubdos commented Jun 3, 2018

I had a construction like

loop {
    ProQue::builder().build()
}

because I was benchmarking a simple prototype function. This crashed when looped too often.
When run outside valgrind, the program often worked, but within valgrind I get consistent crashes.

For future reference, you can find this "bad" code on https://gitlab.com/rubdos/multicore-ocl-project, branch segfault.

When digging deeper, some of the ocl unit tests also seem to be affected:

  • buffer_fill::fill
  • buffer_copy::buffer_copy_core
  • buffer_copy::buffer_copy_standard
  • clear_completed::clear_completed
  • image_ops::image_ops
  • buffer_ops_rect::buffer_ops_rect

These tests all crash when run within valgrind, ending in a segmentation fault.

@c0gent
Member

c0gent commented Jun 3, 2018

Our crack team of valgrinders is on the case...

@c0gent
Member

c0gent commented Jun 3, 2018

If you get a minute could you:

git clone https://github.com/c0gent/ocl-segfault.git 
cd ocl-segfault
cargo run --release

I'm unable to reproduce and I want to see if that segfaults on your system.

@c0gent
Member

c0gent commented Jun 5, 2018

Update: After finally getting the Rust-ocl version to segfault consistently on Intel drivers (but not AMD), I created an equivalent C++ version, opencl-debug-build-cpp, which also segfaults under the same conditions.

This is definitely an Intel driver issue (possibly NVIDIA too) and not specific to this library. I'll keep this issue open for now until I have time to file bug reports where appropriate.

Since this problem is restricted to unrealistic use cases such as valgrind and perhaps benchmarks, I don't consider it worth the expense of adding synchronization mechanisms as a workaround at present.

It's also unclear whether the size or complexity of the program source code, or whether the programs are identical, is a factor.
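
For anyone who hits this in a benchmark and wants a caller-side stopgap, the synchronization workaround mentioned above could look roughly like the sketch below. It is not part of ocl and has not been verified against the affected drivers. It assumes the crash is triggered by several builds being in flight at once, and `BUILD_LOCK` and `build_serialized` are hypothetical names introduced only for illustration.

```rust
use std::sync::Mutex;

// Hypothetical process-wide lock; this is NOT provided by the ocl crate.
static BUILD_LOCK: Mutex<()> = Mutex::new(());

/// Runs `build` while holding the global lock, so that at most one
/// build closure executes at any given time in this process.
fn build_serialized<T>(build: impl FnOnce() -> T) -> T {
    // If another thread panicked while holding the lock, keep going:
    // the guarded value is `()`, so there is no state to be corrupted.
    let _guard = BUILD_LOCK.lock().unwrap_or_else(|e| e.into_inner());
    build()
}

// Assumed usage (requires the ocl crate and a kernel source string `src`):
//
//     let pro_que = build_serialized(|| ProQue::builder().src(src).build());
```

This only serializes builds that go through the wrapper within a single process, so it would not help if the driver also misbehaves with sequential builds, which is consistent with the original single-threaded loop possibly being a different trigger.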

@c0gent c0gent changed the title SIGSEGV when looping construction of ProQue SIGSEGV when building too many of the same program on the same device at the same time Jun 5, 2018
@c0gent c0gent added the wontfix label Jun 5, 2018
@rubdos
Author

rubdos commented Jun 5, 2018

Thanks for the heads up, work, and keeping us posted. Appreciate it.

@c0gent c0gent changed the title SIGSEGV when building too many of the same program on the same device at the same time Driver Bug: SIGSEGV on Intel (and NVIDIA?) when building too many programs at once Jun 25, 2018
@c0gent c0gent changed the title Driver Bug: SIGSEGV on Intel (and NVIDIA?) when building too many programs at once Driver Bug (Intel, NVIDIA): SIGSEGV when building too many programs at once Jun 25, 2018
@c0gent c0gent changed the title Driver Bug (Intel, NVIDIA): SIGSEGV when building too many programs at once Driver Bug (Intel, NVIDIA): segfault when building too many programs at once Jun 26, 2018
@kpp
Contributor

kpp commented Jun 8, 2019

For Intel CPUs this bug was fixed in https://github.com/intel/compute-runtime (replacement for Beignet)
