TensorFlow does not see all available GPUs in my system #252
Comments
Try computecpp 0.9.0 for starters? |
Sorry, I didn't understand the question... I was using ComputeCpp-v0.6.0-4212-gb29ac8a, but ComputeCpp-v0.6.0-4212-gb29ac8a itself is working fine; it looks as if TF is buggy... |
@lu4 as @mirh suggested, compiling with our latest ComputeCpp version will let you use a more recent version of TF. Could you try and download ComputeCpp CE 0.9.1? To compile you will need to use the latest commit of the eigen_sycl branch here: https://github.com/codeplaysoftware/tensorflow/tree/eigen_sycl |
Oh, I see, thanks, trying... |
vagrant@ubuntu-xenial:~/Project/tensorflow_eigen$ bazel build -c opt --config=sycl //tensorflow/tools/pip_package:build_pip_package
/home/vagrant/.cache/bazel/_bazel_vagrant/e647697a348b187726950a371af92dd1/external/jpeg/BUILD:126:12: Illegal ambiguous match on configurable attribute "deps" in @jpeg//:jpeg: |
It looks as if the build system is trying to build for the ARM architecture; I have no clue why... |
Ha, this is a known issue with TF 1.6 and recent versions of Bazel. You have to use Bazel 0.11.1 for our current version of TF. Make sure to manually remove the cache before compiling again. |
Thanks, trying... |
On a night in Europe, hardly, I think. Anyway, for the life of me, your dev environment seems really weird. And you are trying to build this, right? https://github.com/lukeiwanski/tensorflow/archive/dev/amd_gpu.zip |
Can you post the output of the "computecpp_info" tool located in the "bin" folder of the ComputeCpp release you are using? |
Hi, here is the output:
|
@mirh I was able to compile TF using the provided archive, but it still shows just one GPU in TF. |
Guys, I was wondering if you provide paid support. I need to get TF working with all the devices in my machine. The issue is highly critical for me and I'm willing to pay a couple of hundred bucks to get the ball rolling. Is that possible somehow? |
@lu4 thanks for the report. That is an interesting rig you have there. So far our focus has been on supporting systems with only one device of each kind: one GPU, or combinations such as a CPU with one GPU and one other accelerator. Adding support for multiple GPUs is quite complex; nevertheless, I believe we should do it. This task will most likely take some time. Have you tried HIP? As for the paid support, can you email me directly regarding that? |
@lu4 I have absolutely no idea if this will work, but when you create a TensorFlow session, try setting the SYCL device count in the session config options:
    import tensorflow as tf
    with tf.Session(config=tf.ConfigProto(device_count={'SYCL': 8})) as sess:
        print(sess.list_devices())
Even if this does allow TF to see all your devices, I don't know if it will automatically schedule compute across all of them. It would be very interesting to hear the results of this. |
@jwlawson your trick worked: I was able to access all the GPUs in my system. Not everything works smoothly, though. For example, eager execution cannot take advantage of all the cards (this may also be due to misconfiguration); for some reason it just binds to gpu:0 and does not use anything else. I'm continuing to investigate and will report back if I find anything useful. |
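For anyone wanting to check whether work can be spread across the cards in graph mode, a minimal sketch of explicit per-device placement is given here. It assumes a TensorFlow 1.x build with SYCL enabled, that the `device_count` setting from the earlier comment exposes the devices as "/device:SYCL:0" through "/device:SYCL:7", and that the matmul kernel is available on each of them. The helper names `sycl_device_names` and `run_on_all_devices` are hypothetical, and nothing here has been verified on the rig described in this issue.

```python
# Sketch only: assumes TensorFlow 1.x built with SYCL support.
NUM_SYCL_DEVICES = 8  # assumption: matches the device_count used in the thread


def sycl_device_names(count):
    """Return the TF device strings for `count` SYCL devices."""
    return ["/device:SYCL:{}".format(i) for i in range(count)]


def run_on_all_devices(count=NUM_SYCL_DEVICES):
    """Pin one small matmul to each SYCL device and run them in one session."""
    # Imported lazily so the helper above can be used without TensorFlow.
    import tensorflow as tf

    results = []
    for name in sycl_device_names(count):
        with tf.device(name):
            a = tf.random_normal([1024, 1024])
            results.append(tf.reduce_sum(tf.matmul(a, a)))

    config = tf.ConfigProto(
        device_count={"SYCL": count},
        log_device_placement=True,   # log which device each op lands on
        allow_soft_placement=True,   # fall back if an op has no SYCL kernel
    )
    with tf.Session(config=config) as sess:
        return sess.run(results)
```

Calling `run_on_all_devices()` and reading the placement log should show whether ops actually land on devices other than SYCL:0. With `allow_soft_placement=True` an op may silently fall back to another device, so the log is the thing to check, not the absence of an error.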
@lukeiwanski I've sent an email to you (used your github email [email protected]), JFYI |
@lu4 yes the email is correct.. however, I cannot find any email from you :( |
System information
Here's info from environment capture script:
Describe the problem
TensorFlow built on top of SYCL does not list or use all the available GPUs in the system. I'm using the following commands to get the list of devices:
(Please note that TensorFlow's inline log shows 8 devices, but the resulting variable contains just two CPU devices and one GPU, available under the name "/device:SYCL:0".)
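The commands used in the report were not preserved on this page. As an illustration only, one common way to list the devices TensorFlow 1.x can see is `device_lib.list_local_devices()`; the sketch below groups the result by device type. The helper names `group_device_names` and `list_local_sycl_devices` are hypothetical, and the assumption that SYCL devices report the device type "SYCL" is inferred from the "/device:SYCL:0" name above.

```python
from collections import defaultdict


def group_device_names(devices):
    """Group device names by device type.

    `devices` is any iterable of objects with `name` and `device_type`
    attributes, such as the entries returned by
    tensorflow.python.client.device_lib.list_local_devices().
    """
    grouped = defaultdict(list)
    for dev in devices:
        grouped[dev.device_type].append(dev.name)
    return dict(grouped)


def list_local_sycl_devices():
    """Return the names of the SYCL devices TensorFlow reports locally."""
    # Lazy import: needs a TensorFlow 1.x build with SYCL enabled.
    from tensorflow.python.client import device_lib

    grouped = group_device_names(device_lib.list_local_devices())
    # Assumption: SYCL devices use the device type string "SYCL".
    return grouped.get("SYCL", [])
```

Comparing the length of `list_local_sycl_devices()` with the number of devices shown by clinfo gives a quick check of how many GPUs TensorFlow is actually exposing.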
I confirm that all devices are functional and available to OpenCL (visible to clinfo) and are operable through another third-party package (ArrayFire). I also confirm that SYCL itself sees all available devices. To test this, I updated SYCL's 'custom-device-selector' example to the following code: