TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first. #50
Comments
Had more time to test. The README is wrong to reference the Medium article that changes the head. I used the package and it worked fine. One minor change: PyTorch 1.9.1 throws "CUDA error: no kernel image is available for execution on the device". I switched to 1.8.2 and it worked fine. GPU usage is really low though, around 7% during training. Is that normal? |
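For anyone hitting the same "no kernel image" error: it usually means the installed PyTorch binary has no compiled kernels for the GPU's architecture (an RTX 3090 has compute capability 8.6). Below is a rough sketch of how one might check this. The helper names build_supports_device and check_local_gpu are made up for illustration, and the parsing assumes architecture entries of the form sm_XY or compute_XY. The compatibility rule is the general CUDA one, not something confirmed for this particular install.

```python
import re


def build_supports_device(capability, arch_list):
    """Return True if a build listing `arch_list` should have usable code for the GPU.

    capability: (major, minor) tuple, e.g. (8, 6) for an RTX 3090.
    arch_list:  strings such as "sm_80" or "compute_70".
    """
    major, minor = capability
    for entry in arch_list:
        match = re.fullmatch(r"(sm|compute)_(\d+)(\d)", entry)
        if match is None:
            continue  # skip entries this sketch does not know how to parse
        kind = match.group(1)
        b_major, b_minor = int(match.group(2)), int(match.group(3))
        # Compiled (sm_XY) code runs on GPUs with the same major version
        # and an equal or higher minor version.
        if kind == "sm" and b_major == major and b_minor <= minor:
            return True
        # PTX (compute_XY) can be JIT-compiled for an equal or newer GPU.
        if kind == "compute" and (b_major, b_minor) <= (major, minor):
            return True
    return False


def check_local_gpu():
    """Query the installed PyTorch build. Needs PyTorch and a CUDA GPU."""
    import torch  # imported lazily so the helper above works without it

    capability = torch.cuda.get_device_capability(0)
    return build_supports_device(capability, torch.cuda.get_arch_list())
```

If check_local_gpu() returns False, the installed wheel was probably built for a different CUDA architecture set, and installing a build that targets CUDA 11.1 or higher may be the simpler fix than changing PyTorch versions.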
I spoke too soon. It was training fine for hours then:
|
I fixed this, and I'm running a very long test to see if there are any other issues. My fix was to go into S_symmetry.py and S_separability.py and before the lines with if is_cuda and isinstance(min_error, torch.Tensor):
min_error = min_error.cpu() When I'm back in Windows I'll open a pull request with the changes. I should say that I don't program in Python or PyTorch, so does this fix look right? From what I can tell, min_error lives on the GPU when CUDA is used and has to be moved back to host memory before NumPy can work with it. @SJ001 do you not get this issue when using CUDA? It seems strange that I'm the only one who sees this bug, unless everyone else is running on the CPU? |
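To illustrate the idea behind the fix, here is a minimal sketch that is not taken from the package: it copies each error value to host memory before NumPy sees it. The names to_host_float and argmin_on_host are hypothetical, and the sketch assumes each error is a scalar (a float, a NumPy scalar, or a one-element tensor).

```python
import numpy as np


def to_host_float(value):
    """Return `value` as a plain Python float, copying it off the GPU if needed."""
    # Duck-typed check: torch.Tensor provides detach() and cpu();
    # plain floats and NumPy scalars do not, so they pass through untouched.
    if hasattr(value, "detach") and hasattr(value, "cpu"):
        value = value.detach().cpu()
    return float(value)


def argmin_on_host(errors):
    """Index of the smallest error, computed by NumPy on host memory."""
    host_errors = np.array([to_host_float(e) for e in errors])
    return int(np.argmin(host_errors))
```

With this, a mixed list such as [torch.tensor(0.5, device="cuda"), 0.1, 2.0] should be handled without the TypeError, since every entry is a Python float by the time np.array is called. Converting at the point of use like this avoids having to track which of the earlier steps returned a CUDA tensor.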
I let it run with my dataset and after hours it hangs forever at this point:
....
If you need more data just ask. |
Thanks for documenting this. The package could be useful if we can work through these issues. |
I just created a fresh Ubuntu 20.04.3 LTS install, installed the drivers, and checked that PyTorch was using CUDA. Everything seemed fine.
The error is raised in S_run_aifenman.py, line 85:
idx_min = np.argmin(np.array([symmetry_plus_result.......
This error occurs after all the brute-force lines. I'm not familiar with NumPy or PyTorch, so hopefully this is an obvious error on my part? This is the command I used to install PyTorch:
Then I followed the notebook on the site linked from the README. I'm using my own data, which is similar to their example. Did something change recently that would cause this error with the code? Do I need to use an older PyTorch? (For reference, I'm using a 3090, so I believe I need CUDA 11.1 or higher.)