cublas computation failed, CUDA_ERROR_ILLEGAL_ADDRESS #2

Open
maunzzz opened this issue Oct 22, 2015 · 6 comments

maunzzz commented Oct 22, 2015

Hi!
I'm using your code for 3D segmentation tasks at the moment and I would like to start off by thanking you for making it available.

I also have a small issue: when a MATLAB script has crashed (or been stopped) and I try to start it again, I get the following error message when calling mex_conv3d:

"Error using mex_conv3d
cublas computation failed."

Before that, I get several of these warnings:

"Warning: An unexpected error occurred during CUDA execution. The CUDA error was:
CUDA_ERROR_ILLEGAL_ADDRESS "

Do you have any idea on how to fix this?
Thank you!
Regards,
Måns

pengsun (Owner) commented Oct 23, 2015

Hi Måns, thanks for your report! Can you pass the unit tests on your machine? Also, can you post or send me the minimal code that reproduces this error? Otherwise I cannot figure out what's wrong with the code just from these messages...

omair-kg commented

Hi @pengsun, I have the same problem. The code runs fine the first time, but the next time I try to run it, it gives the error:

"Error using mex_conv3d
cublas computation failed."

This also happens if I terminate the script midway through execution.

maunzzz (Author) commented Jul 22, 2016

Sorry for the extremely long delay; I've been busy with other things.
The unit tests run fine for me, and I've managed to reproduce the error (or something very similar) with this code:

```matlab
run('setup_path.m');

A = rand([10 10 10 10],'single');
w1 = rand([3 3 3 10 10],'single');
w2 = rand([1 10],'single');

gpu = gpuDevice();

A_gpu = gpuArray(A);
w1_gpu = gpuArray(w1);
w2_gpu = gpuArray(w2);
res1 = mex_conv3d(A_gpu, w1_gpu, w2_gpu,'pad', 0, 'stride', 1);

fprintf('First one works \n')

% Reset the GPU; existing gpuArray data is invalidated
reset(gpu);

% Upload the data again and repeat the same call
A_gpu = gpuArray(A);
w1_gpu = gpuArray(w1);
w2_gpu = gpuArray(w2);

% This second call fails with "cublas computation failed."
res2 = mex_conv3d(A_gpu, w1_gpu, w2_gpu,'pad', 0, 'stride', 1);
```

I get the following output:

```
First one works
Error using mex_conv3d
cublas computation failed.

Error in RecreateMatconv3dErr (line 24)
res2 = mex_conv3d(A_gpu, w1_gpu, w2_gpu,'pad', 0, 'stride', 1);
```

It seems that resetting the GPU and then trying to run again causes the crash.
There also seem to be some memory problems when I run the code: after performing a forward or backward pass with mex_conv3d, the MATLAB instance takes up far more memory than it should (clearing the related variables doesn't help). Maybe these two problems are related.

\Måns

abursuc commented Aug 14, 2016

Hi,

I have run into the same problem: running some code, stopping it, resetting the GPU, running the code again -> "Error using mex_conv3d. cublas computation failed."
The workaround is to close MATLAB and open it again. The code works well after that.
Unfortunately, I could not find the source of this issue.

pengsun (Owner) commented Sep 5, 2016

Hi All,

So sorry for the late reply... I've found the cause, but I am not sure whether I should "fix" this "bug".

The quick fix is to always call `clear mex_conv3d` before resetting the GPU. For example, in the code provided by @maunzzz, we could do this:

```matlab
run('setup_path.m');

A = rand([10 10 10 10],'single');
w1 = rand([3 3 3 10 10],'single');
w2 = rand([1 10],'single');

gpu = gpuDevice();

A_gpu = gpuArray(A);
w1_gpu = gpuArray(w1);
w2_gpu = gpuArray(w2);
res1 = mex_conv3d(A_gpu, w1_gpu, w2_gpu,'pad', 0, 'stride', 1);

fprintf('First one works \n')

clear mex_conv3d; % <-- add this line BEFORE resetting GPU
reset(gpu);

A_gpu = gpuArray(A);
w1_gpu = gpuArray(w1);
w2_gpu = gpuArray(w2);

res2 = mex_conv3d(A_gpu, w1_gpu, w2_gpu,'pad', 0, 'stride', 1);
```

And everything works fine.

Here is the explanation. mex_conv3d internally holds a handle to the cuBLAS library. That handle becomes invalid when the GPU state is cleared "outside" the MEX file (e.g., when the GPU is reset by calling reset() in MATLAB), and mex_conv3d is unaware of this change. On the other hand, cuBLAS lacks a cheap function for testing whether a handle is still valid, and it would be too expensive to fall back to lazy re-initialization by checking the status returned from every cuBLAS call such as cublasSgemm. So I guess remembering to call `clear mex_conv3d` is the best way (?).
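The following minimal C++ sketch only simulates the failure pattern described above. It does not call the real cuBLAS or MEX APIs: `FakeDevice`, `FakeHandle`, `fake_sgemm`, `conv3d_call`, `clear_mex` and `run_scenario` are all hypothetical stand-ins. A real MEX file would typically create the handle once with `cublasCreate`, keep it in a static variable, and release it with `cublasDestroy` from a function registered through `mexAtExit`, which MATLAB calls when the MEX file is cleared. Whether mex_conv3d is structured exactly this way is an assumption.

```cpp
// Hypothetical stand-in for the GPU context: reset(gpu) in MATLAB tears the
// context down, modelled here as bumping a generation counter.
struct FakeDevice {
    int generation = 0;
    void reset() { ++generation; }
};

// Hypothetical stand-in for a cublasHandle_t: it only remembers the context
// generation it was created in.
struct FakeHandle {
    int created_in = -1;
};

static FakeDevice g_device;

// Mimics a cuBLAS routine such as cublasSgemm: it fails when the handle
// belongs to a context that no longer exists.
static bool fake_sgemm(const FakeHandle& h) {
    return h.created_in == g_device.generation;
}

// Static state that survives between calls for as long as the MEX file
// stays loaded in MATLAB.
static bool g_initialized = false;
static FakeHandle g_handle;

// Conceptually what `clear mex_conv3d` triggers (e.g. via an exit hook):
// the cached handle is dropped, so the next call creates a fresh one.
static void clear_mex() { g_initialized = false; }

// MEX entry point: create the handle lazily once, then keep reusing it
// without re-checking that the context is still alive.
static bool conv3d_call() {
    if (!g_initialized) {
        g_handle.created_in = g_device.generation;
        g_initialized = true;
    }
    return fake_sgemm(g_handle);
}

struct Outcome {
    bool first;
    bool after_reset;
    bool after_clear_and_reset;
};

// Replays the sequence from this thread.
static Outcome run_scenario() {
    Outcome o{};
    o.first = conv3d_call();                  // first call works
    g_device.reset();                         // reset(gpu) while the MEX stays loaded
    o.after_reset = conv3d_call();            // stale handle: "cublas computation failed"
    clear_mex();                              // clear mex_conv3d ...
    g_device.reset();                         // ... BEFORE reset(gpu)
    o.after_clear_and_reset = conv3d_call();  // fresh handle: works again
    return o;
}
```

Calling `run_scenario()` gives `first == true`, `after_reset == false` and `after_clear_and_reset == true`. The cached handle goes stale only when the context is reset while the MEX file stays loaded, which is why clearing the MEX file first (or restarting MATLAB) avoids the error.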

Let me know if it solves the problem!

ityer82 commented Aug 6, 2018

Hi pengsun, I have the same problem. I tried your solution, but as soon as I run `clear mex_conv3d`, MATLAB crashes. Any suggestions on how to solve this?
