random CUFFT_INTERNAL_ERROR #41
Comments
I’ve never seen this before. Maybe try bumping your CUDA or PyTorch version?
Torch FFT occasionally has errors since it’s rarely used. If you can get a
minimal reproduction script working in your environment, I think the PyTorch
folks would appreciate a bug report. (I once found a bug in the PyTorch FFT
backward pass over a Christmas break.)
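For anyone who wants a starting point for such a reproduction script, here is a rough sketch rather than a confirmed repro. It only repeats the `torch.fft.rfft(..., n=fft_size)` call from the last frame of the traceback and counts how often it raises. The tensor shape, the `fft_size = 2 * seqlen` choice, the iteration count, and the helper names `stress` and `main` are all assumptions made for illustration; they are not taken from `hyena_utils.py`.

```python
def stress(fn, iterations):
    """Call fn() repeatedly; return (iteration, message) for each RuntimeError."""
    failures = []
    for i in range(iterations):
        try:
            fn()
        except RuntimeError as exc:
            failures.append((i, str(exc)))
    return failures


def main(iterations=200):
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed; nothing to reproduce.")
        return None
    if not torch.cuda.is_available():
        print("CUDA is not available; cuFFT is not exercised on CPU.")
        return None

    # Version details that a bug report would normally include.
    print("torch", torch.__version__, "| CUDA", torch.version.cuda,
          "|", torch.cuda.get_device_name(0))

    # Hypothetical shapes (assumption): one 32k-token sequence, 768 channels.
    batch, channels, seqlen = 1, 768, 32768
    fft_size = 2 * seqlen  # assumption: the real value is set inside fftconv_ref
    u = torch.randn(batch, channels, seqlen, device="cuda")

    def one_call():
        torch.fft.rfft(u, n=fft_size)
        torch.cuda.synchronize()  # surface asynchronous CUDA errors at this call

    failures = stress(one_call, iterations)
    print(f"{len(failures)} failure(s) in {iterations} iterations")
    for i, msg in failures[:5]:
        print(f"  iteration {i}: {msg}")
    return failures


if __name__ == "__main__":
    main()
```

If this loop never fails, the error may depend on the real input shapes or on other GPU work running at the same time, so capturing the shape and dtype of `u` at the failing call would be the next thing to try.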
On Tue, Dec 24, 2024 at 3:31 PM Eduard Zorita ***@***.***> wrote:
Hi, we're testing m2-bert-80M-32k-retrieval, and when running inference we
randomly get the following error:
outputs = self.model(**input_dict)
File "/home/ezorita/.cache/pypoetry/virtualenvs/ml-benchmarks-NBkZU-eG-py3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/ezorita/.cache/pypoetry/virtualenvs/ml-benchmarks-NBkZU-eG-py3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ezorita/.cache/huggingface/modules/transformers_modules/togethercomputer/m2-bert-80M-32k-retrieval/a2ccdc5b5661a282c77545e586a019f387ab7a48/bert_layers.py", line 956, in forward
outputs = self.bert(
File "/home/ezorita/.cache/pypoetry/virtualenvs/ml-benchmarks-NBkZU-eG-py3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/ezorita/.cache/pypoetry/virtualenvs/ml-benchmarks-NBkZU-eG-py3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ezorita/.cache/huggingface/modules/transformers_modules/togethercomputer/m2-bert-80M-32k-retrieval/a2ccdc5b5661a282c77545e586a019f387ab7a48/bert_layers.py", line 528, in forward
encoder_outputs = self.encoder(
File "/home/ezorita/.cache/pypoetry/virtualenvs/ml-benchmarks-NBkZU-eG-py3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/ezorita/.cache/pypoetry/virtualenvs/ml-benchmarks-NBkZU-eG-py3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ezorita/.cache/huggingface/modules/transformers_modules/togethercomputer/m2-bert-80M-32k-retrieval/a2ccdc5b5661a282c77545e586a019f387ab7a48/bert_layers.py", line 371, in forward
hidden_states = layer_module(hidden_states,
File "/home/ezorita/.cache/pypoetry/virtualenvs/ml-benchmarks-NBkZU-eG-py3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/ezorita/.cache/pypoetry/virtualenvs/ml-benchmarks-NBkZU-eG-py3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ezorita/.cache/huggingface/modules/transformers_modules/togethercomputer/m2-bert-80M-32k-retrieval/a2ccdc5b5661a282c77545e586a019f387ab7a48/bert_layers.py", line 280, in forward
attention_output = self.attention(hidden_states)
File "/home/ezorita/.cache/pypoetry/virtualenvs/ml-benchmarks-NBkZU-eG-py3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/ezorita/.cache/pypoetry/virtualenvs/ml-benchmarks-NBkZU-eG-py3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ezorita/.cache/huggingface/modules/transformers_modules/togethercomputer/m2-bert-80M-32k-retrieval/a2ccdc5b5661a282c77545e586a019f387ab7a48/monarch_mixer_sequence_mixer.py", line 129, in forward
y = self.filter_fn(v, L, k_fwd=k, k_rev=k_rev, bias= self.filter_fn.bias[None, :, None])
File "/home/ezorita/.cache/pypoetry/virtualenvs/ml-benchmarks-NBkZU-eG-py3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/ezorita/.cache/pypoetry/virtualenvs/ml-benchmarks-NBkZU-eG-py3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ezorita/.cache/huggingface/modules/transformers_modules/togethercomputer/m2-bert-80M-32k-retrieval/a2ccdc5b5661a282c77545e586a019f387ab7a48/hyena_utils.py", line 251, in forward
y = fftconv_ref(
File "/home/ezorita/.cache/huggingface/modules/transformers_modules/togethercomputer/m2-bert-80M-32k-retrieval/a2ccdc5b5661a282c77545e586a019f387ab7a48/hyena_utils.py", line 42, in fftconv_ref
u_f = torch.fft.rfft(u.to(dtype=k.dtype), n=fft_size)
RuntimeError: cuFFT error: CUFFT_INTERNAL_ERROR
Any ideas?