LocalNormalizedCrossCorrelationLoss and TF32 numerical stability #6525
Comments
After some investigation, I believe this is probably not a numerical stability issue. See https://dev-discuss.pytorch.org/t/pytorch-and-tensorfloat32/504 for similar issues. The issue is that the NGC container enables TF32 by default. Maybe we should mention the tf32 issue to the users somewhere in the documentation?
Thanks. Before we have a workaround, perhaps we could add a warning message in the constructor when the flag is True? (See MONAI/monai/losses/image_dissimilarity.py, line 90 at 42e3674.)
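A minimal sketch of what such a constructor warning could look like; the helper name and message wording are assumptions for illustration, not existing MONAI code, and it only reads the public `torch.backends` flags:

```python
import warnings

import torch


def _warn_if_tf32_enabled(loss_name: str) -> None:
    """Hypothetical helper: warn when either global TF32 switch is on."""
    if torch.cuda.is_available() and (
        torch.backends.cuda.matmul.allow_tf32 or torch.backends.cudnn.allow_tf32
    ):
        warnings.warn(
            f"TF32 is enabled; {loss_name} may be numerically unstable. "
            "Consider setting torch.backends.cuda.matmul.allow_tf32 = False and "
            "torch.backends.cudnn.allow_tf32 = False.",
            stacklevel=2,
        )


# e.g. called at the end of the constructor referenced above:
# _warn_if_tf32_enabled("LocalNormalizedCrossCorrelationLoss")
```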
It is okay to add a warning for the loss. However, my larger concern is that other operations in MONAI will also be affected by the tf32 issue, since all operations share the same global backend flags. My proposal is adding something project-wide, for example a utility that reports whether TF32 is enabled.
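As an illustration of what a project-wide check might look like (the function name is hypothetical, not an existing MONAI API; it only inspects the global PyTorch backend flags):

```python
import torch


def tf32_is_enabled() -> bool:
    """Hypothetical project-wide helper: report whether either TF32 switch is on."""
    return bool(
        torch.backends.cuda.matmul.allow_tf32 or torch.backends.cudnn.allow_tf32
    )


# A documentation snippet or a startup check could then do:
if tf32_is_enabled():
    print("TF32 is enabled globally; matmul/conv-based results may differ slightly.")
```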
By checking https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch/layers, tag 23.06-py3, layer 08aa16a90c shows where the image sets the TF32-related default.
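To verify what a given container actually does, one can print the global flags in a fresh Python session inside the image; the observed values depend on the image/tag, so this is a verification step rather than a claim about 23.06-py3:

```python
import torch

# Print the global TF32 switches as seen by this PyTorch build/session.
print("torch:", torch.__version__)
print("cuda.matmul.allow_tf32:", torch.backends.cuda.matmul.allow_tf32)
print("cudnn.allow_tf32:", torch.backends.cudnn.allow_tf32)
```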
Thanks @qingpeng9802, I think that's a very good point and I created a separate ticket to track the feature: #6754.
Describe the bug
Follow-up of Project-MONAI/tutorials#1336: depending on the cuDNN version and GPU mode, LocalNormalizedCrossCorrelationLoss running with low-precision operations may not be numerically stable. The current workaround is:

torch.backends.cuda.matmul.allow_tf32 = False
torch.backends.cudnn.allow_tf32 = False

It would be great to improve the stability in general.
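One way to apply the workaround without flipping the flags for an entire program is a small context manager. This is a sketch, not an existing MONAI utility; the loss call in the usage comment is illustrative:

```python
import contextlib

import torch


@contextlib.contextmanager
def tf32_disabled():
    """Temporarily disable TF32 for cuBLAS matmuls and cuDNN convolutions."""
    prev_matmul = torch.backends.cuda.matmul.allow_tf32
    prev_cudnn = torch.backends.cudnn.allow_tf32
    torch.backends.cuda.matmul.allow_tf32 = False
    torch.backends.cudnn.allow_tf32 = False
    try:
        yield
    finally:
        # Restore whatever the surrounding code had configured.
        torch.backends.cuda.matmul.allow_tf32 = prev_matmul
        torch.backends.cudnn.allow_tf32 = prev_cudnn


# Usage (loss arguments are illustrative):
# with tf32_disabled():
#     loss = LocalNormalizedCrossCorrelationLoss(spatial_dims=3)(pred, target)
```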