CUDA providers failed to build against CUDA 12.6 with error #221-D #22728

Open
egortech opened this issue Nov 5, 2024 · 4 comments
Labels
build build issues; typically submitted using template ep:CUDA issues related to the CUDA execution provider

Comments

egortech commented Nov 5, 2024

Describe the issue

The CUDA execution provider fails to build against CUDA 12.6 with error #221-D.

Urgency

No response

Target platform

Windows 11

Build script

./build.bat --config RelWithDebInfo --use_openvino AUTO:GPU,CPU --use_cuda --cudnn_home "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6" --cuda_home "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6" --use_tensorrt --tensorrt_home "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\TensorRT-10.6.0.26" --build_shared_lib --cuda_version "12.6" --cmake_generator "Visual Studio 17 2022" --cmake_extra_defines CMAKE_CUDA_ARCHITECTURES='50;52;61;70;72;75;80;86;87;89;90' --cudnn_home "C:\Program Files\NVIDIA\CUDNN" --parallel --cmake_extra_defines CUDNN_INCLUDE_DIR="C:\Program Files\NVIDIA\CUDNN\include" CUDNN_LIBRARY="C:\Program Files\NVIDIA\CUDNN\lib\x64\cudnn.lib"

Error / output

E:\src\Microsoft\onnxruntime\onnxruntime\core/providers/cuda/shared_inc/cuda_utils.h(159): error #221-D: floating-point value does not fit in required floating-point type [E:\src\Microsoft\onnxruntime\build\Windows\RelWithDebInfo\onnxruntime_providers_cuda.vcxproj]
        return ((float)(1e+300));
                ^

E:\src\Microsoft\onnxruntime\onnxruntime\core/providers/cuda/shared_inc/cuda_utils.h(166): error #221-D: floating-point value does not fit in required floating-point type [E:\src\Microsoft\onnxruntime\build\Windows\RelWithDebInfo\onnxruntime_providers_cuda.vcxproj]
        return -((double)((float)(1e+300)));
                          ^

E:\src\Microsoft\onnxruntime\onnxruntime\core/providers/cuda/shared_inc/cuda_utils.h(169): error #221-D: floating-point value does not fit in required floating-point type [E:\src\Microsoft\onnxruntime\build\Windows\RelWithDebInfo\onnxruntime_providers_cuda.vcxproj]
        return ((double)((float)(1e+300)));
                         ^

  4 errors detected in the compilation of "E:/src/Microsoft/onnxruntime/onnxruntime/contrib_ops/cuda/sparse/sparse_attention_impl.cu".
  sparse_attention_impl.cu

Visual Studio Version

Visual Studio 2022 v17.12 Preview 5

GCC / Compiler Version

No response

@egortech egortech added the build build issues; typically submitted using template label Nov 5, 2024
@egortech egortech changed the title [Build] CUDA providers failed to build against 12.6 with error error #221-D. [Build] Nov 5, 2024
@egortech egortech changed the title CUDA providers failed to build against 12.6 with error error #221-D. [Build] CUDA providers failed to build against 12.6 with error error #221-D Nov 5, 2024
@github-actions github-actions bot added the ep:CUDA issues related to the CUDA execution provider label Nov 5, 2024
snnn (Member) commented Nov 5, 2024

I can confirm the problem exists. We should use std::numeric_limits directly instead. I tried to update the code but could not follow it: it mixes uses of INFINITY and max, and it also uses -INFINITY, which is usually not meaningful. @tianleiwu / @yufenglee, do you have time to take a look?
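For readers following along, here is a minimal sketch of what relying on std::numeric_limits could look like. The largest finite float is roughly 3.4e38, so a literal such as 1e+300 cannot be represented as a finite float, which is what error #221-D complains about. The helper names below (`FloatInfinity`, `FloatMax`, `FloatLowest`) are hypothetical and are not taken from cuda_utils.h.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper names for illustration only; not ONNX Runtime code.

// Positive infinity, obtained without casting an out-of-range double literal.
constexpr float FloatInfinity() { return std::numeric_limits<float>::infinity(); }

// Largest finite float (about 3.4e38).
constexpr float FloatMax() { return std::numeric_limits<float>::max(); }

// Most negative finite float. Note that std::numeric_limits<float>::min()
// is the smallest positive normal value, not the most negative one.
constexpr float FloatLowest() { return std::numeric_limits<float>::lowest(); }

static_assert(FloatMax() < FloatInfinity(), "max() is finite, infinity() is not");
static_assert(FloatLowest() == -FloatMax(), "lowest() is the negated max() for float");
```

Keeping infinity and the largest finite value behind separate names makes it explicit which one a given kernel actually needs.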

snnn (Member) commented Nov 5, 2024

PR #13594

tianleiwu (Contributor) commented

Thanks for reporting. Let me create a PR to fix it.

snnn (Member) commented Nov 5, 2024

Thank you @tianleiwu

tianleiwu added a commit that referenced this issue Nov 6, 2024
### Description
* Fix `NumericLimits<float>`, which used infinity as the max value; this is not consistent with `std::numeric_limits<float>::max()`. On Windows, `(float)(1e+300)` is used for `INFINITY`, which causes a compiler error in Visual Studio 2022 v17.12 Preview 5.
* Rename `NumericLimits<T>::Min` to `Lowest` to be consistent with `std::numeric_limits`.
* Fix the topk implementation: use `NumericLimits<CudaT>` instead of `NumericLimits<T>` in the kernel. This avoids a confusing definition of `NumericLimits<MLFloat16>` that returns `half` instead of `MLFloat16`.
* Use `CUDART_MAX_NORMAL_FP16` where possible. It sets the bit value directly, which is faster than converting float to half.

Note that `NumericLimits` does not support `__nv_bfloat16`, `__nv_fp8_e4m3`, or `__nv_fp8_e5m2` right now.

### Motivation and Context
#22728
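
The `Lowest`/`Max` naming described in the commit message can be illustrated with a small host-side sketch. This is an assumption about the general shape only: `NumericLimitsSketch` and `MaxOf` are hypothetical names, the real `NumericLimits` in cuda_utils.h is device code with half-precision specializations, and none of that is reproduced here.

```cpp
#include <limits>

// Illustrative sketch only; the real NumericLimits in cuda_utils.h may differ.
template <typename T>
struct NumericLimitsSketch {
  // Most negative finite value, mirroring std::numeric_limits<T>::lowest().
  static constexpr T Lowest() { return std::numeric_limits<T>::lowest(); }
  // Largest finite value, mirroring std::numeric_limits<T>::max().
  static constexpr T Max() { return std::numeric_limits<T>::max(); }
};

// Hypothetical usage: seed a running maximum with Lowest() so that any
// finite input replaces the seed, as a top-k style reduction might do.
template <typename T>
T MaxOf(const T* data, int n) {
  T best = NumericLimitsSketch<T>::Lowest();
  for (int i = 0; i < n; ++i) {
    if (data[i] > best) best = data[i];
  }
  return best;
}
```

Seeding with `Lowest()` rather than a value named `Min` avoids confusion with `std::numeric_limits<float>::min()`, which is the smallest positive normal value.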