
Atomic Failure on Windows with CUDA, main branch (2024.08.03.) #288

Open
krasznaa opened this issue Aug 3, 2024 · 4 comments
Labels: bug (Something isn't working)

krasznaa commented Aug 3, 2024

Unfortunately #275 seems to have introduced a failure in a setup that's not too easy to test in the CI. 😦 While building the latest code on Windows, with CUDA available, I bump into:

...
[build]   tmpxft_000062a0_00000000-7_test_cuda_jagged_vector_view_kernels.cudafe1.cpp
[build]   tmpxft_0000606c_00000000-7_test_cuda_edm_kernels.cudafe1.cpp
[build] C:\Users\krasz\ATLAS\vecmem\vecmem\core\include\vecmem/memory/impl/device_atomic_ref.ipp(187): error C3861: '__memorder_vecmem_to_builtin': identifier not found [C:\Users\krasz\ATLAS\vecmem\vecmem\build\tests\cuda\vecmem_test_cuda.vcxproj]
[build]   C:\Users\krasz\ATLAS\vecmem\vecmem\core\include\vecmem/memory/impl/device_atomic_ref.ipp(187): note: the template instantiation context (the oldest one first) is
[build]   C:\Users\krasz\ATLAS\vecmem\vecmem\tests\cuda\../common/jagged_soa_container_helpers.hpp(21): note: see reference to class template instantiation 'vecmem::testing::jagged_soa_interface<vecmem::edm::device<vecmem::edm::schema<vecmem::edm::type::scalar<int>,vecmem::edm::type::vector<float>,vecmem::edm::type::jagged_vector<double>,vecmem::edm::type::scalar<float>,vecmem::edm::type::jagged_vector<int>,vecmem::edm::type::vector<int>>>>' being compiled
[build]   C:\Users\krasz\ATLAS\vecmem\vecmem\tests\common\jagged_soa_container.hpp(18): note: see reference to class template instantiation 'vecmem::edm::device<vecmem::edm::schema<vecmem::edm::type::scalar<int>,vecmem::edm::type::vector<float>,vecmem::edm::type::jagged_vector<double>,vecmem::edm::type::scalar<float>,vecmem::edm::type::jagged_vector<int>,vecmem::edm::type::vector<int>>>' being compiled
[build]   C:\Users\krasz\ATLAS\vecmem\vecmem\core\include\vecmem/edm/device.hpp(121): note: see reference to class template instantiation 'vecmem::tuple<int *,vecmem::device_vector<TYPE>,vecmem::jagged_device_vector<double>,float *,vecmem::jagged_device_vector<int>,vecmem::device_vector<int>>' being compiled
[build]           with
[build]           [
[build]               TYPE=float
[build]           ]
...
[build] C:\Users\krasz\ATLAS\vecmem\vecmem\core\include\vecmem/memory/impl/device_atomic_ref.ipp(294): error C3861: '__atomic_fetch_add': identifier not found [C:\Users\krasz\ATLAS\vecmem\vecmem\build\tests\cuda\vecmem_test_cuda.vcxproj]
[build] C:\Program Files\Microsoft Visual Studio\2022\Community\MSBuild\Microsoft\VC\v170\BuildCustomizations\CUDA 12.6.targets(799,9): error MSB3721: The command ""C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\bin\nvcc.exe"  --use-local-env -ccbin "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.40.33807\bin\HostX64\x64" -x cu   -IC:\Users\krasz\ATLAS\vecmem\vecmem\build\core\CMakeFiles -IC:\Users\krasz\ATLAS\vecmem\vecmem\core\include -IC:\Users\krasz\ATLAS\vecmem\vecmem\build\cuda\CMakeFiles -IC:\Users\krasz\ATLAS\vecmem\vecmem\cuda\include -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\include" -I"C:\Users\krasz\ATLAS\vecmem\vecmem\build\_deps\googletest-src\googletest\include" -I"C:\Users\krasz\ATLAS\vecmem\vecmem\build\_deps\googletest-src\googletest" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\include"     --keep-dir vecmem_test_cuda\x64\RelWithDebInfo  -maxrregcount=0   --machine 64 --compile -cudart static -std=c++14 --generate-code=arch=compute_52,code=[compute_52,sm_52] -Xcompiler="/EHsc -Zi -Ob1 /Zc:__cplusplus"   -D_WINDOWS -DNDEBUG -DVECMEM_DEBUG_MSG_LVL=0 -DVECMEM_SOURCE_DIR_LENGTH=35 -DVECMEM_HAVE_PMR_MEMORY_RESOURCE -D"CMAKE_INTDIR=\"RelWithDebInfo\"" -D_MBCS -DWIN32 -D_WINDOWS -DNDEBUG -DVECMEM_DEBUG_MSG_LVL=0 -DVECMEM_SOURCE_DIR_LENGTH=35 -DVECMEM_HAVE_PMR_MEMORY_RESOURCE -D"CMAKE_INTDIR=\"RelWithDebInfo\"" -Xcompiler "/EHsc /W4 /nologo /O2 /FS /Zi  /MD /GR" -Xcompiler "/Fdvecmem_test_cuda.dir\RelWithDebInfo\vc143.pdb" -o vecmem_test_cuda.dir\RelWithDebInfo\test_cuda_edm_kernels.obj "C:\Users\krasz\ATLAS\vecmem\vecmem\tests\cuda\test_cuda_edm_kernels.cu"" exited with code 2. [C:\Users\krasz\ATLAS\vecmem\vecmem\build\tests\cuda\vecmem_test_cuda.vcxproj]

So nvcc is trying to compile test_cuda_edm_kernels.cu, and gets confused about how to use vecmem::device_atomic_ref.

The preprocessor flags will need to be tweaked to fix this. 🤔

krasznaa added the bug label on Aug 3, 2024
stephenswat (Member) commented:

Hmm, I wonder if the following preprocessor condition fires on MSVC:

#elif defined(__clang__) || defined(__GNUC__) || defined(__GNUG__) || defined(__CUDACC__)

If so, that makes sense. I know that clang sets __GNUC__, so MSVC might do the same.

All other uses of both __memorder_vecmem_to_builtin and __atomic_fetch_add are guarded by the VECMEM_HAVE_MEMORDER_DEFINITIONS and VECMEM_HAVE_BUILTIN_ATOMIC_FETCH_ADD macros, which should be defined correctly... 🤔

stephenswat changed the title from "Atomic Failiure on Windows with CUDA, main branch (2024.08.03.)" to "Atomic Failure on Windows with CUDA, main branch (2024.08.03.)" on Aug 5, 2024
stephenswat (Member) commented:

Hmm, no it doesn't: https://godbolt.org/z/o65TPMGPa

stephenswat (Member) commented:

Okay, that's super weird. I'm not really familiar with how CUDA compilation works with MSVC, but the line indicated in the error (187) is the following:

#elif defined(VECMEM_HAVE_BUILTIN_ATOMIC_LOAD) && \
    defined(VECMEM_HAVE_MEMORDER_DEFINITIONS)
    return __atomic_load_n(m_ptr, __memorder_vecmem_to_builtin(order));

where VECMEM_HAVE_BUILTIN_ATOMIC_LOAD is defined iff:

#if defined __has_builtin
#if __has_builtin(__atomic_load_n)
#define VECMEM_HAVE_BUILTIN_ATOMIC_LOAD
#endif
#endif

and VECMEM_HAVE_MEMORDER_DEFINITIONS is defined at the same time as the __memorder_vecmem_to_builtin function:

#if defined(__ATOMIC_RELAXED) && defined(__ATOMIC_CONSUME) && \
    defined(__ATOMIC_ACQUIRE) && defined(__ATOMIC_RELEASE) && \
    defined(__ATOMIC_ACQ_REL) && defined(__ATOMIC_SEQ_CST)
constexpr int __memorder_vecmem_to_builtin(memory_order o) {
    switch (o) {
        case memory_order::relaxed:
            return __ATOMIC_RELAXED;
        case memory_order::consume:
            return __ATOMIC_CONSUME;
        case memory_order::acquire:
            return __ATOMIC_ACQUIRE;
        case memory_order::release:
            return __ATOMIC_RELEASE;
        case memory_order::acq_rel:
            return __ATOMIC_ACQ_REL;
        case memory_order::seq_cst:
            return __ATOMIC_SEQ_CST;
        default:
            assert(false);
            return 0;
    }
}
#define VECMEM_HAVE_MEMORDER_DEFINITIONS
#endif

Does MSVC + CUDA work differently than on Linux, where the nvcc compiler driver causes the preprocessor definitions to be evaluated twice: once for the host code and once for the device code? If this differs somehow, that could explain the problem.

stephenswat (Member) commented:

Nevermind, I think I know what's going on.

Projects: None yet
Development: No branches or pull requests
2 participants