
Atomic Failure on Windows with CUDA, main branch (2024.08.03.) #288

Open
krasznaa opened this issue Aug 3, 2024 · 4 comments
Labels: bug (Something isn't working)

krasznaa commented Aug 3, 2024

Unfortunately #275 seems to have introduced a failure in a setup that's not too easy to test in the CI. 😦 While building the latest code on Windows, with CUDA available, I bump into:

...
[build]   tmpxft_000062a0_00000000-7_test_cuda_jagged_vector_view_kernels.cudafe1.cpp
[build]   tmpxft_0000606c_00000000-7_test_cuda_edm_kernels.cudafe1.cpp
[build] C:\Users\krasz\ATLAS\vecmem\vecmem\core\include\vecmem/memory/impl/device_atomic_ref.ipp(187): error C3861: '__memorder_vecmem_to_builtin': identifier not found [C:\Users\krasz\ATLAS\vecmem\vecmem\build\tests\cuda\vecmem_test_cuda.vcxproj]
[build]   C:\Users\krasz\ATLAS\vecmem\vecmem\core\include\vecmem/memory/impl/device_atomic_ref.ipp(187): note: the template instantiation context (the oldest one first) is
[build]   C:\Users\krasz\ATLAS\vecmem\vecmem\tests\cuda\../common/jagged_soa_container_helpers.hpp(21): note: see reference to class template instantiation 'vecmem::testing::jagged_soa_interface<vecmem::edm::device<vecmem::edm::schema<vecmem::edm::type::scalar<int>,vecmem::edm::type::vector<float>,vecmem::edm::type::jagged_vector<double>,vecmem::edm::type::scalar<float>,vecmem::edm::type::jagged_vector<int>,vecmem::edm::type::vector<int>>>>' being compiled
[build]   C:\Users\krasz\ATLAS\vecmem\vecmem\tests\common\jagged_soa_container.hpp(18): note: see reference to class template instantiation 'vecmem::edm::device<vecmem::edm::schema<vecmem::edm::type::scalar<int>,vecmem::edm::type::vector<float>,vecmem::edm::type::jagged_vector<double>,vecmem::edm::type::scalar<float>,vecmem::edm::type::jagged_vector<int>,vecmem::edm::type::vector<int>>>' being compiled
[build]   C:\Users\krasz\ATLAS\vecmem\vecmem\core\include\vecmem/edm/device.hpp(121): note: see reference to class template instantiation 'vecmem::tuple<int *,vecmem::device_vector<TYPE>,vecmem::jagged_device_vector<double>,float *,vecmem::jagged_device_vector<int>,vecmem::device_vector<int>>' being compiled
[build]           with
[build]           [
[build]               TYPE=float
[build]           ]
...
[build] C:\Users\krasz\ATLAS\vecmem\vecmem\core\include\vecmem/memory/impl/device_atomic_ref.ipp(294): error C3861: '__atomic_fetch_add': identifier not found [C:\Users\krasz\ATLAS\vecmem\vecmem\build\tests\cuda\vecmem_test_cuda.vcxproj]
[build] C:\Program Files\Microsoft Visual Studio\2022\Community\MSBuild\Microsoft\VC\v170\BuildCustomizations\CUDA 12.6.targets(799,9): error MSB3721: The command ""C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\bin\nvcc.exe"  --use-local-env -ccbin "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.40.33807\bin\HostX64\x64" -x cu   -IC:\Users\krasz\ATLAS\vecmem\vecmem\build\core\CMakeFiles -IC:\Users\krasz\ATLAS\vecmem\vecmem\core\include -IC:\Users\krasz\ATLAS\vecmem\vecmem\build\cuda\CMakeFiles -IC:\Users\krasz\ATLAS\vecmem\vecmem\cuda\include -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\include" -I"C:\Users\krasz\ATLAS\vecmem\vecmem\build\_deps\googletest-src\googletest\include" -I"C:\Users\krasz\ATLAS\vecmem\vecmem\build\_deps\googletest-src\googletest" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\include"     --keep-dir vecmem_test_cuda\x64\RelWithDebInfo  -maxrregcount=0   --machine 64 --compile -cudart static -std=c++14 --generate-code=arch=compute_52,code=[compute_52,sm_52] -Xcompiler="/EHsc -Zi -Ob1 /Zc:__cplusplus"   -D_WINDOWS -DNDEBUG -DVECMEM_DEBUG_MSG_LVL=0 -DVECMEM_SOURCE_DIR_LENGTH=35 -DVECMEM_HAVE_PMR_MEMORY_RESOURCE -D"CMAKE_INTDIR=\"RelWithDebInfo\"" -D_MBCS -DWIN32 -D_WINDOWS -DNDEBUG -DVECMEM_DEBUG_MSG_LVL=0 -DVECMEM_SOURCE_DIR_LENGTH=35 -DVECMEM_HAVE_PMR_MEMORY_RESOURCE -D"CMAKE_INTDIR=\"RelWithDebInfo\"" -Xcompiler "/EHsc /W4 /nologo /O2 /FS /Zi  /MD /GR" -Xcompiler "/Fdvecmem_test_cuda.dir\RelWithDebInfo\vc143.pdb" -o vecmem_test_cuda.dir\RelWithDebInfo\test_cuda_edm_kernels.obj "C:\Users\krasz\ATLAS\vecmem\vecmem\tests\cuda\test_cuda_edm_kernels.cu"" exited with code 2. [C:\Users\krasz\ATLAS\vecmem\vecmem\build\tests\cuda\vecmem_test_cuda.vcxproj]

So nvcc is trying to compile test_cuda_edm_kernels.cu, and gets confused about how to use vecmem::device_atomic_ref.

The preprocessor flags will need to be tweaked to fix this. 🤔

krasznaa added the bug label on Aug 3, 2024
stephenswat (Member) commented:

Hmm, I wonder if the following preprocessor condition fires on MSVC:

#elif defined(__clang__) || defined(__GNUC__) || defined(__GNUG__) || defined(__CUDACC__)

If so, that makes sense. I know that clang sets __GNUC__, so MSVC might do the same.

All other uses of both __memorder_vecmem_to_builtin and __atomic_fetch_add are guarded by the VECMEM_HAVE_MEMORDER_DEFINITIONS and VECMEM_HAVE_BUILTIN_ATOMIC_FETCH_ADD macros, which should be defined correctly... 🤔

stephenswat changed the title from "Atomic Failiure on Windows with CUDA, main branch (2024.08.03.)" to "Atomic Failure on Windows with CUDA, main branch (2024.08.03.)" on Aug 5, 2024
stephenswat (Member) commented:

Hmm, no it doesn't: https://godbolt.org/z/o65TPMGPa

stephenswat (Member) commented:

Okay, that's super weird. I'm not really familiar with how CUDA compilation works with MSVC, but the line indicated in the error (187) is the following:

#elif defined(VECMEM_HAVE_BUILTIN_ATOMIC_LOAD) && \
    defined(VECMEM_HAVE_MEMORDER_DEFINITIONS)
    return __atomic_load_n(m_ptr, __memorder_vecmem_to_builtin(order));

where VECMEM_HAVE_BUILTIN_ATOMIC_LOAD is defined iff:

#if defined __has_builtin
#if __has_builtin(__atomic_load_n)
#define VECMEM_HAVE_BUILTIN_ATOMIC_LOAD
#endif
#endif

and VECMEM_HAVE_MEMORDER_DEFINITIONS is defined at the same time as the __memorder_vecmem_to_builtin function:

#if defined(__ATOMIC_RELAXED) && defined(__ATOMIC_CONSUME) && \
    defined(__ATOMIC_ACQUIRE) && defined(__ATOMIC_RELEASE) && \
    defined(__ATOMIC_ACQ_REL) && defined(__ATOMIC_SEQ_CST)
constexpr int __memorder_vecmem_to_builtin(memory_order o) {
    switch (o) {
        case memory_order::relaxed:
            return __ATOMIC_RELAXED;
        case memory_order::consume:
            return __ATOMIC_CONSUME;
        case memory_order::acquire:
            return __ATOMIC_ACQUIRE;
        case memory_order::release:
            return __ATOMIC_RELEASE;
        case memory_order::acq_rel:
            return __ATOMIC_ACQ_REL;
        case memory_order::seq_cst:
            return __ATOMIC_SEQ_CST;
        default:
            assert(false);
            return 0;
    }
}
#define VECMEM_HAVE_MEMORDER_DEFINITIONS
#endif

Does MSVC + CUDA work differently than on Linux, where the nvcc compiler driver causes the preprocessor definitions to be evaluated twice: once for the host code and once for the device code? If this differs somehow, that could explain the problem.

stephenswat (Member) commented:

Nevermind, I think I know what's going on.

Projects: None yet
Development: No branches or pull requests
2 participants