[Issue]: Build failure for gfx908 when building without optimization flags #1759

LunNova · 2024-12-17T17:39:36Z

Related to #1371

Problem Description

The develop branch at 6ef8d3c fails to build with -DGPU_TARGETS="gfx908;gfx90a" -DCMAKE_CXX_FLAGS_RELEASE=' '

device_gemm_dpp_f16_f16_f16_km_nk_mn_instance

composable_kernel> [152/4044] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_dpp_f16_f16_f16_km_nk_mn_instance.cpp.o
composable_kernel> FAILED: library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_dpp_f16_f16_f16_km_nk_mn_instance.cpp.o
composable_kernel> /nix/store/wcfqfaalprfjgp0w9mx98fwdzwc82xq4-clr-6.3.0/bin/clang++ -DCK_ENABLE_BF16 -DCK_ENABLE_BF8 -DCK_ENABLE_FP16 -DCK_ENABLE_FP32 -DCK_ENABLE_FP64 -DCK_ENABLE_FP8 -DCK_ENABLE_INT8 -DCK_USE_FNUZ_FP8 -DCK_USE_XDL -DUSE_PROF_API=1 -D__HIP_PLATFORM_AMD__=1 -D__HIP_PLATFORM_HCC__=1 -I/build/source/library/include -I/build/source/include -I/build/source/build/include -isystem /nix/store/wcfqfaalprfjgp0w9mx98fwdzwc82xq4-clr-6.3.0/include -parallel-jobs=1 -fgpu-inline-threshold=32768 -std=c++17 -fPIC   -Wall -Wextra -Wcomment -Wendif-labels -Wformat -Winit-self -Wreturn-type -Wsequence-point -Wswitch -Wtrigraphs -Wundef -Wuninitialized -Wunreachable-code -Wunused -Wno-reserved-identifier -Werror -Wno-option-ignored -Wsign-compare -Wno-extra-semi-stmt -Wno-unused-template -Wno-missing-field-initializers -Wno-deprecated-declarations -Wall -Wextra -Wcomment -Wendif-labels -Wformat -Winit-self -Wreturn-type -Wsequence-point -Wswitch -Wtrigraphs -Wundef -Wuninitialized -Wunreachable-code -Wunused -Wno-reserved-identifier -Werror -Wno-option-ignored -Wsign-compare -Wno-extra-semi-stmt -Wno-unused-template -Weverything -Wno-c++98-compat -Wno-c++98-compat-pedantic -Wno-conversion -Wno-double-promotion -Wno-exit-time-destructors -Wno-extra-semi -Wno-float-conversion -Wno-gnu-anonymous-struct -Wno-gnu-zero-variadic-macro-arguments -Wno-missing-prototypes -Wno-nested-anon-types -Wno-padded -Wno-return-std-move-in-c++11 -Wno-shorten-64-to-32 -Wno-sign-conversion -Wno-unknown-warning-option -Wno-unused-command-line-argument -Wno-weak-vtables -Wno-covered-switch-default -Wno-unsafe-buffer-usage -Wno-unused-lambda-capture -Wno-nvcc-compat -Wno-bit-int-extension -Wno-pass-failed -Wno-switch-default -fno-offload-uniform-block -mllvm --lsr-drop-solution=1 -mllvm -enable-post-misched=0 -mllvm -amdgpu-coerce-illegal-types=1 -fcolor-diagnostics --offload-compress -x hip --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx908 --offload-arch=gfx90a -MD -MT 
library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_dpp_f16_f16_f16_km_nk_mn_instance.cpp.o -MF library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_dpp_f16_f16_f16_km_nk_mn_instance.cpp.o.d -o library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_dpp_f16_f16_f16_km_nk_mn_instance.cpp.o -c /build/source/library/src/tensor_operation_instance/gpu/gemm/device_gemm_dpp_f16_f16_f16_km_nk_mn_instance.cpp
composable_kernel> In file included from /build/source/library/src/tensor_operation_instance/gpu/gemm/device_gemm_dpp_f16_f16_f16_km_nk_mn_instance.cpp:9:
composable_kernel> In file included from /build/source/include/ck/tensor_operation/gpu/device/impl/device_gemm_dpp.hpp:14:
composable_kernel> In file included from /build/source/include/ck/tensor_operation/gpu/grid/gridwise_gemm_dpp.hpp:13:
composable_kernel> In file included from /build/source/include/ck/tensor_operation/gpu/block/blockwise_gemm_dpp.hpp:9:
composable_kernel> In file included from /build/source/include/ck/tensor_operation/gpu/warp/dpp_gemm.hpp:6:
composable_kernel> In file included from /build/source/include/ck/utility/amd_gemm_dpp.hpp:8:
composable_kernel> /build/source/include/ck/utility/inner_product_dpp8.hpp:23:21: error: not a valid operand.
composable_kernel>    23 |     asm volatile("\n v_dot2c_f32_f16_dpp %0, %1, %2 dpp8:[0, 0, 0, 0, 0, 0, 0, 0]" : "=v"(c) : "v"(a), "v"(b), "0"(c));
composable_kernel>       |                     ^
composable_kernel> <inline asm>:2:33: note: instantiated into assembly here
composable_kernel>     2 |  v_dot2c_f32_f16_dpp v2, v3, v4 dpp8:[0, 0, 0, 0, 0, 0, 0, 0]
composable_kernel>       |                                 ^
composable_kernel> In file included from /build/source/library/src/tensor_operation_instance/gpu/gemm/device_gemm_dpp_f16_f16_f16_km_nk_mn_instance.cpp:9:
composable_kernel> In file included from /build/source/include/ck/tensor_operation/gpu/device/impl/device_gemm_dpp.hpp:14:
composable_kernel> In file included from /build/source/include/ck/tensor_operation/gpu/grid/gridwise_gemm_dpp.hpp:13:
composable_kernel> In file included from /build/source/include/ck/tensor_operation/gpu/block/blockwise_gemm_dpp.hpp:9:
composable_kernel> In file included from /build/source/include/ck/tensor_operation/gpu/warp/dpp_gemm.hpp:6:
composable_kernel> In file included from /build/source/include/ck/utility/amd_gemm_dpp.hpp:8:
composable_kernel> /build/source/include/ck/utility/inner_product_dpp8.hpp:27:21: error: not a valid operand.
composable_kernel>    27 |     asm volatile("\n v_dot2c_f32_f16_dpp %0, %1, %2 dpp8:[1, 1, 1, 1, 1, 1, 1, 1]" : "=v"(c) : "v"(a), "v"(b), "0"(c));
composable_kernel>       |                     ^
composable_kernel> <inline asm>:2:33: note: instantiated into assembly here
composable_kernel>     2 |  v_dot2c_f32_f16_dpp v2, v3, v4 dpp8:[1, 1, 1, 1, 1, 1, 1, 1]
composable_kernel>       |                                 ^
composable_kernel> In file included from /build/source/library/src/tensor_operation_instance/gpu/gemm/device_gemm_dpp_f16_f16_f16_km_nk_mn_instance.cpp:9:
composable_kernel> In file included from /build/source/include/ck/tensor_operation/gpu/device/impl/device_gemm_dpp.hpp:14:
composable_kernel> In file included from /build/source/include/ck/tensor_operation/gpu/grid/gridwise_gemm_dpp.hpp:13:
composable_kernel> In file included from /build/source/include/ck/tensor_operation/gpu/block/blockwise_gemm_dpp.hpp:9:
composable_kernel> In file included from /build/source/include/ck/tensor_operation/gpu/warp/dpp_gemm.hpp:6:
composable_kernel> In file included from /build/source/include/ck/utility/amd_gemm_dpp.hpp:8:
composable_kernel> /build/source/include/ck/utility/inner_product_dpp8.hpp:31:21: error: not a valid operand.
composable_kernel>    31 |     asm volatile("\n v_dot2c_f32_f16_dpp %0, %1, %2 dpp8:[2, 2, 2, 2, 2, 2, 2, 2]" : "=v"(c) : "v"(a), "v"(b), "0"(c));
composable_kernel>       |                     ^
composable_kernel> <inline asm>:2:33: note: instantiated into assembly here
composable_kernel>     2 |  v_dot2c_f32_f16_dpp v2, v3, v4 dpp8:[2, 2, 2, 2, 2, 2, 2, 2]
composable_kernel>       |                                 ^
composable_kernel> In file included from /build/source/library/src/tensor_operation_instance/gpu/gemm/device_gemm_dpp_f16_f16_f16_km_nk_mn_instance.cpp:9:
composable_kernel> In file included from /build/source/include/ck/tensor_operation/gpu/device/impl/device_gemm_dpp.hpp:14:
composable_kernel> In file included from /build/source/include/ck/tensor_operation/gpu/grid/gridwise_gemm_dpp.hpp:13:
composable_kernel> In file included from /build/source/include/ck/tensor_operation/gpu/block/blockwise_gemm_dpp.hpp:9:
composable_kernel> In file included from /build/source/include/ck/tensor_operation/gpu/warp/dpp_gemm.hpp:6:
composable_kernel> In file included from /build/source/include/ck/utility/amd_gemm_dpp.hpp:8:
composable_kernel> /build/source/include/ck/utility/inner_product_dpp8.hpp:35:21: error: not a valid operand.
composable_kernel>    35 |     asm volatile("\n v_dot2c_f32_f16_dpp %0, %1, %2 dpp8:[3, 3, 3, 3, 3, 3, 3, 3]" : "=v"(c) : "v"(a), "v"(b), "0"(c));
composable_kernel>       |                     ^
composable_kernel> <inline asm>:2:33: note: instantiated into assembly here
composable_kernel>     2 |  v_dot2c_f32_f16_dpp v2, v3, v4 dpp8:[3, 3, 3, 3, 3, 3, 3, 3]
composable_kernel>       |                                 ^
composable_kernel> In file included from /build/source/library/src/tensor_operation_instance/gpu/gemm/device_gemm_dpp_f16_f16_f16_km_nk_mn_instance.cpp:9:
composable_kernel> In file included from /build/source/include/ck/tensor_operation/gpu/device/impl/device_gemm_dpp.hpp:14:
composable_kernel> In file included from /build/source/include/ck/tensor_operation/gpu/grid/gridwise_gemm_dpp.hpp:13:
composable_kernel> In file included from /build/source/include/ck/tensor_operation/gpu/block/blockwise_gemm_dpp.hpp:9:
composable_kernel> In file included from /build/source/include/ck/tensor_operation/gpu/warp/dpp_gemm.hpp:6:
composable_kernel> In file included from /build/source/include/ck/utility/amd_gemm_dpp.hpp:8:
composable_kernel> /build/source/include/ck/utility/inner_product_dpp8.hpp:39:21: error: not a valid operand.
composable_kernel>    39 |     asm volatile("\n v_dot2c_f32_f16_dpp %0, %1, %2 dpp8:[4, 4, 4, 4, 4, 4, 4, 4]" : "=v"(c) : "v"(a), "v"(b), "0"(c));
composable_kernel>       |                     ^
composable_kernel> <inline asm>:2:33: note: instantiated into assembly here
composable_kernel>     2 |  v_dot2c_f32_f16_dpp v2, v3, v4 dpp8:[4, 4, 4, 4, 4, 4, 4, 4]
composable_kernel>       |                                 ^
composable_kernel> In file included from /build/source/library/src/tensor_operation_instance/gpu/gemm/device_gemm_dpp_f16_f16_f16_km_nk_mn_instance.cpp:9:
composable_kernel> In file included from /build/source/include/ck/tensor_operation/gpu/device/impl/device_gemm_dpp.hpp:14:
composable_kernel> In file included from /build/source/include/ck/tensor_operation/gpu/grid/gridwise_gemm_dpp.hpp:13:
composable_kernel> In file included from /build/source/include/ck/tensor_operation/gpu/block/blockwise_gemm_dpp.hpp:9:
composable_kernel> In file included from /build/source/include/ck/tensor_operation/gpu/warp/dpp_gemm.hpp:6:
composable_kernel> In file included from /build/source/include/ck/utility/amd_gemm_dpp.hpp:8:
composable_kernel> /build/source/include/ck/utility/inner_product_dpp8.hpp:43:21: error: not a valid operand.
composable_kernel>    43 |     asm volatile("\n v_dot2c_f32_f16_dpp %0, %1, %2 dpp8:[5, 5, 5, 5, 5, 5, 5, 5]" : "=v"(c) : "v"(a), "v"(b), "0"(c));
composable_kernel>       |                     ^
composable_kernel> <inline asm>:2:33: note: instantiated into assembly here
composable_kernel>     2 |  v_dot2c_f32_f16_dpp v2, v3, v4 dpp8:[5, 5, 5, 5, 5, 5, 5, 5]
composable_kernel>       |                                 ^
composable_kernel> In file included from /build/source/library/src/tensor_operation_instance/gpu/gemm/device_gemm_dpp_f16_f16_f16_km_nk_mn_instance.cpp:9:
composable_kernel> In file included from /build/source/include/ck/tensor_operation/gpu/device/impl/device_gemm_dpp.hpp:14:
composable_kernel> In file included from /build/source/include/ck/tensor_operation/gpu/grid/gridwise_gemm_dpp.hpp:13:
composable_kernel> In file included from /build/source/include/ck/tensor_operation/gpu/block/blockwise_gemm_dpp.hpp:9:
composable_kernel> In file included from /build/source/include/ck/tensor_operation/gpu/warp/dpp_gemm.hpp:6:
composable_kernel> In file included from /build/source/include/ck/utility/amd_gemm_dpp.hpp:8:
composable_kernel> /build/source/include/ck/utility/inner_product_dpp8.hpp:47:21: error: not a valid operand.
composable_kernel>    47 |     asm volatile("\n v_dot2c_f32_f16_dpp %0, %1, %2 dpp8:[6, 6, 6, 6, 6, 6, 6, 6]" : "=v"(c) : "v"(a), "v"(b), "0"(c));
composable_kernel>       |                     ^
composable_kernel> <inline asm>:2:33: note: instantiated into assembly here
composable_kernel>     2 |  v_dot2c_f32_f16_dpp v2, v3, v4 dpp8:[6, 6, 6, 6, 6, 6, 6, 6]
composable_kernel>       |                                 ^
composable_kernel> In file included from /build/source/library/src/tensor_operation_instance/gpu/gemm/device_gemm_dpp_f16_f16_f16_km_nk_mn_instance.cpp:9:
composable_kernel> In file included from /build/source/include/ck/tensor_operation/gpu/device/impl/device_gemm_dpp.hpp:14:
composable_kernel> In file included from /build/source/include/ck/tensor_operation/gpu/grid/gridwise_gemm_dpp.hpp:13:
composable_kernel> In file included from /build/source/include/ck/tensor_operation/gpu/block/blockwise_gemm_dpp.hpp:9:
composable_kernel> In file included from /build/source/include/ck/tensor_operation/gpu/warp/dpp_gemm.hpp:6:
composable_kernel> In file included from /build/source/include/ck/utility/amd_gemm_dpp.hpp:8:
composable_kernel> /build/source/include/ck/utility/inner_product_dpp8.hpp:51:21: error: not a valid operand.
composable_kernel>    51 |     asm volatile("\n v_dot2c_f32_f16_dpp %0, %1, %2 dpp8:[7, 7, 7, 7, 7, 7, 7, 7]" : "=v"(c) : "v"(a), "v"(b), "0"(c));
composable_kernel>       |                     ^
composable_kernel> <inline asm>:2:33: note: instantiated into assembly here
composable_kernel>     2 |  v_dot2c_f32_f16_dpp v2, v3, v4 dpp8:[7, 7, 7, 7, 7, 7, 7, 7]
composable_kernel>       |                                 ^
composable_kernel> 8 errors generated when compiling for gfx908.

cmake flags:  -DCMAKE_C_FLAGS_RELEASE=-fgpu-inline-threshold=32768 -DCMAKE_CXX_FLAGS_RELEASE=-fgpu-inline-threshold=32768 -DCK_PARALLEL_LINK_JOBS=5 -DCK_PARALLEL_COMPILE_JOBS=62

This is on top of a ROCm 6.3 stack from the rocm-6.3.0 tags.

Operating System

NixOS

CPU

EPYC 7773

GPU

AMD Instinct MI100


LunNova · 2024-12-17T18:34:43Z

It looks like device_gemm_dpp only supports gfx10/gfx11 and shouldn't be built for gfx9:

composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_gemm_dpp.hpp

Lines 169 to 176 in 689a5ae

    
    static bool IsSupportedArgument(const Argument& karg)
    {
        if(ck::is_gfx103_supported() || ck::is_gfx11_supported())
        {
            return GridwiseGemm::CheckValidity(karg);
        }
        return false;
    }
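
The same gfx10/gfx11 restriction could in principle be applied before compilation, so that DPP instances are only considered for targets that pass the check. A minimal shell sketch of that filtering follows. The `supports_dpp` and `dpp_targets` helpers and the `gfx103*` / `gfx11*` name patterns are illustrative assumptions derived from the check quoted above; they are not code from the CK build system.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only for target names in the gfx103x and
# gfx11xx families, mirroring the families named in IsSupportedArgument.
supports_dpp() {
    case "$1" in
        gfx103*|gfx11*) return 0 ;;
        *)              return 1 ;;
    esac
}

# Print the subset of a semicolon-separated target list (the format used by
# GPU_TARGETS) that passes the check. Prints an empty line if none do.
dpp_targets() {
    result=""
    old_ifs=$IFS
    IFS=';'
    # $1 is left unquoted on purpose so it is split on ';'.
    for target in $1; do
        if supports_dpp "$target"; then
            result="${result:+$result;}$target"
        fi
    done
    IFS=$old_ifs
    printf '%s\n' "$result"
}

# The target list from this report contains no DPP-capable target.
dpp_targets "gfx908;gfx90a"

# A mixed list keeps only the gfx10/gfx11 entries.
dpp_targets "gfx908;gfx1030;gfx1100"
```

With the target list from this report the first call prints an empty line, which is the case where no `_dpp` instance would be expected to build.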

device_gemm_dpp only supports gfx10/gfx11 and will fail to build for gfx9. https://github.com/ROCm/composable_kernel/blob/689a5ae45be802f51fc947a9f92208dcfb143f77/include/ck/tensor_operation/gpu/device/impl/device_gemm_dpp.hpp#L169-L176 Fixes ROCm#1759

LunNova · 2024-12-18T00:34:53Z

Docker reproduction steps on 6ef8d3c:

$ DOCKER_BUILDKIT=1 docker build -t ck:latest -f Dockerfile .
# wait a long time
$ docker run                                           \
    -it                                                \
    --privileged                                       \
    --group-add sudo                                   \
    -w /root/workspace                                 \
    -v ~/ck-docker-workspace/:/root/workspace  \
    ck:latest                                          \
    /bin/bash
# in docker shell now
$ git clone https://github.com/ROCm/composable_kernel.git composable_kernel
$ cd composable_kernel
$ git checkout 6ef8d3c295686b872d7e7a86621b68f765d98572 # latest develop commit on 2024-12-17
$ mkdir build && cd build
$ cmake -D CMAKE_PREFIX_PATH=/opt/rocm -DCMAKE_CXX_COMPILER=/opt/rocm/bin/amdclang++            \
    -D CMAKE_BUILD_TYPE=Release -D GPU_ARCHS="gfx908;gfx90a" -DCMAKE_CXX_FLAGS_RELEASE=' ' ..
$ cmake --build .
# fails with error: not a valid operand
/root/workspace/composable_kernel/include/ck/utility/inner_product_dpp8.hpp:27:21: error: not a valid operand.
   27 |     asm volatile("\n v_dot2c_f32_f16_dpp %0, %1, %2 dpp8:[1, 1, 1, 1, 1, 1, 1, 1]" : "=v"(c) : "v"(a), "v"(b), "0"(c));
      |                     ^
<inline asm>:2:33: note: instantiated into assembly here
    2 |  v_dot2c_f32_f16_dpp v2, v3, v4 dpp8:[1, 1, 1, 1, 1, 1, 1, 1]
      |                                 ^

I accidentally dropped the default release flags from the release build by using CMAKE_CXX_FLAGS_RELEASE to try to append flags, and that causes the failure. I shouldn't have set that variable, because it replaces the defaults rather than appending to them. I could have used the CXXFLAGS environment variable or the CMAKE_CXX_FLAGS_INIT CMake variable instead.

-DCMAKE_CXX_FLAGS_RELEASE=' ' - fails
-DCMAKE_CXX_FLAGS_RELEASE='-O3 -DNDEBUG' - builds
no flag - builds
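
These three outcomes are consistent with how CMake assembles the compile line for a Release build: the contents of CMAKE_CXX_FLAGS come first, followed by the contents of CMAKE_CXX_FLAGS_RELEASE. A rough shell model of that composition is sketched below. The `effective_flags` function is hypothetical, and `-O3 -DNDEBUG` is assumed to be the Release default, as is typical for GCC- and Clang-style compilers.

```shell
#!/bin/sh
# Rough model of CMake flag composition for a Release build:
# CMAKE_CXX_FLAGS followed by CMAKE_CXX_FLAGS_RELEASE.
# Assumption: the Release default is "-O3 -DNDEBUG".
DEFAULT_RELEASE_FLAGS="-O3 -DNDEBUG"

# effective_flags <base-flags> [release-override]
# Omitting the second argument keeps the default. Supplying it replaces the
# default wholesale, even when the value is a single space.
effective_flags() {
    base=$1
    if [ "$#" -ge 2 ]; then
        release=$2
    else
        release=$DEFAULT_RELEASE_FLAGS
    fi
    # Unquoted on purpose: word splitting collapses surplus whitespace.
    set -- $base $release
    printf '%s\n' "$*"
}

extra="-fgpu-inline-threshold=32768"

# Extra flag passed as the Release override: the optimization flags are lost.
effective_flags "" "$extra"

# Extra flag passed as base flags (the role CXXFLAGS or CMAKE_CXX_FLAGS
# plays): the Release default is kept.
effective_flags "$extra"
```

In this model, passing the extra flag through the base flags is the only variant that both adds it and keeps the optimization level.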

It's odd that the build requires these flags to succeed. Does this indicate an issue? Feel free to close if this is expected. I'm quite curious why missing optimization flags cause error: not a valid operand.

IMbackK · 2024-12-18T16:26:40Z

I can reproduce this issue.

darren-amd · 2024-12-23T15:52:53Z

Hi @LunNova,

Thanks for reporting the issue. I was able to reproduce the error using your instructions. However, following the build instructions from the documentation, I was able to build CK without any errors. Could you please follow the build instructions available here, and let me know if you run into any issues? Thanks!

LunNova · 2024-12-23T15:55:16Z

I am able to build normally. However, it seems very odd for a build to depend on optimizations being enabled, and the cause of the failure seems worth investigating or documenting.

LunNova mentioned this issue Dec 17, 2024

disable _dpp instances for non-gfx10/gfx11 devices #1761

Closed

7 tasks

LunNova changed the title ~~[Issue]: Build failure for gfx908 when building for GPU_ARCH=gfx908;gfx90a~~ [Issue]: Build failure for gfx908 when building without optimization flags Dec 18, 2024

ppanchad-amd added the Under Investigation label Dec 18, 2024
