[Issue]: Build failure for gfx908 when building without optimization flags #1759

Open
LunNova opened this issue Dec 17, 2024 · 6 comments

LunNova commented Dec 17, 2024

Related to #1371

Problem Description

The develop branch (commit 6ef8d3c) fails to build with -DGPU_TARGETS="gfx908;gfx90a" -DCMAKE_CXX_FLAGS_RELEASE=' '. The failure occurs while compiling:

device_gemm_dpp_f16_f16_f16_km_nk_mn_instance

composable_kernel> [152/4044] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_dpp_f16_f16_f16_km_nk_mn_instance.cpp.o
composable_kernel> FAILED: library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_dpp_f16_f16_f16_km_nk_mn_instance.cpp.o
composable_kernel> /nix/store/wcfqfaalprfjgp0w9mx98fwdzwc82xq4-clr-6.3.0/bin/clang++ -DCK_ENABLE_BF16 -DCK_ENABLE_BF8 -DCK_ENABLE_FP16 -DCK_ENABLE_FP32 -DCK_ENABLE_FP64 -DCK_ENABLE_FP8 -DCK_ENABLE_INT8 -DCK_USE_FNUZ_FP8 -DCK_USE_XDL -DUSE_PROF_API=1 -D__HIP_PLATFORM_AMD__=1 -D__HIP_PLATFORM_HCC__=1 -I/build/source/library/include -I/build/source/include -I/build/source/build/include -isystem /nix/store/wcfqfaalprfjgp0w9mx98fwdzwc82xq4-clr-6.3.0/include -parallel-jobs=1 -fgpu-inline-threshold=32768 -std=c++17 -fPIC   -Wall -Wextra -Wcomment -Wendif-labels -Wformat -Winit-self -Wreturn-type -Wsequence-point -Wswitch -Wtrigraphs -Wundef -Wuninitialized -Wunreachable-code -Wunused -Wno-reserved-identifier -Werror -Wno-option-ignored -Wsign-compare -Wno-extra-semi-stmt -Wno-unused-template -Wno-missing-field-initializers -Wno-deprecated-declarations -Wall -Wextra -Wcomment -Wendif-labels -Wformat -Winit-self -Wreturn-type -Wsequence-point -Wswitch -Wtrigraphs -Wundef -Wuninitialized -Wunreachable-code -Wunused -Wno-reserved-identifier -Werror -Wno-option-ignored -Wsign-compare -Wno-extra-semi-stmt -Wno-unused-template -Weverything -Wno-c++98-compat -Wno-c++98-compat-pedantic -Wno-conversion -Wno-double-promotion -Wno-exit-time-destructors -Wno-extra-semi -Wno-float-conversion -Wno-gnu-anonymous-struct -Wno-gnu-zero-variadic-macro-arguments -Wno-missing-prototypes -Wno-nested-anon-types -Wno-padded -Wno-return-std-move-in-c++11 -Wno-shorten-64-to-32 -Wno-sign-conversion -Wno-unknown-warning-option -Wno-unused-command-line-argument -Wno-weak-vtables -Wno-covered-switch-default -Wno-unsafe-buffer-usage -Wno-unused-lambda-capture -Wno-nvcc-compat -Wno-bit-int-extension -Wno-pass-failed -Wno-switch-default -fno-offload-uniform-block -mllvm --lsr-drop-solution=1 -mllvm -enable-post-misched=0 -mllvm -amdgpu-coerce-illegal-types=1 -fcolor-diagnostics --offload-compress -x hip --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx908 --offload-arch=gfx90a -MD -MT 
library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_dpp_f16_f16_f16_km_nk_mn_instance.cpp.o -MF library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_dpp_f16_f16_f16_km_nk_mn_instance.cpp.o.d -o library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_dpp_f16_f16_f16_km_nk_mn_instance.cpp.o -c /build/source/library/src/tensor_operation_instance/gpu/gemm/device_gemm_dpp_f16_f16_f16_km_nk_mn_instance.cpp
composable_kernel> In file included from /build/source/library/src/tensor_operation_instance/gpu/gemm/device_gemm_dpp_f16_f16_f16_km_nk_mn_instance.cpp:9:
composable_kernel> In file included from /build/source/include/ck/tensor_operation/gpu/device/impl/device_gemm_dpp.hpp:14:
composable_kernel> In file included from /build/source/include/ck/tensor_operation/gpu/grid/gridwise_gemm_dpp.hpp:13:
composable_kernel> In file included from /build/source/include/ck/tensor_operation/gpu/block/blockwise_gemm_dpp.hpp:9:
composable_kernel> In file included from /build/source/include/ck/tensor_operation/gpu/warp/dpp_gemm.hpp:6:
composable_kernel> In file included from /build/source/include/ck/utility/amd_gemm_dpp.hpp:8:
composable_kernel> /build/source/include/ck/utility/inner_product_dpp8.hpp:23:21: error: not a valid operand.
composable_kernel>    23 |     asm volatile("\n v_dot2c_f32_f16_dpp %0, %1, %2 dpp8:[0, 0, 0, 0, 0, 0, 0, 0]" : "=v"(c) : "v"(a), "v"(b), "0"(c));
composable_kernel>       |                     ^
composable_kernel> <inline asm>:2:33: note: instantiated into assembly here
composable_kernel>     2 |  v_dot2c_f32_f16_dpp v2, v3, v4 dpp8:[0, 0, 0, 0, 0, 0, 0, 0]
composable_kernel>       |                                 ^
composable_kernel> In file included from /build/source/library/src/tensor_operation_instance/gpu/gemm/device_gemm_dpp_f16_f16_f16_km_nk_mn_instance.cpp:9:
composable_kernel> In file included from /build/source/include/ck/tensor_operation/gpu/device/impl/device_gemm_dpp.hpp:14:
composable_kernel> In file included from /build/source/include/ck/tensor_operation/gpu/grid/gridwise_gemm_dpp.hpp:13:
composable_kernel> In file included from /build/source/include/ck/tensor_operation/gpu/block/blockwise_gemm_dpp.hpp:9:
composable_kernel> In file included from /build/source/include/ck/tensor_operation/gpu/warp/dpp_gemm.hpp:6:
composable_kernel> In file included from /build/source/include/ck/utility/amd_gemm_dpp.hpp:8:
composable_kernel> /build/source/include/ck/utility/inner_product_dpp8.hpp:27:21: error: not a valid operand.
composable_kernel>    27 |     asm volatile("\n v_dot2c_f32_f16_dpp %0, %1, %2 dpp8:[1, 1, 1, 1, 1, 1, 1, 1]" : "=v"(c) : "v"(a), "v"(b), "0"(c));
composable_kernel>       |                     ^
composable_kernel> <inline asm>:2:33: note: instantiated into assembly here
composable_kernel>     2 |  v_dot2c_f32_f16_dpp v2, v3, v4 dpp8:[1, 1, 1, 1, 1, 1, 1, 1]
composable_kernel>       |                                 ^
composable_kernel> In file included from /build/source/library/src/tensor_operation_instance/gpu/gemm/device_gemm_dpp_f16_f16_f16_km_nk_mn_instance.cpp:9:
composable_kernel> In file included from /build/source/include/ck/tensor_operation/gpu/device/impl/device_gemm_dpp.hpp:14:
composable_kernel> In file included from /build/source/include/ck/tensor_operation/gpu/grid/gridwise_gemm_dpp.hpp:13:
composable_kernel> In file included from /build/source/include/ck/tensor_operation/gpu/block/blockwise_gemm_dpp.hpp:9:
composable_kernel> In file included from /build/source/include/ck/tensor_operation/gpu/warp/dpp_gemm.hpp:6:
composable_kernel> In file included from /build/source/include/ck/utility/amd_gemm_dpp.hpp:8:
composable_kernel> /build/source/include/ck/utility/inner_product_dpp8.hpp:31:21: error: not a valid operand.
composable_kernel>    31 |     asm volatile("\n v_dot2c_f32_f16_dpp %0, %1, %2 dpp8:[2, 2, 2, 2, 2, 2, 2, 2]" : "=v"(c) : "v"(a), "v"(b), "0"(c));
composable_kernel>       |                     ^
composable_kernel> <inline asm>:2:33: note: instantiated into assembly here
composable_kernel>     2 |  v_dot2c_f32_f16_dpp v2, v3, v4 dpp8:[2, 2, 2, 2, 2, 2, 2, 2]
composable_kernel>       |                                 ^
composable_kernel> In file included from /build/source/library/src/tensor_operation_instance/gpu/gemm/device_gemm_dpp_f16_f16_f16_km_nk_mn_instance.cpp:9:
composable_kernel> In file included from /build/source/include/ck/tensor_operation/gpu/device/impl/device_gemm_dpp.hpp:14:
composable_kernel> In file included from /build/source/include/ck/tensor_operation/gpu/grid/gridwise_gemm_dpp.hpp:13:
composable_kernel> In file included from /build/source/include/ck/tensor_operation/gpu/block/blockwise_gemm_dpp.hpp:9:
composable_kernel> In file included from /build/source/include/ck/tensor_operation/gpu/warp/dpp_gemm.hpp:6:
composable_kernel> In file included from /build/source/include/ck/utility/amd_gemm_dpp.hpp:8:
composable_kernel> /build/source/include/ck/utility/inner_product_dpp8.hpp:35:21: error: not a valid operand.
composable_kernel>    35 |     asm volatile("\n v_dot2c_f32_f16_dpp %0, %1, %2 dpp8:[3, 3, 3, 3, 3, 3, 3, 3]" : "=v"(c) : "v"(a), "v"(b), "0"(c));
composable_kernel>       |                     ^
composable_kernel> <inline asm>:2:33: note: instantiated into assembly here
composable_kernel>     2 |  v_dot2c_f32_f16_dpp v2, v3, v4 dpp8:[3, 3, 3, 3, 3, 3, 3, 3]
composable_kernel>       |                                 ^
composable_kernel> In file included from /build/source/library/src/tensor_operation_instance/gpu/gemm/device_gemm_dpp_f16_f16_f16_km_nk_mn_instance.cpp:9:
composable_kernel> In file included from /build/source/include/ck/tensor_operation/gpu/device/impl/device_gemm_dpp.hpp:14:
composable_kernel> In file included from /build/source/include/ck/tensor_operation/gpu/grid/gridwise_gemm_dpp.hpp:13:
composable_kernel> In file included from /build/source/include/ck/tensor_operation/gpu/block/blockwise_gemm_dpp.hpp:9:
composable_kernel> In file included from /build/source/include/ck/tensor_operation/gpu/warp/dpp_gemm.hpp:6:
composable_kernel> In file included from /build/source/include/ck/utility/amd_gemm_dpp.hpp:8:
composable_kernel> /build/source/include/ck/utility/inner_product_dpp8.hpp:39:21: error: not a valid operand.
composable_kernel>    39 |     asm volatile("\n v_dot2c_f32_f16_dpp %0, %1, %2 dpp8:[4, 4, 4, 4, 4, 4, 4, 4]" : "=v"(c) : "v"(a), "v"(b), "0"(c));
composable_kernel>       |                     ^
composable_kernel> <inline asm>:2:33: note: instantiated into assembly here
composable_kernel>     2 |  v_dot2c_f32_f16_dpp v2, v3, v4 dpp8:[4, 4, 4, 4, 4, 4, 4, 4]
composable_kernel>       |                                 ^
composable_kernel> In file included from /build/source/library/src/tensor_operation_instance/gpu/gemm/device_gemm_dpp_f16_f16_f16_km_nk_mn_instance.cpp:9:
composable_kernel> In file included from /build/source/include/ck/tensor_operation/gpu/device/impl/device_gemm_dpp.hpp:14:
composable_kernel> In file included from /build/source/include/ck/tensor_operation/gpu/grid/gridwise_gemm_dpp.hpp:13:
composable_kernel> In file included from /build/source/include/ck/tensor_operation/gpu/block/blockwise_gemm_dpp.hpp:9:
composable_kernel> In file included from /build/source/include/ck/tensor_operation/gpu/warp/dpp_gemm.hpp:6:
composable_kernel> In file included from /build/source/include/ck/utility/amd_gemm_dpp.hpp:8:
composable_kernel> /build/source/include/ck/utility/inner_product_dpp8.hpp:43:21: error: not a valid operand.
composable_kernel>    43 |     asm volatile("\n v_dot2c_f32_f16_dpp %0, %1, %2 dpp8:[5, 5, 5, 5, 5, 5, 5, 5]" : "=v"(c) : "v"(a), "v"(b), "0"(c));
composable_kernel>       |                     ^
composable_kernel> <inline asm>:2:33: note: instantiated into assembly here
composable_kernel>     2 |  v_dot2c_f32_f16_dpp v2, v3, v4 dpp8:[5, 5, 5, 5, 5, 5, 5, 5]
composable_kernel>       |                                 ^
composable_kernel> In file included from /build/source/library/src/tensor_operation_instance/gpu/gemm/device_gemm_dpp_f16_f16_f16_km_nk_mn_instance.cpp:9:
composable_kernel> In file included from /build/source/include/ck/tensor_operation/gpu/device/impl/device_gemm_dpp.hpp:14:
composable_kernel> In file included from /build/source/include/ck/tensor_operation/gpu/grid/gridwise_gemm_dpp.hpp:13:
composable_kernel> In file included from /build/source/include/ck/tensor_operation/gpu/block/blockwise_gemm_dpp.hpp:9:
composable_kernel> In file included from /build/source/include/ck/tensor_operation/gpu/warp/dpp_gemm.hpp:6:
composable_kernel> In file included from /build/source/include/ck/utility/amd_gemm_dpp.hpp:8:
composable_kernel> /build/source/include/ck/utility/inner_product_dpp8.hpp:47:21: error: not a valid operand.
composable_kernel>    47 |     asm volatile("\n v_dot2c_f32_f16_dpp %0, %1, %2 dpp8:[6, 6, 6, 6, 6, 6, 6, 6]" : "=v"(c) : "v"(a), "v"(b), "0"(c));
composable_kernel>       |                     ^
composable_kernel> <inline asm>:2:33: note: instantiated into assembly here
composable_kernel>     2 |  v_dot2c_f32_f16_dpp v2, v3, v4 dpp8:[6, 6, 6, 6, 6, 6, 6, 6]
composable_kernel>       |                                 ^
composable_kernel> In file included from /build/source/library/src/tensor_operation_instance/gpu/gemm/device_gemm_dpp_f16_f16_f16_km_nk_mn_instance.cpp:9:
composable_kernel> In file included from /build/source/include/ck/tensor_operation/gpu/device/impl/device_gemm_dpp.hpp:14:
composable_kernel> In file included from /build/source/include/ck/tensor_operation/gpu/grid/gridwise_gemm_dpp.hpp:13:
composable_kernel> In file included from /build/source/include/ck/tensor_operation/gpu/block/blockwise_gemm_dpp.hpp:9:
composable_kernel> In file included from /build/source/include/ck/tensor_operation/gpu/warp/dpp_gemm.hpp:6:
composable_kernel> In file included from /build/source/include/ck/utility/amd_gemm_dpp.hpp:8:
composable_kernel> /build/source/include/ck/utility/inner_product_dpp8.hpp:51:21: error: not a valid operand.
composable_kernel>    51 |     asm volatile("\n v_dot2c_f32_f16_dpp %0, %1, %2 dpp8:[7, 7, 7, 7, 7, 7, 7, 7]" : "=v"(c) : "v"(a), "v"(b), "0"(c));
composable_kernel>       |                     ^
composable_kernel> <inline asm>:2:33: note: instantiated into assembly here
composable_kernel>     2 |  v_dot2c_f32_f16_dpp v2, v3, v4 dpp8:[7, 7, 7, 7, 7, 7, 7, 7]
composable_kernel>       |                                 ^
composable_kernel> 8 errors generated when compiling for gfx908.
cmake flags:  -DCMAKE_C_FLAGS_RELEASE=-fgpu-inline-threshold=32768 -DCMAKE_CXX_FLAGS_RELEASE=-fgpu-inline-threshold=32768 -DCK_PARALLEL_LINK_JOBS=5 -DCK_PARALLEL_COMPILE_JOBS=62

This is on top of a ROCm 6.3 stack built from the rocm-6.3.0 tags.

Operating System

NixOS

CPU

EPYC 7773

GPU

AMD Instinct MI100


LunNova commented Dec 17, 2024

It looks like device_gemm_dpp only supports gfx103/gfx11 and shouldn't be built for gfx9:

static bool IsSupportedArgument(const Argument& karg)
{
    if(ck::is_gfx103_supported() || ck::is_gfx11_supported())
    {
        return GridwiseGemm::CheckValidity(karg);
    }
    return false;
}
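
The difference between a runtime check like the one above and a compile-time guard can be sketched as follows. This is a minimal illustration, not code from composable_kernel: SKETCH_HAS_DPP8, is_supported_at_runtime, and dot2_accumulate are hypothetical names, and __gfx1030__ / __gfx1100__ are assumed to be the per-target macros the compiler defines during device compilation. A runtime check only decides whether a kernel gets launched; it does not stop the compiler from generating code for every requested --offload-arch, so target-specific instructions would also need a preprocessor guard.

```cpp
#include <cassert>
#include <string>

// Assumption: per-target macros such as __gfx1030__ are defined only while
// compiling device code for that target. On any other target (or on the host)
// the guarded path is never seen by the code generator.
#if defined(__gfx1030__) || defined(__gfx1100__)
#define SKETCH_HAS_DPP8 1
#else
#define SKETCH_HAS_DPP8 0
#endif

// Runtime-style check, similar in spirit to IsSupportedArgument: it answers
// "may I launch this?" but has no effect on what gets compiled.
inline bool is_supported_at_runtime(const std::string& arch)
{
    assert(!arch.empty());
    return arch.rfind("gfx103", 0) == 0 || arch.rfind("gfx11", 0) == 0;
}

// Accumulates a 2-element dot product into c: c + a0*b0 + a1*b1.
inline float dot2_accumulate(float a0, float a1, float b0, float b1, float c)
{
#if SKETCH_HAS_DPP8
    // A target-specific instruction (for example inline assembly) would live
    // here; plain arithmetic is used as a placeholder in this sketch.
    return c + a0 * b0 + a1 * b1;
#else
    // Portable fallback that every target can assemble.
    return c + a0 * b0 + a1 * b1;
#endif
}
```

With a guard of this kind, a gfx908 compile would take the fallback branch regardless of optimization level, while the runtime check would still report gfx908 as unsupported.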


LunNova commented Dec 18, 2024

Docker reproduction steps on 6ef8d3c:

$ DOCKER_BUILDKIT=1 docker build -t ck:latest -f Dockerfile .
# wait a long time
$ docker run                                           \
    -it                                                \
    --privileged                                       \
    --group-add sudo                                   \
    -w /root/workspace                                 \
    -v ~/ck-docker-workspace/:/root/workspace  \
    ck:latest                                          \
    /bin/bash
# in docker shell now
$ git clone https://github.com/ROCm/composable_kernel.git composable_kernel
$ cd composable_kernel
$ git checkout 6ef8d3c295686b872d7e7a86621b68f765d98572 # latest develop commit on 2024-12-17
$ mkdir build && cd build
$ cmake -D CMAKE_PREFIX_PATH=/opt/rocm -DCMAKE_CXX_COMPILER=/opt/rocm/bin/amdclang++            \
    -D CMAKE_BUILD_TYPE=Release -D GPU_ARCHS="gfx908;gfx90a" -DCMAKE_CXX_FLAGS_RELEASE=' ' ..
$ cmake --build .
# fails with error: not a valid operand
/root/workspace/composable_kernel/include/ck/utility/inner_product_dpp8.hpp:27:21: error: not a valid operand.
   27 |     asm volatile("\n v_dot2c_f32_f16_dpp %0, %1, %2 dpp8:[1, 1, 1, 1, 1, 1, 1, 1]" : "=v"(c) : "v"(a), "v"(b), "0"(c));
      |                     ^
<inline asm>:2:33: note: instantiated into assembly here
    2 |  v_dot2c_f32_f16_dpp v2, v3, v4 dpp8:[1, 1, 1, 1, 1, 1, 1, 1]
      |                                 ^

I accidentally removed the default release flags from the release build by using CMAKE_CXX_FLAGS_RELEASE to try to append flags, and that is what causes the failure. I shouldn't have set that variable, because it replaces the defaults rather than appending to them. I could have used the CXXFLAGS environment variable or the CMAKE_CXX_FLAGS_INIT CMake variable instead.

-DCMAKE_CXX_FLAGS_RELEASE=' ' - fails
-DCMAKE_CXX_FLAGS_RELEASE='-O3 -DNDEBUG' - builds
no flag - builds
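
To confirm which of these cases a given build actually falls into, a translation unit can report what the compiler sees. The sketch below assumes a GCC- or Clang-compatible compiler, which defines __OPTIMIZE__ at -O1 and above; build_flag_summary is a hypothetical helper name, not part of composable_kernel. With the usual Clang/GCC release default of '-O3 -DNDEBUG' both macros would be defined, whereas an empty CMAKE_CXX_FLAGS_RELEASE would leave neither defined unless other flags supply them.

```cpp
#include <string>

// Summarizes two effects of the usual release flags as seen by the compiler.
inline std::string build_flag_summary()
{
    std::string summary;
#if defined(__OPTIMIZE__)
    summary += "optimized";        // some -O level above -O0 is in effect
#else
    summary += "unoptimized";      // -O0 or no -O flag at all
#endif
#if defined(NDEBUG)
    summary += ", NDEBUG defined";
#else
    summary += ", NDEBUG not defined";
#endif
    return summary;
}
```

Printing the result of build_flag_summary() from a small test program built with the same CMake configuration shows whether the optimization flags survived the override.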

It's odd that the build requires these flags to succeed. Does this indicate an issue? Feel free to close if this is expected. I'm quite curious why missing optimization flags cause "error: not a valid operand".

LunNova changed the title from "[Issue]: Build failure for gfx908 when building for GPU_ARCH=gfx908;gfx90a" to "[Issue]: Build failure for gfx908 when building without optimization flags" on Dec 18, 2024

IMbackK commented Dec 18, 2024

I can reproduce this issue.

@darren-amd
Contributor

Hi @LunNova,

Thanks for reporting the issue. I was able to reproduce the error using the instructions you specified. However, following the build instructions from the documentation, I was able to build CK without any errors. Could you please follow the build instructions available here, and let me know if you run into any issues? Thanks!


LunNova commented Dec 23, 2024

I am able to build normally. However, it seems very odd for a build to succeed only when optimizations are enabled, and the cause of the failure seems worth investigating or documenting.
