Compile Issue #5

Open
huangyuxiang03 opened this issue Nov 29, 2024 · 1 comment

Comments

@huangyuxiang03

Hi,
Thanks for providing this implementation. While trying to install it on A800 GPUs, we encountered the following error:

[61/61] /home/test/test01/cuda-12.1/bin/nvcc --generate-dependencies-with-compile --dependency-output /home/test/test01/hyx/duo-attention/Block-Sparse-Attention/build/temp.linux-x86_64-cpython-310/csrc/block_sparse_attn/src/flash_fwd_split_hdim64_fp16_sm80.o.d -I/home/test/test01/hyx/duo-attention/Block-Sparse-Attention/csrc/block_sparse_attn -I/home/test/test01/hyx/duo-attention/Block-Sparse-Attention/csrc/block_sparse_attn/src -I/home/test/test01/hyx/duo-attention/Block-Sparse-Attention/csrc/cutlass/include -I/home/test/test01/anaconda3/envs/duo/lib/python3.10/site-packages/torch/include -I/home/test/test01/anaconda3/envs/duo/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/home/test/test01/anaconda3/envs/duo/lib/python3.10/site-packages/torch/include/TH -I/home/test/test01/anaconda3/envs/duo/lib/python3.10/site-packages/torch/include/THC -I/home/test/test01/cuda-12.1/include -I/home/test/test01/anaconda3/envs/duo/include/python3.10 -c -c /home/test/test01/hyx/duo-attention/Block-Sparse-Attention/csrc/block_sparse_attn/src/flash_fwd_split_hdim64_fp16_sm80.cu -o /home/test/test01/hyx/duo-attention/Block-Sparse-Attention/build/temp.linux-x86_64-cpython-310/csrc/block_sparse_attn/src/flash_fwd_split_hdim64_fp16_sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -lineinfo -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=block_sparse_attn_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -ccbin /home/test/test01/anaconda3/envs/duo/bin/x86_64-conda-linux-gnu-cc
FAILED: /home/test/test01/hyx/duo-attention/Block-Sparse-Attention/build/temp.linux-x86_64-cpython-310/csrc/block_sparse_attn/src/flash_fwd_split_hdim64_fp16_sm80.o 
/home/test/test01/cuda-12.1/bin/nvcc --generate-dependencies-with-compile --dependency-output /home/test/test01/hyx/duo-attention/Block-Sparse-Attention/build/temp.linux-x86_64-cpython-310/csrc/block_sparse_attn/src/flash_fwd_split_hdim64_fp16_sm80.o.d -I/home/test/test01/hyx/duo-attention/Block-Sparse-Attention/csrc/block_sparse_attn -I/home/test/test01/hyx/duo-attention/Block-Sparse-Attention/csrc/block_sparse_attn/src -I/home/test/test01/hyx/duo-attention/Block-Sparse-Attention/csrc/cutlass/include -I/home/test/test01/anaconda3/envs/duo/lib/python3.10/site-packages/torch/include -I/home/test/test01/anaconda3/envs/duo/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/home/test/test01/anaconda3/envs/duo/lib/python3.10/site-packages/torch/include/TH -I/home/test/test01/anaconda3/envs/duo/lib/python3.10/site-packages/torch/include/THC -I/home/test/test01/cuda-12.1/include -I/home/test/test01/anaconda3/envs/duo/include/python3.10 -c -c /home/test/test01/hyx/duo-attention/Block-Sparse-Attention/csrc/block_sparse_attn/src/flash_fwd_split_hdim64_fp16_sm80.cu -o /home/test/test01/hyx/duo-attention/Block-Sparse-Attention/build/temp.linux-x86_64-cpython-310/csrc/block_sparse_attn/src/flash_fwd_split_hdim64_fp16_sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -lineinfo -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=block_sparse_attn_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -ccbin /home/test/test01/anaconda3/envs/duo/bin/x86_64-conda-linux-gnu-cc
sh: line 1: 29645 Killed                  ptxas -arch=sm_90 -m64 --generate-line-info "/tmp/tmpxft_00005048_00000000-6_flash_fwd_split_hdim64_fp16_sm80.compute_90.ptx" -o "/tmp/tmpxft_00005048_00000000-11_flash_fwd_split_hdim64_fp16_sm80.compute_90.cubin" > /tmp/tmpxft_00005048_00000000-13_2d74fb0_stdout 2> /tmp/tmpxft_00005048_00000000-13_2d74fb0_stderr
ninja: build stopped: subcommand failed.

The compilation was stuck at [61/61] for a very long time before the process was killed by the OS. What could be the potential problem?
Thanks.
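
For context, the `Killed` line in the log above usually means the Linux OOM killer terminated ptxas when the parallel build exhausted host memory: by default ninja compiles many CUDA files at once, and ptxas for these heavily templated kernels can use a lot of RAM per process. A quick way to confirm this, assuming a Linux host where the kernel log is readable (may require root), is a sketch like:

```bash
# Look for OOM-killer entries written right after the failed build; an
# out-of-memory kill shows up as something like
# "Out of memory: Killed process <pid> (ptxas)".
dmesg | grep -i -E 'out of memory|oom|killed process' | tail -n 20

# Watching free memory while the build runs also makes the spike visible.
free -h
```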

@qijiaxing

I also ran into the compilation getting stuck. I solved it by setting the environment variable MAX_JOBS=1 before compiling.
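
For reference, a minimal sketch of that workaround; the exact install commands (`pip install -e .` / `python setup.py install`) are assumptions, so substitute whatever build command you normally use from the repo root:

```bash
# MAX_JOBS limits the number of parallel compile jobs that PyTorch's
# C++/CUDA extension builder passes to ninja, so only one nvcc/ptxas
# process runs at a time and peak host memory stays low
# (at the cost of a much longer build).
MAX_JOBS=1 pip install -e .

# Equivalently, export it for the whole shell session before building:
export MAX_JOBS=1
python setup.py install
```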
