Hi,

Thanks for providing this implementation. While trying to install it on A800 GPUs, we encountered this error:
```
[61/61] /home/test/test01/cuda-12.1/bin/nvcc --generate-dependencies-with-compile --dependency-output /home/test/test01/hyx/duo-attention/Block-Sparse-Attention/build/temp.linux-x86_64-cpython-310/csrc/block_sparse_attn/src/flash_fwd_split_hdim64_fp16_sm80.o.d -I/home/test/test01/hyx/duo-attention/Block-Sparse-Attention/csrc/block_sparse_attn -I/home/test/test01/hyx/duo-attention/Block-Sparse-Attention/csrc/block_sparse_attn/src -I/home/test/test01/hyx/duo-attention/Block-Sparse-Attention/csrc/cutlass/include -I/home/test/test01/anaconda3/envs/duo/lib/python3.10/site-packages/torch/include -I/home/test/test01/anaconda3/envs/duo/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/home/test/test01/anaconda3/envs/duo/lib/python3.10/site-packages/torch/include/TH -I/home/test/test01/anaconda3/envs/duo/lib/python3.10/site-packages/torch/include/THC -I/home/test/test01/cuda-12.1/include -I/home/test/test01/anaconda3/envs/duo/include/python3.10 -c -c /home/test/test01/hyx/duo-attention/Block-Sparse-Attention/csrc/block_sparse_attn/src/flash_fwd_split_hdim64_fp16_sm80.cu -o /home/test/test01/hyx/duo-attention/Block-Sparse-Attention/build/temp.linux-x86_64-cpython-310/csrc/block_sparse_attn/src/flash_fwd_split_hdim64_fp16_sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -lineinfo -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=block_sparse_attn_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -ccbin /home/test/test01/anaconda3/envs/duo/bin/x86_64-conda-linux-gnu-cc
FAILED: /home/test/test01/hyx/duo-attention/Block-Sparse-Attention/build/temp.linux-x86_64-cpython-310/csrc/block_sparse_attn/src/flash_fwd_split_hdim64_fp16_sm80.o
/home/test/test01/cuda-12.1/bin/nvcc --generate-dependencies-with-compile --dependency-output /home/test/test01/hyx/duo-attention/Block-Sparse-Attention/build/temp.linux-x86_64-cpython-310/csrc/block_sparse_attn/src/flash_fwd_split_hdim64_fp16_sm80.o.d -I/home/test/test01/hyx/duo-attention/Block-Sparse-Attention/csrc/block_sparse_attn -I/home/test/test01/hyx/duo-attention/Block-Sparse-Attention/csrc/block_sparse_attn/src -I/home/test/test01/hyx/duo-attention/Block-Sparse-Attention/csrc/cutlass/include -I/home/test/test01/anaconda3/envs/duo/lib/python3.10/site-packages/torch/include -I/home/test/test01/anaconda3/envs/duo/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/home/test/test01/anaconda3/envs/duo/lib/python3.10/site-packages/torch/include/TH -I/home/test/test01/anaconda3/envs/duo/lib/python3.10/site-packages/torch/include/THC -I/home/test/test01/cuda-12.1/include -I/home/test/test01/anaconda3/envs/duo/include/python3.10 -c -c /home/test/test01/hyx/duo-attention/Block-Sparse-Attention/csrc/block_sparse_attn/src/flash_fwd_split_hdim64_fp16_sm80.cu -o /home/test/test01/hyx/duo-attention/Block-Sparse-Attention/build/temp.linux-x86_64-cpython-310/csrc/block_sparse_attn/src/flash_fwd_split_hdim64_fp16_sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -lineinfo -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=block_sparse_attn_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -ccbin /home/test/test01/anaconda3/envs/duo/bin/x86_64-conda-linux-gnu-cc
sh: line 1: 29645 Killed ptxas -arch=sm_90 -m64 --generate-line-info "/tmp/tmpxft_00005048_00000000-6_flash_fwd_split_hdim64_fp16_sm80.compute_90.ptx" -o "/tmp/tmpxft_00005048_00000000-11_flash_fwd_split_hdim64_fp16_sm80.compute_90.cubin" > /tmp/tmpxft_00005048_00000000-13_2d74fb0_stdout 2> /tmp/tmpxft_00005048_00000000-13_2d74fb0_stderr
ninja: build stopped: subcommand failed.
```
The compilation was stuck at [61/61] for a very long time before the process was killed by the OS. What could be the potential problem? Thanks.
I also ran into a compilation hang. I solved it by setting the environment variable `MAX_JOBS=1` before compiling.