ninja: build stopped: subcommand failed. #371

Open
bhargav25dave1996 opened this issue Oct 12, 2024 · 3 comments

Comments

@bhargav25dave1996

bhargav25dave1996 commented Oct 12, 2024

Running this ColBERT code:

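    from colbert import Indexer
    from colbert.infra import ColBERTConfig

    # ColBERT runs indexing in worker processes, so keep the entry point behind a main guard
    if __name__ == "__main__":
        config = ColBERTConfig(
            nbits=2,
            root="experiments",
        )
        indexer = Indexer(checkpoint="/media/sda1/Bhargav/indiccolbert/guj_Gujr-nllb1.3b-moses/colbert-50000", config=config)
        indexer.index(name="gu_fire.nbits=2", collection="/media/sda1/Bhargav/FIRE_adhoc_data/Gujarati/Gujarati_collection_only_index.tsv")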

Gives me this error:

Process Process-2:
Traceback (most recent call last):
File "/home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1900, in _run_ninja_build
subprocess.run(
File "/home/irlab/miniconda3/envs/colbert/lib/python3.8/subprocess.py", line 516, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/home/irlab/miniconda3/envs/colbert/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/home/irlab/miniconda3/envs/colbert/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/colbert/infra/launcher.py", line 134, in setup_new_process
return_val = callee(config, args)
File "/home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/colbert/indexing/collection_indexer.py", line 33, in encode
encoder.run(shared_lists)
File "/home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/colbert/indexing/collection_indexer.py", line 68, in run
self.train(shared_lists) # Trains centroids from selected passages
File "/home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/colbert/indexing/collection_indexer.py", line 237, in train
bucket_cutoffs, bucket_weights, avg_residual = self._compute_avg_residual(centroids, heldout)
File "/home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/colbert/indexing/collection_indexer.py", line 315, in _compute_avg_residual
compressor = ResidualCodec(config=self.config, centroids=centroids, avg_residual=None)
File "/home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/colbert/indexing/codecs/residual.py", line 24, in __init__
ResidualCodec.try_load_torch_extensions(self.use_gpu)
File "/home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/colbert/indexing/codecs/residual.py", line 103, in try_load_torch_extensions
decompress_residuals_cpp = load(
File "/home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1284, in load
return _jit_compile(
File "/home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1508, in _jit_compile
_write_ninja_file_and_build_library(
File "/home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1623, in _write_ninja_file_and_build_library
_run_ninja_build(
File "/home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1916, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error building extension 'decompress_residuals_cpp': [1/3] c++ -MMD -MF decompress_residuals.o.d -DTORCH_EXTENSION_NAME=decompress_residuals_cpp -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -isystem /home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/torch/include -isystem /home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/torch/include/TH -isystem /home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/torch/include/THC -isystem /home/irlab/miniconda3/envs/colbert/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -c /home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/colbert/indexing/codecs/decompress_residuals.cpp -o decompress_residuals.o
FAILED: decompress_residuals.o
c++ -MMD -MF decompress_residuals.o.d -DTORCH_EXTENSION_NAME=decompress_residuals_cpp -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -isystem /home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/torch/include -isystem /home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/torch/include/TH -isystem /home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/torch/include/THC -isystem /home/irlab/miniconda3/envs/colbert/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -c /home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/colbert/indexing/codecs/decompress_residuals.cpp -o decompress_residuals.o
In file included from /home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/torch/include/torch/csrc/python_headers.h:12,
from /home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/torch/include/torch/csrc/Device.h:4,
from /home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/python.h:8,
from /home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/torch/include/torch/extension.h:6,
from /home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/colbert/indexing/codecs/decompress_residuals.cpp:1:
/home/irlab/miniconda3/envs/colbert/include/python3.8/Python.h:44:10: fatal error: crypt.h: No such file or directory
44 | #include <crypt.h>
| ^~~~~~~~~
compilation terminated.
[2/3] /usr/bin/nvcc -DTORCH_EXTENSION_NAME=decompress_residuals_cpp -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -isystem /home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/torch/include -isystem /home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/torch/include/TH -isystem /home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/torch/include/THC -isystem /home/irlab/miniconda3/envs/colbert/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_75,code=compute_75 -gencode=arch=compute_75,code=sm_75 --compiler-options '-fPIC' -std=c++14 -c /home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/colbert/indexing/codecs/decompress_residuals.cu -o decompress_residuals.cuda.o
FAILED: decompress_residuals.cuda.o
/usr/bin/nvcc -DTORCH_EXTENSION_NAME=decompress_residuals_cpp -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -isystem /home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/torch/include -isystem /home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/torch/include/TH -isystem /home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/torch/include/THC -isystem /home/irlab/miniconda3/envs/colbert/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_75,code=compute_75 -gencode=arch=compute_75,code=sm_75 --compiler-options '-fPIC' -std=c++14 -c /home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/colbert/indexing/codecs/decompress_residuals.cu -o decompress_residuals.cuda.o
/home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/torch/include/pybind11/cast.h: In function ‘typename pybind11::detail::type_caster<typename pybind11::detail::intrinsic_type<T>::type>::cast_op_type<T> pybind11::detail::cast_op(make_caster<T>&)’:
/home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/torch/include/pybind11/cast.h:42:120: error: expected template-name before ‘<’ token
42 | return caster.operator typename make_caster<T>::template cast_op_type<T>();
| ^
/home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/torch/include/pybind11/cast.h:42:120: error: expected identifier before ‘<’ token
/home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/torch/include/pybind11/cast.h:42:123: error: expected primary-expression before ‘>’ token
42 | return caster.operator typename make_caster<T>::template cast_op_type<T>();
| ^
/home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/torch/include/pybind11/cast.h:42:126: error: expected primary-expression before ‘)’ token
42 | return caster.operator typename make_caster<T>::template cast_op_type<T>();
| ^
/home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/colbert/indexing/codecs/decompress_residuals.cu: In function ‘at::Tensor decompress_residuals_cuda(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, int, int)’:
/home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/colbert/indexing/codecs/decompress_residuals.cu:61:126: warning: ‘T at::Tensor::data() const [with T = unsigned char]’ is deprecated: Tensor.data() is deprecated. Please use Tensor.data_ptr() instead. [-Wdeprecated-declarations]
61 | decompress_residuals_kernel<<<blocks, threads>>>(
| ^
/home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/torch/include/ATen/core/TensorBody.h:238:1: note: declared here
238 | T * data() const {
| ^ ~~
/home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/colbert/indexing/codecs/decompress_residuals.cu:61:592: warning: ‘T at::Tensor::data() const [with T = c10::Half]’ is deprecated: Tensor.data() is deprecated. Please use Tensor.data_ptr() instead. [-Wdeprecated-declarations]
61 | decompress_residuals_kernel<<<blocks, threads>>>(
| ^
/home/irlab/miniconda3/envs/colbert/lib/python3.8/site-packages/torch/include/ATen/core/TensorBody.h:238:1: note: declared here
238 | T * data() const {
| ^ ~~
ninja: build stopped: subcommand failed.
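The log above contains two independent compile failures: the C++ job ([1/3]) cannot find `crypt.h`, which the env's `Python.h` includes, and the CUDA job ([2/3]) run by the system `/usr/bin/nvcc` fails inside pybind11's `cast.h`. The following is a minimal diagnostic sketch, assuming a Linux machine. It checks whether `crypt.h` is visible in a few common include directories and which CUDA release `nvcc` reports, so that release can be compared with `torch.version.cuda`. The helper names (`find_header`, `candidate_include_dirs`, `parse_nvcc_release`) and the directory list are illustrative assumptions, not part of ColBERT or torch, and the compiler's real search path may differ.

```python
import os
import re
import shutil
import subprocess
import sys
from typing import Iterable, List, Optional


def find_header(header: str, search_dirs: Iterable[str]) -> Optional[str]:
    """Return the first path where `header` exists, or None if it is nowhere."""
    for directory in search_dirs:
        candidate = os.path.join(directory, header)
        if os.path.isfile(candidate):
            return candidate
    return None


def candidate_include_dirs() -> List[str]:
    """Assumed include dirs to probe; the compiler's real search path may differ."""
    dirs = ["/usr/include", "/usr/local/include", os.path.join(sys.prefix, "include")]
    # gcc also searches directories listed in CPATH, as if passed with -I
    dirs += [p for p in os.environ.get("CPATH", "").split(os.pathsep) if p]
    return dirs


def parse_nvcc_release(version_text: str) -> Optional[str]:
    """Extract a release such as '11.5' from `nvcc --version` output."""
    match = re.search(r"release (\d+\.\d+)", version_text)
    return match.group(1) if match else None


if __name__ == "__main__":
    print("crypt.h:", find_header("crypt.h", candidate_include_dirs()) or "NOT FOUND")

    nvcc = shutil.which("nvcc")
    if nvcc:
        out = subprocess.run([nvcc, "--version"], capture_output=True, text=True).stdout
        print("nvcc:", nvcc, "release", parse_nvcc_release(out))
    else:
        print("nvcc: not on PATH")

    try:
        import torch
        print("torch built with CUDA", torch.version.cuda)
    except ImportError:
        print("torch is not importable in this environment")
```

A missing `crypt.h` points at the toolchain or environment rather than at ColBERT itself. Comparing the two CUDA versions is only a first check; on its own it does not identify the cause of the `cast.h` errors.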

@bhargav25dave1996
Author

@okhat

@Liu-Eroteme

Okay, so I don't know what actually happened, but I got the same error today, and it turned out to be a torch screw-up: the recent update broke some symlinks, so fixing it was as easy as:

    cd ...site-packages/torch/lib
    ln -s ../../../../libtorch_python.so libtorch_python.so

Though, on second glance, your stack trace is a little different, so it might be something else. I'd still check torch and the torch extension loader; it's always f*ing torch.
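Before applying the symlink fix described in this comment, you can check whether `libtorch_python.so` in torch's `lib` directory is actually missing or is a dangling link. This is a minimal sketch that assumes torch's shared libraries live in a `lib` folder next to `torch/__init__.py`. `link_status` is a hypothetical helper, not a torch API. The sketch only reports the file's state; it does not create or change any link.

```python
import os


def link_status(lib_dir: str, name: str = "libtorch_python.so") -> str:
    """Classify `name` inside `lib_dir` as 'ok', 'broken-symlink', or 'missing'."""
    path = os.path.join(lib_dir, name)
    if os.path.islink(path) and not os.path.exists(path):
        # The link itself exists, but os.path.exists follows it and finds no target
        return "broken-symlink"
    if os.path.exists(path):
        return "ok"
    return "missing"


if __name__ == "__main__":
    import importlib.util

    # Locate torch without importing it, since a broken install may fail on import
    spec = importlib.util.find_spec("torch")
    if spec is None or spec.origin is None:
        print("torch is not installed in this environment")
    else:
        lib_dir = os.path.join(os.path.dirname(spec.origin), "lib")
        print(lib_dir, "->", link_status(lib_dir))
```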

@HU-xiaobai

Hello, have you solved this? I'm running into the same problem. If you did, could you share how you fixed it?
