A sparse tensor bug #1

caijillx · 2022-04-23T12:21:35Z

ubuntu18.04
RTX3090
cuda11.1
MinkowskiEngine 0.5.4

The following error occurred when I tried to run your model.

(RegTR) ➜ src git:(main) ✗ python test.py --dev --resume ../trained_models/3dmatch/ckpt/model-best.pth --benchmark 3DMatch

/home/lileixin/anaconda3/envs/RegTR/lib/python3.8/site-packages/MinkowskiEngine-0.5.4-py3.8-linux-x86_64.egg/MinkowskiEngine/__init__.py:36: UserWarning: The environment variable OMP_NUM_THREADS not set. MinkowskiEngine will automatically set OMP_NUM_THREADS=16. If you want to set OMP_NUM_THREADS manually, please export it on the command line before running a python script. e.g. export OMP_NUM_THREADS=12; python your_program.py. It is recommended to set it below 24.
  warnings.warn(
/home/lileixin/anaconda3/envs/RegTR/lib/python3.8/site-packages/_distutils_hack/__init__.py:30: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")
04/23 20:06:22 [INFO] root - Output and logs will be saved to ../logdev
04/23 20:06:22 [INFO] cvhelpers.misc - Command: test.py --dev --resume ../trained_models/3dmatch/ckpt/model-best.pth --benchmark 3DMatch
04/23 20:06:22 [INFO] cvhelpers.misc - Source is from Commit 64e5b3f (2022-03-28): Fixed minor typo in Readme.md and demo.py
04/23 20:06:22 [INFO] cvhelpers.misc - Arguments: benchmark: 3DMatch, config: None, logdir: ../logs, dev: True, name: None, num_workers: 0, resume: ../trained_models/3dmatch/ckpt/model-best.pth
04/23 20:06:22 [INFO] root - Using config file from checkpoint directory: ../trained_models/3dmatch/config.yaml
04/23 20:06:22 [INFO] data_loaders.threedmatch - Loading data from ../data/indoor
04/23 20:06:22 [INFO] RegTR - Instantiating model RegTR
04/23 20:06:22 [INFO] RegTR - Loss weighting: {'overlap_5': 1.0, 'feature_5': 0.1, 'corr_5': 1.0, 'feature_un': 0.0}
04/23 20:06:22 [INFO] RegTR - Config: d_embed:256, nheads:8, pre_norm:True, use_pos_emb:True, sa_val_has_pos_emb:True, ca_val_has_pos_emb:True
04/23 20:06:25 [INFO] CheckPointManager - Loaded models from ../trained_models/3dmatch/ckpt/model-best.pth
0%| | 0/1623 [00:00<?, ?it/s] ** On entry to cusparseSpMM_bufferSize() parameter number 1 (handle) had an illegal value: bad initialization or already destroyed

Traceback (most recent call last):
  File "test.py", line 75, in <module>
    main()
  File "test.py", line 71, in main
    trainer.test(model, test_loader)
  File "/home/lileixin/work/Point_Registration/RegTR/src/trainer.py", line 204, in test
    test_out = model.test_step(test_batch, test_batch_idx)
  File "/home/lileixin/work/Point_Registration/RegTR/src/models/generic_reg_model.py", line 132, in test_step
    pred = self.forward(batch)
  File "/home/lileixin/work/Point_Registration/RegTR/src/models/regtr.py", line 117, in forward
    kpconv_meta = self.preprocessor(batch['src_xyz'] + batch['tgt_xyz'])
  File "/home/lileixin/anaconda3/envs/RegTR/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lileixin/work/Point_Registration/RegTR/src/models/backbone_kpconv/kpconv.py", line 489, in forward
    pool_p, pool_b = batch_grid_subsampling_kpconv_gpu(
  File "/home/lileixin/work/Point_Registration/RegTR/src/models/backbone_kpconv/kpconv.py", line 232, in batch_grid_subsampling_kpconv_gpu
    sparse_tensor = ME.SparseTensor(
  File "/home/lileixin/anaconda3/envs/RegTR/lib/python3.8/site-packages/MinkowskiEngine-0.5.4-py3.8-linux-x86_64.egg/MinkowskiEngine/MinkowskiSparseTensor.py", line 275, in __init__
    coordinates, features, coordinate_map_key = self.initialize_coordinates(
  File "/home/lileixin/anaconda3/envs/RegTR/lib/python3.8/site-packages/MinkowskiEngine-0.5.4-py3.8-linux-x86_64.egg/MinkowskiEngine/MinkowskiSparseTensor.py", line 338, in initialize_coordinates
    features = spmm_avg.apply(self.inverse_mapping, cols, size, features)
  File "/home/lileixin/anaconda3/envs/RegTR/lib/python3.8/site-packages/MinkowskiEngine-0.5.4-py3.8-linux-x86_64.egg/MinkowskiEngine/sparse_matrix_functions.py", line 183, in forward
    result, COO, vals = spmm_average(
  File "/home/lileixin/anaconda3/envs/RegTR/lib/python3.8/site-packages/MinkowskiEngine-0.5.4-py3.8-linux-x86_64.egg/MinkowskiEngine/sparse_matrix_functions.py", line 93, in spmm_average
    result, COO, vals = MEB.coo_spmm_average_int32(
RuntimeError: CUSPARSE_STATUS_INVALID_VALUE at /home/lileixin/MinkowskiEngine/src/spmm.cu:590

But when I comment out the quantization_mode argument in this call, the program runs:
sparse_tensor = ME.SparseTensor(
    features=points,
    coordinates=coord_batched,
    #quantization_mode=ME.SparseTensorQuantizationMode.UNWEIGHTED_AVERAGE
)

yewzijian · 2022-04-25T13:39:16Z

Hi, I'm using the sparse tensors from Minkowski Engine as a quick way to perform the downsampling. Your issue might be due to a bug in your version of Minkowski Engine; however, I am not able to replicate it on my machine, even with a fresh environment.
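
For readers unfamiliar with the downsampling step mentioned above, here is a minimal pure-Python sketch of unweighted-average grid subsampling: points that fall into the same voxel are replaced by their mean. This is not the repository's implementation and does not use Minkowski Engine. The function name `grid_subsample_average` and the parameter `voxel_size` are made up for illustration, and the loop is far too slow for real point clouds.

```python
import math
from collections import defaultdict


def grid_subsample_average(points, voxel_size):
    """Average all points that share a voxel (illustrative, hypothetical helper)."""
    buckets = defaultdict(list)
    for p in points:
        # Quantize each coordinate to an integer voxel index.
        key = tuple(math.floor(c / voxel_size) for c in p)
        buckets[key].append(p)

    pooled = []
    for key in sorted(buckets):  # sorted only to make the output order deterministic
        group = buckets[key]
        # Unweighted mean along each axis of the points in this voxel.
        pooled.append(tuple(sum(axis) / len(group) for axis in zip(*group)))
    return pooled


if __name__ == "__main__":
    cloud = [(0.25, 0.25, 0.25), (0.75, 0.75, 0.75), (1.5, 0.0, 0.0)]
    print(grid_subsample_average(cloud, voxel_size=1.0))
```

The first two points share voxel (0, 0, 0) and are merged into their mean, while the third point sits alone in voxel (1, 0, 0) and is returned unchanged.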

Here are my installation commands, which will hopefully help (I just tested them on PyTorch 1.10.0 with CUDA 11.1):

conda install openblas-devel -c anaconda
export CUDA_HOME=/usr/local/cuda-11.1
pip install -U git+https://github.com/NVIDIA/MinkowskiEngine -v --no-deps --install-option="--blas_include_dirs=${CONDA_PREFIX}/include" --install-option="--blas=openblas"

Alternatively, if you cannot resolve the issue with Minkowski Engine, you can switch to the (slower) CPU version of the preprocessing code: replace PreprocessorGPU with Preprocessor here, taking care to change the imports accordingly, and compile the helper files using models/backbone_kpconv/compile_wrappers.sh.

Zi Jian

tranceok · 2022-04-26T08:10:12Z

Hi, I changed to the CPU version, and it works. Thank you!

caijillx · 2022-04-26T11:36:00Z

When I upgraded PyTorch to 1.10.0, it worked. Thank you!

qsisi · 2022-04-26T12:31:28Z

Hello,

Did you mean that you upgraded PyTorch to 1.10.0 and the Mink-based downsampling now works without error?

Thanks.

caijillx · 2022-04-26T12:42:26Z

Yes! I upgraded to PyTorch 1.10.0 and installed MinkowskiEngine and pytorch3d again.

yewzijian · 2022-04-26T12:45:46Z

Just to clarify, the code should work on PyTorch 1.9.1 (which was used in the actual training). I used v1.10 because I had problems using v1.9 for PyTorch3D installation: I couldn't get the conda installation working and the wheels are only available for PyTorch 1.10.

Zi Jian

qsisi · 2022-04-27T02:11:57Z

Hello!

I managed to upgrade PyTorch to 1.10.0. More specifically, I tried two combinations of PyTorch + Mink versions:

OS: ubuntu 16.04

gcc: 7.3.0

pytorch+Mink: 1.7.0 + 0.5.2 or 1.10.0 + 0.5.4

Unfortunately, the error caused by quantization_mode still occurs. Could you share your OS, gcc, PyTorch, and Mink versions? That might give me some clues about how to solve the problem.

Thanks.

caijillx · 2022-04-27T11:02:42Z

OS: Ubuntu 18.04

gcc: 7.5.0

python: 3.9.7

pytorch+Mink: 1.10.0 + 0.5.4

cuda version: 11.1
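
A listing like the one above can be gathered with a short script instead of by hand. The sketch below uses only the Python standard library. The helper names (`package_version`, `gcc_version`, `collect_environment`) are hypothetical, and the distribution names `torch`, `MinkowskiEngine`, and `pytorch3d` are assumed to match how the packages were installed.

```python
import platform
import shutil
import subprocess
from importlib import metadata


def package_version(name):
    """Return the installed version of a distribution, or a placeholder."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def gcc_version():
    """Return the first line of `gcc --version`, if gcc is on PATH."""
    gcc = shutil.which("gcc")
    if gcc is None:
        return "not found"
    out = subprocess.run([gcc, "--version"], capture_output=True, text=True)
    lines = out.stdout.splitlines()
    return lines[0] if lines else "unknown"


def collect_environment():
    """Collect the settings discussed in this thread into a dict."""
    return {
        "OS": platform.platform(),
        "python": platform.python_version(),
        "gcc": gcc_version(),
        # Distribution names below are assumptions; adjust to your install.
        "pytorch": package_version("torch"),
        "Mink": package_version("MinkowskiEngine"),
        "pytorch3d": package_version("pytorch3d"),
    }


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print(f"{key}: {value}")
```

The CUDA version that PyTorch was built against is not covered here. Once torch imports successfully, it can be read from `torch.version.cuda`.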

huk112739 · 2022-06-01T11:39:21Z

I used the versions above, but I still have the problem.

Has your problem been solved?

Thanks.

qsisi · 2022-08-14T09:22:02Z

Yes, I solved the problem with this configuration:

ubuntu18.04 + python3.8 + pytorch1.10.1 + CUDA11.3 + Mink0.5.4 + pytorch3d0.6.0

yewzijian mentioned this issue Aug 1, 2022

Train BUG, please help me #7

Open

yewzijian mentioned this issue Jan 6, 2023

A CUDA Error #12

Open
