
[Bug] waymo dataset to kitti conversion stuck at ] 0/158081, elapsed: 0s, ETA: #2796

Open · 3 tasks done
s95huang opened this issue Oct 27, 2023 · 4 comments

@s95huang
Prerequisite

Task

I'm using the official example scripts/configs for the officially supported tasks/models/datasets.

Branch

main branch https://github.com/open-mmlab/mmdetection3d

Environment

sys.platform: linux
Python: 3.8.18 | packaged by conda-forge | (default, Oct 10 2023, 15:44:36) [GCC 12.3.0]
CUDA available: True
numpy_random_seed: 2147483648
GPU 0: NVIDIA GeForce RTX 3090
CUDA_HOME: /usr/local/cuda-11.4
NVCC: Cuda compilation tools, release 11.4, V11.4.152
GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
PyTorch: 1.12.0+cu116
PyTorch compiling details: PyTorch built with:

  • GCC 9.3
  • C++ Version: 201402
  • Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • LAPACK is enabled (usually provided by MKL)
  • NNPACK is enabled
  • CPU capability usage: AVX2
  • CUDA Runtime 11.6
  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
  • CuDNN 8.3.2 (built against CUDA 11.5)
  • Magma 2.6.1
  • Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.6, CUDNN_VERSION=8.3.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.12.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,

TorchVision: 0.13.0+cu116
OpenCV: 4.5.5
MMEngine: 0.9.0
MMDetection: 3.0.0
MMDetection3D: 1.1.0+4ff1361
spconv2.0: True

Reproduces the problem - code sample

```
python tools/create_data.py waymo --root-path ./data/waymo/ --out-dir ./data/waymo/ --workers 0 --extra-tag waymo
```

Reproduces the problem - command or script

```
python tools/create_data.py waymo --root-path ./data/waymo/ --out-dir ./data/waymo/ --workers 0 --extra-tag waymo
```

Reproduces the problem - error message

With `--workers 0`, I get:

Start converting ...
Traceback (most recent call last):
  File "tools/create_data.py", line 327, in <module>
    waymo_data_prep(
  File "tools/create_data.py", line 204, in waymo_data_prep
    converter.convert()
  File "/mnt/0c39e9c4-f324-420d-a1e9-f20a41d147a8/personal_repos/LoopX/mmdetection3d/tools/dataset_converters/waymo_converter.py", line 112, in convert
    mmengine.track_parallel_progress(self.convert_one, range(len(self)),
  File "/home/s95huang/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/utils/progressbar.py", line 191, in track_parallel_progress
    pool = init_pool(nproc, initializer, initargs)
  File "/home/s95huang/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/utils/progressbar.py", line 133, in init_pool
    return Pool(process_num)
  File "/home/s95huang/anaconda3/envs/openmmlab/lib/python3.8/multiprocessing/context.py", line 119, in Pool
    return Pool(processes, initializer, initargs, maxtasksperchild,
  File "/home/s95huang/anaconda3/envs/openmmlab/lib/python3.8/multiprocessing/pool.py", line 205, in __init__
    raise ValueError("Number of processes must be at least 1")
ValueError: Number of processes must be at least 1
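
For context, this ValueError comes from Python's multiprocessing rather than from the converter itself: mmengine.track_parallel_progress hands the --workers value straight to multiprocessing.Pool, which requires at least one process. Below is a minimal, self-contained sketch of the failure mode and an obvious guard; the `work` function and the fallback to the serial tracker are illustrative assumptions, not the upstream code, and in practice simply passing --workers 1 or higher avoids this error.

```python
import mmengine

def work(i):
    # stand-in for the converter's convert_one(); the real function reads one
    # Waymo TFRecord segment and writes KITTI-format files
    return i * i

num_tasks, workers = 10, 0  # --workers 0 reproduces the ValueError

if workers < 1:
    # multiprocessing.Pool(0) raises "Number of processes must be at least 1",
    # so run serially instead of creating an empty worker pool.
    results = mmengine.track_progress(work, range(num_tasks))
else:
    results = mmengine.track_parallel_progress(work, range(num_tasks), workers)
```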

With workers set to 1/8/10/16 on this i7-13700, or 16/32 on an AMD Threadripper 2950, using mmdetection3d v1.1, v1.3, and the main dev-1.x branch,

the Waymo-to-KITTI conversion gets stuck at

[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 150/150, 0.0 task/s, elapsed: 4830s, ETA:     0s

Finished ...
Start converting ...
completed: 0, elapsed: 0s

Finished ...
created txt files indicating what to collect in  ['training', 'validation', 'testing', 'testing_3d_camera_only_detection']
Generate info. this may take several minutes.
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 158081/158081, 75.6 task/s, elapsed: 2090s, ETA:     0s
[                                                  ] 0/158081, elapsed: 0s, ETA:

for days, and the system monitor shows no disk read/write activity and very low CPU usage.
It appears the code is stuck in the call to

mmengine.dump

(screenshot of the stuck call attached)
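
One way to narrow this down (a hypothetical standalone check, not part of mmdetection3d) is to time mmengine.dump on a payload roughly shaped like the generated info dicts. If a dump of a large synthetic list finishes in seconds, the hang is more likely upstream of the dump call, e.g. in gathering the infos across workers, than in the serialization itself. The field names below are made up for illustration.

```python
import time
import numpy as np
import mmengine

# Hypothetical payload, loosely shaped like per-frame info dicts; the real
# list is assembled by the info generator before mmengine.dump is called.
infos = [{'sample_idx': i,
          'lidar_points': {'num_pts_feats': 6},
          'gt_boxes': np.zeros((50, 7), dtype=np.float32)}
         for i in range(100000)]

start = time.time()
mmengine.dump(infos, '/tmp/waymo_infos_debug.pkl')  # .pkl dispatches to pickle
print(f'dumped {len(infos)} entries in {time.time() - start:.1f}s')
```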

Additional information

If needed:
the Waymo dataset is v1.4.1
the Waymo toolkit is 2.6, 1.4.9

I also tried the solutions in #2371 and #2705, and the error is still there.

The solution in #2364 does not work either, since the GPU runs out of memory.

I am currently testing the conversion on AWS and hope it works.

@s95huang
Author

s95huang commented Oct 29, 2023

Update:

I used an AWS EC2 G5 instance with 32 CPUs.
The same problem occurs, and the SSH connection is dropped after 10 minutes due to inactivity.
I changed the limit to 10 hours, and monitoring shows very low usage for the full 10 hours.

@ammaryasirnaich

@s95huang, how much RAM are you using for it?

@s95huang
Author

@s95huang, how much RAM are you using for it?

For AWS, the RAM is 128 GB or 256 GB.

My local 13700K machine has 64 GB; the Docker setup has 32 GB.
The Threadripper machine has 64 GB as well.

@ammaryasirnaich

Hmm, can you try this and see if it works for you: https://github.com/DYZhang09/SAM3D/issues/5
