Running sh train.sh keeps failing with errors. Has anyone else run into this? #4

Niklalala · 2022-09-02T02:22:19Z

$ sh train.sh
2022-09-02 10:18:02.992340: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2022-09-02 10:18:04.602552: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2022-09-02 10:18:04.602591: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library nvcuda.dll
2022-09-02 10:18:04.629016: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce GTX 1660 computeCapability: 7.5
coreClock: 1.815GHz coreCount: 22 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 178.86GiB/s
2022-09-02 10:18:04.629049: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2022-09-02 10:18:04.629066: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2022-09-02 10:18:04.629077: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2022-09-02 10:18:04.629087: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll
2022-09-02 10:18:04.629098: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll
2022-09-02 10:18:04.630010: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cusolver64_10.dll'; dlerror: cusolver64_10.dll not found
2022-09-02 10:18:04.630022: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusparse64_11.dll
2022-09-02 10:18:04.630028: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2022-09-02 10:18:04.630032: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1757] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2022-09-02 10:18:04.630287: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-09-02 10:18:04.630817: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-09-02 10:18:04.630825: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267]
2022-09-02 10:18:04.630838: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2022-09-02 10:18:07.565402: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
Namespace(batch_size=8, checkpoint_interval=1, compute_map=False, data_config='config/captcha.data', epochs=100, evaluation_interval=1, gradient_accumulations=2, img_size=416, model_def='config/yolov3-captcha.cfg', multiscale_training=True, n_cpu=8, pretrained_weights='weights/darknet53.conv.74')
Traceback (most recent call last):
  File "train.py", line 118, in <module>
    for batch_i, (_, imgs, targets) in enumerate(dataloader):
  File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\utils\data\dataloader.py", line 444, in __iter__
    return self._get_iterator()
  File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\utils\data\dataloader.py", line 390, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\utils\data\dataloader.py", line 1077, in __init__
    w.start()
  File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\multiprocessing\process.py", line 112, in start
    self._popen = self._Popen(self)
  File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
BrokenPipeError: [Errno 32] Broken pipe
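
The traceback ends in popen_spawn_win32.py, while the parent process is pickling the worker object and writing it to the child through a pipe. On Windows, multiprocessing can only start workers with "spawn", and a broken pipe at this point usually means the child process exited before it finished reading. One common workaround is to load data in the main process by using zero workers. Below is a minimal sketch of that choice. The helper name pick_num_workers is hypothetical and not part of train.py, and the log does not show the actual cause of the child exit.

```python
import sys


def pick_num_workers(requested: int, platform: str = sys.platform) -> int:
    """Return a DataLoader worker count, falling back to 0 on Windows.

    Hypothetical helper for illustration only; it is not part of train.py.
    """
    if requested < 0:
        raise ValueError("requested must be >= 0")
    # Windows starts worker processes with "spawn", which is where the
    # BrokenPipeError above was raised. With 0 workers no child process
    # is started and batches are loaded in the main process.
    if platform.startswith("win"):
        return 0
    return requested


if __name__ == "__main__":
    # 8 is the n_cpu value shown in the Namespace line of the log.
    print(pick_num_workers(8))
```

The returned value would be passed as num_workers to torch.utils.data.DataLoader. The Namespace line shows n_cpu=8, which presumably feeds that argument in train.py. Loading in the main process is slower, but it avoids worker startup entirely.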

Niklalala · 2022-09-02T03:22:20Z

After changing to num_workers = 0, training starts, but now the following error appears:

RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 6.00 GiB total capacity; 5.19 GiB already allocated; 0 bytes free; 5.27 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
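
In this message, reserved memory (5.27 GiB) is only slightly above allocated memory (5.19 GiB), so fragmentation is probably not the main problem and the 6 GiB card is simply full. A usual remedy is to lower the per-step batch size and raise the number of gradient accumulation steps so that the effective batch size stays about the same. The sketch below shows that arithmetic with the values from the Namespace line (batch_size=8, gradient_accumulations=2). The function name halve_batch is hypothetical, and whether train.py accepts these values as arguments is an assumption based on the Namespace field names.

```python
from typing import Tuple


def halve_batch(batch_size: int, gradient_accumulations: int) -> Tuple[int, int]:
    """Halve the per-step batch and scale accumulation steps to compensate.

    Hypothetical helper for illustration only; it is not part of train.py.
    """
    if batch_size < 2 or gradient_accumulations < 1:
        raise ValueError("need batch_size >= 2 and gradient_accumulations >= 1")
    effective = batch_size * gradient_accumulations
    new_batch = batch_size // 2
    # Ceiling division, so the effective batch never drops below the original.
    new_accum = -(-effective // new_batch)
    return new_batch, new_accum


if __name__ == "__main__":
    # Values taken from the Namespace line in the log above.
    print(halve_batch(8, 2))
```

With the logged settings this gives a batch size of 4 and 4 accumulation steps, keeping the effective batch at 16 while roughly halving activation memory per step. If that is still too much, lowering img_size or turning off multiscale_training may also help, assuming multiscale training can sample image sizes above img_size in this script.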
