Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

执行 sh train.sh 命令一直报错,你们有遇到吗 #4

Open
Niklalala opened this issue Sep 2, 2022 · 1 comment
Open

执行 sh train.sh 命令一直报错,你们有遇到吗 #4

Niklalala opened this issue Sep 2, 2022 · 1 comment

Comments

@Niklalala
Copy link

$ sh train.sh
2022-09-02 10:18:02.992340: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2022-09-02 10:18:04.602552: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2022-09-02 10:18:04.602591: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library nvcuda.dll
2022-09-02 10:18:04.629016: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce GTX 1660 computeCapability: 7.5
coreClock: 1.815GHz coreCount: 22 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 178.86GiB/s
2022-09-02 10:18:04.629049: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2022-09-02 10:18:04.629066: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2022-09-02 10:18:04.629077: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2022-09-02 10:18:04.629087: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll
2022-09-02 10:18:04.629098: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll
2022-09-02 10:18:04.630010: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cusolver64_10.dll'; dlerror: cusolver64_10.dll not found
2022-09-02 10:18:04.630022: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusparse64_11.dll
2022-09-02 10:18:04.630028: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2022-09-02 10:18:04.630032: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1757] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2022-09-02 10:18:04.630287: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-09-02 10:18:04.630817: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-09-02 10:18:04.630825: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267]
2022-09-02 10:18:04.630838: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2022-09-02 10:18:07.565402: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
Namespace(batch_size=8, checkpoint_interval=1, compute_map=False, data_config='config/captcha.data', epochs=100, evaluation_interval=1, gradient_accumulations=2, img_size=416, model_def='config/yolov3-captcha.cfg', multiscale_training=True, n_cpu=8, pretrained_weights='weights/darknet53.conv.74')
Traceback (most recent call last):
File "train.py", line 118, in
for batch_i, (_, imgs, targets) in enumerate(dataloader):
File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\utils\data\dataloader.py", line 444, in iter
return self._get_iterator()
File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\utils\data\dataloader.py", line 390, in _get_iterator
return _MultiProcessingDataLoaderIter(self)
File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\utils\data\dataloader.py", line 1077, in init
w.start()
File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\multiprocessing\process.py", line 112, in start
self._popen = self._Popen(self)
File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\multiprocessing\context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)
File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\multiprocessing\popen_spawn_win32.py", line 65, in init
reduction.dump(process_obj, to_child)
File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
BrokenPipeError: [Errno 32] Broken pipe

@Niklalala
Copy link
Author

$ sh train.sh 2022-09-02 10:18:02.992340: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll 2022-09-02 10:18:04.602552: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set 2022-09-02 10:18:04.602591: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library nvcuda.dll 2022-09-02 10:18:04.629016: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: pciBusID: 0000:01:00.0 name: NVIDIA GeForce GTX 1660 computeCapability: 7.5 coreClock: 1.815GHz coreCount: 22 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 178.86GiB/s 2022-09-02 10:18:04.629049: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll 2022-09-02 10:18:04.629066: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll 2022-09-02 10:18:04.629077: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll 2022-09-02 10:18:04.629087: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll 2022-09-02 10:18:04.629098: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll 2022-09-02 10:18:04.630010: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cusolver64_10.dll'; dlerror: cusolver64_10.dll not found 2022-09-02 10:18:04.630022: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusparse64_11.dll 2022-09-02 10:18:04.630028: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll 2022-09-02 10:18:04.630032: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1757] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform. Skipping registering GPU devices... 2022-09-02 10:18:04.630287: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 2022-09-02 10:18:04.630817: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix: 2022-09-02 10:18:04.630825: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267] 2022-09-02 10:18:04.630838: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set 2022-09-02 10:18:07.565402: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll Namespace(batch_size=8, checkpoint_interval=1, compute_map=False, data_config='config/captcha.data', epochs=100, evaluation_interval=1, gradient_accumulations=2, img_size=416, model_def='config/yolov3-captcha.cfg', multiscale_training=True, n_cpu=8, pretrained_weights='weights/darknet53.conv.74') Traceback (most recent call last): File "train.py", line 118, in for batch_i, (_, imgs, targets) in enumerate(dataloader): File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\utils\data\dataloader.py", line 444, in iter return self._get_iterator() File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\utils\data\dataloader.py", line 390, in _get_iterator return _MultiProcessingDataLoaderIter(self) File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\utils\data\dataloader.py", line 1077, in init w.start() File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\multiprocessing\process.py", line 112, in start self._popen = self._Popen(self) File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\multiprocessing\context.py", line 223, in _Popen return _default_context.get_context().Process._Popen(process_obj) File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\multiprocessing\context.py", line 322, in _Popen return Popen(process_obj) File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\multiprocessing\popen_spawn_win32.py", line 65, in init reduction.dump(process_obj, to_child) File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\multiprocessing\reduction.py", line 60, in dump ForkingPickler(file, protocol).dump(obj) BrokenPipeError: [Errno 32] Broken pipe

修改为:num_workers = 0,可以训练了,但又出现如下报错:

RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 6.00 GiB total capacity; 5.19 GiB already allocated; 0 bytes free; 5.27 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant