-
Notifications
You must be signed in to change notification settings - Fork 34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
执行 sh train.sh 命令一直报错,你们有遇到吗 #4
Comments
修改为:num_workers = 0,可以训练了,但又出现如下报错: RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 6.00 GiB total capacity; 5.19 GiB already allocated; 0 bytes free; 5.27 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF |
$ sh train.sh
2022-09-02 10:18:02.992340: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2022-09-02 10:18:04.602552: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2022-09-02 10:18:04.602591: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library nvcuda.dll
2022-09-02 10:18:04.629016: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce GTX 1660 computeCapability: 7.5
coreClock: 1.815GHz coreCount: 22 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 178.86GiB/s
2022-09-02 10:18:04.629049: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2022-09-02 10:18:04.629066: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2022-09-02 10:18:04.629077: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2022-09-02 10:18:04.629087: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll
2022-09-02 10:18:04.629098: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll
2022-09-02 10:18:04.630010: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cusolver64_10.dll'; dlerror: cusolver64_10.dll not found
2022-09-02 10:18:04.630022: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusparse64_11.dll
2022-09-02 10:18:04.630028: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2022-09-02 10:18:04.630032: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1757] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2022-09-02 10:18:04.630287: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-09-02 10:18:04.630817: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-09-02 10:18:04.630825: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267]
2022-09-02 10:18:04.630838: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2022-09-02 10:18:07.565402: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
Namespace(batch_size=8, checkpoint_interval=1, compute_map=False, data_config='config/captcha.data', epochs=100, evaluation_interval=1, gradient_accumulations=2, img_size=416, model_def='config/yolov3-captcha.cfg', multiscale_training=True, n_cpu=8, pretrained_weights='weights/darknet53.conv.74')
Traceback (most recent call last):
File "train.py", line 118, in
for batch_i, (_, imgs, targets) in enumerate(dataloader):
File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\utils\data\dataloader.py", line 444, in iter
return self._get_iterator()
File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\utils\data\dataloader.py", line 390, in _get_iterator
return _MultiProcessingDataLoaderIter(self)
File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\utils\data\dataloader.py", line 1077, in init
w.start()
File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\multiprocessing\process.py", line 112, in start
self._popen = self._Popen(self)
File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\multiprocessing\context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)
File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\multiprocessing\popen_spawn_win32.py", line 65, in init
reduction.dump(process_obj, to_child)
File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
BrokenPipeError: [Errno 32] Broken pipe
The text was updated successfully, but these errors were encountered: