When I run p2ch12.training --epochs 10 on CentOS 8 with an NVIDIA RTX card, I get the following error from TensorFlow (see the last two lines of output below):
[***********] dlwpt-code]$ python3.9 -m p2ch12.training --epochs 10
2021-12-28 17:20:18,410 INFO pid:970515 main:127:initModel Using CUDA; 1 devices.
2021-12-28 17:20:21,214 INFO pid:970515 main:188:main Starting LunaTrainingApp, Namespace(batch_size=32, num_workers=8, epochs=10, balanced=False, augmented=False, augment_flip=False, augment_offset=False, augment_scale=False, augment_rotate=False, augment_noise=False, tb_prefix='p2ch12', comment='dlwpt')
2021-12-28 17:20:23,860 INFO pid:970515 p2ch12.dsets:266:init <p2ch12.dsets.LunaDataset object at 0x7f3f44b4eee0>: 51244 training samples, 51135 neg, 109 pos, unbalanced ratio
2021-12-28 17:20:23,864 INFO pid:970515 p2ch12.dsets:266:init <p2ch12.dsets.LunaDataset object at 0x7f3f44b4ef40>: 5694 validation samples, 5681 neg, 13 pos, unbalanced ratio
2021-12-28 17:20:23,865 INFO pid:970515 main:195:main Epoch 1 of 10, 1602/178 batches of size 32*1
2021-12-28 17:20:23,865 WARNING pid:970515 util.util:219:enumerateWithEstimate E1 Training ----/1602, starting
2021-12-28 17:22:10,310 INFO pid:970515 util.util:236:enumerateWithEstimate E1 Training 64/1602, done at 2021-12-28 18:03:39, 0:43:01
2021-12-28 17:26:46,141 INFO pid:970515 util.util:236:enumerateWithEstimate E1 Training 256/1602, done at 2021-12-28 17:59:54, 0:39:16
2021-12-28 17:45:02,250 INFO pid:970515 util.util:236:enumerateWithEstimate E1 Training 1024/1602, done at 2021-12-28 17:58:53, 0:38:15
2021-12-28 17:58:23,593 WARNING pid:970515 util.util:249:enumerateWithEstimate E1 Training ----/1602, done at 2021-12-28 17:58:23
2021-12-28 17:58:24.056498: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-12-28 17:58:24.056539: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
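The "1602/178 batches of size 32*1" line follows directly from the sample counts reported just before it. A quick check, assuming the effective batch size is batch_size (32) times the number of CUDA devices (1, per "Using CUDA; 1 devices"), and that the final partial batch is kept rather than dropped:

```python
import math

# Sample counts copied from the log above
train_samples = 51244
val_samples = 5694

# Assumption: effective batch size = batch_size * CUDA device count,
# and the last partial batch is kept (i.e. not dropped)
batch_size = 32 * 1

train_batches = math.ceil(train_samples / batch_size)
val_batches = math.ceil(val_samples / batch_size)
print(f"{train_batches}/{val_batches} batches of size {batch_size}")  # 1602/178 batches of size 32
```

So the first epoch runs 1602 training batches, which matches the "----/1602" progress counter.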
jsgrover changed the title from "Part 2 Chapter 11,12 issue" to "Part 2 Chapter 12 issue" on Dec 29, 2021.
I'm not sure how you're getting a TensorFlow error with PyTorch, but I'm guessing it comes from TensorBoard?
As for not finding libcudart.so.11.0, please check your CUDA version. If you are running in a container, it would be in /usr/local: you should see /usr/local/cuda-11.0 and a symbolic link to it from /usr/local/cuda. On bare metal, the location depends on how you installed CUDA. Often, if you install a framework (oops, PyTorch isn't a framework, it's a library...) from a prebuilt binary, you can get conflicts if you don't have the CUDA version that the binary was built with.
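A minimal sketch of those checks in Python, assuming a Linux host. The helper names (can_load, cuda_symlink_target, torch_cuda_version) are hypothetical; the script only reports what the dynamic loader and PyTorch can see and does not change anything. torch.version.cuda is the CUDA version the installed PyTorch binary was built with (None for CPU-only builds), which is the number to compare against your installed toolkit.

```python
import ctypes
import os
from pathlib import Path


def can_load(soname):
    """Return True if the dynamic loader can open `soname`, e.g. 'libcudart.so.11.0'."""
    try:
        ctypes.CDLL(soname)
        return True
    except OSError:
        # Same kind of failure TensorFlow's dso_loader reports: file not found on the search path
        return False


def cuda_symlink_target(link="/usr/local/cuda"):
    """Return the directory the CUDA symlink resolves to (e.g. /usr/local/cuda-11.0), or None."""
    p = Path(link)
    if p.is_symlink():
        return str(p.resolve())
    return None


def torch_cuda_version():
    """Return the CUDA version PyTorch was built with, or None if unavailable."""
    try:
        import torch
    except ImportError:
        return None
    return torch.version.cuda  # None for CPU-only builds


if __name__ == "__main__":
    print("libcudart.so.11.0 loadable:", can_load("libcudart.so.11.0"))
    print("/usr/local/cuda ->", cuda_symlink_target())
    print("PyTorch built with CUDA:", torch_cuda_version())
    print("LD_LIBRARY_PATH:", os.environ.get("LD_LIBRARY_PATH", "<unset>"))
```

If PyTorch reports a CUDA build and training runs on the GPU (as "Using CUDA; 1 devices" suggests) while libcudart.so.11.0 is not loadable, the missing library only affects whatever loads TensorFlow, not the PyTorch training itself.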