p2ch11/training.py causes a TypeError #107

Open
Va6lue opened this issue Mar 14, 2023 · 1 comment

Comments

Va6lue commented Mar 14, 2023

Because of the error below, my TensorBoard shows nothing.

My computer spec:
CPU: AMD R7-7700
RAM: 16GB X 2
GPU: RTX 4090 24GB

My system spec:
OS: Windows 11
IDE: VS Code
Python: 3.9.13
PyTorch: 1.13.1+cu117
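
The spec above does not include the NumPy version, which is often relevant for a ufunc error like the one reported below. A minimal sketch for collecting it, assuming Python 3.8 or later so that `importlib.metadata` is available; `report_versions` is a hypothetical helper name, not part of the book's code:

```python
import platform
from importlib import metadata


def report_versions(packages=("numpy", "torch", "tensorboard")):
    """Return a dict mapping each name to its installed version, or None if absent."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment.
            versions[name] = None
    return versions


for name, version in report_versions().items():
    print(f"{name}: {version}")
```

Running this in the same kernel as the notebook reports the versions that the training code actually imports, which can differ from what the IDE shows.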

In:

#run('p2ch11.prepcache.LunaPrepCacheApp')  # Already run successfully; commented out here.
run('p2ch11.training.LunaTrainingApp', '--epochs=1')  # This call fails with the TypeError below.

Out:

2023-03-14 21:40:08,556 INFO pid:10608 nb:004:run Running: p2ch11.training.LunaTrainingApp(['--epochs=1', '--num-workers=8']).main()
2023-03-14 21:40:08,560 INFO pid:10608 p2ch11.training:079:initModel Using CUDA; 1 devices.
2023-03-14 21:40:08,563 INFO pid:10608 p2ch11.training:138:main Starting LunaTrainingApp, Namespace(num_workers=8, batch_size=1024, epochs=1, tb_prefix='p2ch11', comment='dwlpt')
2023-03-14 21:40:08,724 INFO pid:10608 p2ch11.dsets:182:init <p2ch11.dsets.LunaDataset object at 0x000001EC667766A0>: 495958 training samples
2023-03-14 21:40:08,744 INFO pid:10608 p2ch11.dsets:182:init <p2ch11.dsets.LunaDataset object at 0x000001EC7679AFA0>: 55107 validation samples
2023-03-14 21:40:08,745 INFO pid:10608 p2ch11.training:145:main Epoch 1 of 1, 485/54 batches of size 1024*1
2023-03-14 21:40:08,746 WARNING pid:10608 util.util:144:enumerateWithEstimate E1 Training ----/485, starting
2023-03-14 21:41:21,363 INFO pid:10608 util.util:161:enumerateWithEstimate E1 Training 64/485, done at 2023-03-14 21:47:11, 0:06:37
2023-03-14 21:44:03,276 INFO pid:10608 util.util:161:enumerateWithEstimate E1 Training 256/485, done at 2023-03-14 21:47:15, 0:06:41
2023-03-14 21:47:16,901 WARNING pid:10608 util.util:174:enumerateWithEstimate E1 Training ----/485, done at 2023-03-14 21:47:16
2023-03-14 21:47:18,692 INFO pid:10608 p2ch11.training:259:logMetrics E1 LunaTrainingApp
2023-03-14 21:47:18,698 INFO pid:10608 p2ch11.training:289:logMetrics E1 trn 0.0235 loss, 99.7% correct,
2023-03-14 21:47:18,698 INFO pid:10608 p2ch11.training:298:logMetrics E1 trn_neg 0.0041 loss, 100.0% correct (494577 of 494743)
2023-03-14 21:47:18,698 INFO pid:10608 p2ch11.training:309:logMetrics E1 trn_pos 7.9111 loss, 0.2% correct (2 of 1215)

---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[7], line 1
----> 1 run('p2ch11.training.LunaTrainingApp', '--epochs=1')

Cell In[2], line 7, in run(app, *argv)
4 log.info("Running: {}({!r}).main()".format(app, argv))
6 app_cls = importstr(*app.rsplit('.', 1))
----> 7 app_cls(argv).main()
9 log.info("Finished: {}.{!r}).main()".format(app, argv))

File c:\DeepLearning_F1388\F1388_Code\p2ch11\training.py:155, in LunaTrainingApp.main(self)
145 log.info("Epoch {} of {}, {}/{} batches of size {}*{}".format(
146 epoch_ndx,
147 self.cli_args.epochs,
(...)
151 (torch.cuda.device_count() if self.use_cuda else 1),
152 ))
154 trnMetrics_t = self.doTraining(epoch_ndx, train_dl)
--> 155 self.logMetrics(epoch_ndx, 'trn', trnMetrics_t)
157 valMetrics_t = self.doValidation(epoch_ndx, val_dl)
158 self.logMetrics(epoch_ndx, 'val', valMetrics_t)

File c:\DeepLearning_F1388\F1388_Code\p2ch11\training.py:339, in LunaTrainingApp.logMetrics(self, epoch_ndx, mode_str, metrics_t, classificationThreshold)
336 posHist_mask = posLabel_mask & (metrics_t[METRICS_PRED_NDX] < 0.99)
...
--> 386 cum_counts = np.cumsum(np.greater(counts, 0, dtype=np.int32))
387 start, end = np.searchsorted(cum_counts, [0, cum_counts[-1] - 1], side="right")
388 start = int(start)

TypeError: No loop matching the specified signature and casting was found for ufunc greater
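
The failing expression is `np.greater(counts, 0, dtype=np.int32)`. As far as I know, NumPy 1.24 and later only accept `bool` or `object` as the `dtype` of a comparison ufunc, so this call raises exactly this TypeError there. The NumPy version is not stated in this issue, so that diagnosis is an assumption. A minimal sketch of the same computation without the `dtype` argument, using made-up `counts` values in place of the real histogram data:

```python
import numpy as np

# Hypothetical bucket counts standing in for the histogram data in the traceback.
counts = np.array([0, 3, 0, 5, 2, 0])

# Compare first (boolean result), then let cumsum accumulate as integers.
# This avoids requesting an int32 output from the comparison ufunc.
cum_counts = np.cumsum(np.greater(counts, 0))

# Same follow-up step as line 387 of the traceback.
start, end = np.searchsorted(cum_counts, [0, cum_counts[-1] - 1], side="right")
print(cum_counts.tolist(), int(start), int(end))  # [0, 1, 1, 2, 3, 3] 1 4
```

The traceback is truncated before the frame at line 386, so which file that line lives in is not shown here. If it belongs to an installed library rather than to `p2ch11/training.py`, pinning NumPy below 1.24 or upgrading that library is likely a safer fix than editing the installed file.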

Va6lue closed this as completed Mar 14, 2023
Va6lue reopened this Mar 14, 2023

Va6lue commented Mar 18, 2023

I changed the Python version to 3.7. The following are the packages I installed.

absl-py==1.4.0
astor==0.8.1
backcall==0.2.0
blosc==1.10.6
cassandra-driver==3.25.0
certifi==2022.12.7
charset-normalizer==2.1.1
click==8.1.3
colorama==0.4.6
cycler==0.11.0
debugpy==1.6.6
decorator==5.1.1
diskcache==4.1.0
entrypoints==0.4
fonttools==4.38.0
gast==0.2.2
geomet==0.2.1.post1
google-pasta==0.2.0
grpcio==1.51.3
h5py==3.8.0
idna==3.4
imageio==2.26.0
importlib-metadata==6.0.0
ipykernel==6.16.2
ipython==7.34.0
jedi==0.18.2
jupyter_client==7.4.9
jupyter_core==4.12.0
Keras-Applications==1.0.8
Keras-Preprocessing==1.1.2
kiwisolver==1.4.4
Markdown==3.4.1
MarkupSafe==2.1.2
matplotlib==3.4.0
matplotlib-inline==0.1.6
nest-asyncio==1.5.6
networkx==2.6.3
numpy==1.21.6
opt-einsum==3.3.0
packaging==23.0
parso==0.8.3
pickleshare==0.7.5
Pillow==9.4.0
prompt-toolkit==3.0.38
protobuf==3.20.0
psutil==5.9.4
Pygments==2.14.0
pyparsing==3.0.9
python-dateutil==2.8.2
PyWavelets==1.3.0
pywin32==305
pyzmq==25.0.1
requests==2.28.1
scikit-image==0.15.0
scipy==1.5.0
SimpleITK==2.2.1
six==1.16.0
tensorboard==1.15.0
tensorflow-estimator==1.15.1
tensorflow-gpu==1.15.0
termcolor==2.2.0
torch==1.13.1+cu117
torchaudio==0.13.1+cu117
torchvision==0.14.1+cu117
tornado==6.2
traitlets==5.9.0
typing_extensions==4.5.0
urllib3==1.26.13
wcwidth==0.2.6
Werkzeug==2.2.3
wrapt==1.15.0
zipp==3.15.0

With this environment, memory usage explodes.
