crack detection: runtime errors with train_model() #104
Comments
The "killed by signal" error is a catch-all that only indicates the code in a
DataLoader worker process is not running. Sorry, I know that doesn't help much.
From previous experience, I strongly suspect a dependency issue. Try setting up
a fresh environment with the latest PyTorch version and running it again.
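One way to act on the suggestion above is to record which versions are actually installed before and after rebuilding the environment. The sketch below is only an illustration: the helper names `installed_version` and `environment_report` are hypothetical, and the package list is an assumption based on the libraries mentioned in this thread. It relies on `importlib.metadata`, which is in the standard library from Python 3.8 onward, so it would not run unchanged on the Python 3.7.6 setup described in the report.

```python
import platform
from importlib import metadata  # standard library from Python 3.8 onward

# Assumed package list, taken from the libraries named in this thread.
PACKAGES = ("torch", "torchsummary", "matplotlib")


def installed_version(name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def environment_report(packages=PACKAGES):
    """Collect the interpreter, OS, and package versions into one dictionary."""
    report = {
        "python": platform.python_version(),
        "platform": platform.platform(),
    }
    for name in packages:
        report[name] = installed_version(name) or "not installed"
    return report


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

Pasting this output into the issue alongside the traceback makes it easier to compare a failing environment with a working one.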
…On Wed, Apr 1, 2020 at 5:11 PM Mike Fuller ***@***.***> wrote:
RE the Jupyter file for the *crack detection* project: I'm getting runtime
errors at cell [34] when I try to train the model. It seems to have
something to do with signal handling. The last item in the error hierarchy
is:
RuntimeError: DataLoader worker (pid 83316) is killed by signal: Unknown
signal: 0.
To simplify debugging, I tried running it with *zero epochs*. Here are
the error statements generated when I do that.
(I also found that I needed to add a line to import torchsummary, and
move %matplotlib inline to the top of the import list to overcome other
errors.)
This is on a Mac (macOS 10.15.4) with Python 3.7.6 and PyTorch 1.4.0.
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-34-51af14cba900> in <module>
1 base_model = train_model(resnet50, criterion, optimizer, exp_lr_scheduler, num_epochs=0)
----> 2 visualize_model(base_model)
3 plt.show()
<ipython-input-25-8be992550be9> in visualize_model(model, num_images)
6
7 with torch.no_grad():
----> 8 for i, (inputs, labels) in enumerate(dataloaders['val']):
9 inputs = inputs.to(device)
10 labels = labels.to(device)
~/opt/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py in __iter__(self)
277 return _SingleProcessDataLoaderIter(self)
278 else:
--> 279 return _MultiProcessingDataLoaderIter(self)
280
281 @property
~/opt/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py in __init__(self, loader)
744 # prime the prefetch loop
745 for _ in range(2 * self._num_workers):
--> 746 self._try_put_index()
747
748 def _try_get_data(self, timeout=_utils.MP_STATUS_CHECK_INTERVAL):
~/opt/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py in _try_put_index(self)
870 return
871
--> 872 self._index_queues[worker_queue_idx].put((self._send_idx, index))
873 self._task_info[self._send_idx] = (worker_queue_idx,)
874 self._tasks_outstanding += 1
~/opt/anaconda3/lib/python3.7/multiprocessing/queues.py in put(self, obj, block, timeout)
85 with self._notempty:
86 if self._thread is None:
---> 87 self._start_thread()
88 self._buffer.append(obj)
89 self._notempty.notify()
~/opt/anaconda3/lib/python3.7/multiprocessing/queues.py in _start_thread(self)
157
158 # Start thread which transfers data from buffer to pipe
--> 159 self._buffer.clear()
160 self._thread = threading.Thread(
161 target=Queue._feed,
~/opt/anaconda3/lib/python3.7/site-packages/torch/utils/data/_utils/signal_handling.py in handler(signum, frame)
64 # This following call uses `waitid` with WNOHANG from C side. Therefore,
65 # Python can still get and update the process status successfully.
---> 66 _error_if_any_worker_fails()
67 if previous_handler is not None:
68 previous_handler(signum, frame)
RuntimeError: DataLoader worker (pid 83316) is killed by signal: Unknown signal: 0.
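The traceback above passes through `_MultiProcessingDataLoaderIter`, which means the loader was created with `num_workers` greater than zero. A possible diagnostic, not something proposed in this thread, is to build the loader with `num_workers=0` so that batches are loaded in the main process; if the error disappears, the problem is likely in worker process startup rather than in the model code. The sketch below uses a small hypothetical tensor dataset as a stand-in, because the notebook's construction of `dataloaders['val']` is not shown here.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for the notebook's validation set:
# eight blank 3-channel 4x4 "images" with integer labels.
images = torch.zeros(8, 3, 4, 4)
labels = torch.zeros(8, dtype=torch.long)
val_dataset = TensorDataset(images, labels)

# num_workers=0 keeps data loading in the main process, so no worker
# subprocesses are spawned and no worker can be killed by a signal.
val_loader = DataLoader(val_dataset, batch_size=4, shuffle=False, num_workers=0)

batch_shapes = [tuple(inputs.shape) for inputs, _ in val_loader]
print(batch_shapes)
```

Single-process loading is slower on large datasets, so this is meant as a way to narrow down the failure, not as a permanent setting.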
Okay, thanks for the quick response! I will give it a try.