Further improvements to pending kernels management #732
This seems reasonable to me. Zach and I discussed this today and agreed that we should release a version of
Thanks @Zsailer - this looks good. I just had a question regarding the `now` method in the tests.
It might be good to also update the help-string of the `ready` property. Perhaps something like:
"""A future that resolves when the kernel has completed its startup or shutdown"""
"""Use this function to ensure that this awaitable
happens before other awaitables defined after it.
"""
(out,) = await asyncio.gather(awaitable)
I guess I don't understand why this method is necessary - especially for a single-item gather. Isn't awaiting the awaitable sufficient to prevent the execution of follow-on awaitables in the same execution path?
Great question, @kevin-bates.
That's what's supposed to happen; however, I'm seeing some weird behavior that I think is coming from the `gen_test` decorator. Basically, I'm not seeing the `await`s happening in the order they are defined.
Weirdly, if I replace the async/await syntax with the old `yield` syntax, these tests work fine (no `now` needed). Again, this suggests some weird interference happening from Tornado's async testing. I'm not sure how to fix this.
By enforcing a `gather` call in each await statement, everything happens in the order they are called. Super strange!
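For comparison, here is a minimal sketch of the `now` helper next to plain `await` statements. In plain asyncio both forms suspend the caller until the awaitable finishes, so the two orderings should match. The sketch does not reproduce the `gen_test` behavior described above. `work`, `plain`, and `wrapped` are hypothetical names, and the `return out` line is an assumption about how the helper ends.

```python
import asyncio


async def now(awaitable):
    """Await a single awaitable through asyncio.gather and return its result."""
    (out,) = await asyncio.gather(awaitable)
    return out  # assumption: the helper returns the gathered result


async def work(name, log):
    # Hypothetical stand-in for a kernel operation.
    await asyncio.sleep(0)
    log.append(name)
    return name


async def main():
    plain, wrapped = [], []
    # Plain awaits: each statement finishes before the next one starts.
    await work("start", plain)
    await work("shutdown", plain)
    # The same sequence routed through the single-item gather helper.
    await now(work("start", wrapped))
    await now(work("shutdown", wrapped))
    return plain, wrapped


plain, wrapped = asyncio.run(main())
```

Outside of a Tornado test, the helper should therefore be redundant, which is consistent with the suspicion that the test decorator is involved.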
Thanks Zach. That is interesting.
Looks like nested coroutines can be a bit difficult in unit tests. It appears that inner coroutines never get scheduled even when the outer coroutine is `await`ed. Adding `ensure_future` in multiple places in the tests ensures that these coroutines get scheduled and tests pass again.
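A minimal sketch of the scheduling difference, in plain asyncio rather than the actual test suite: calling a coroutine function only creates a coroutine object, while `asyncio.ensure_future` wraps it in a task that the event loop will run. `inner` and `events` are hypothetical names used for illustration.

```python
import asyncio

events = []


async def inner():
    events.append("inner ran")


async def main():
    coro = inner()  # creates a coroutine object; nothing runs yet
    await asyncio.sleep(0)  # yield to the loop; the coroutine is not scheduled
    ran_before = list(events)

    task = asyncio.ensure_future(coro)  # schedule the coroutine as a task
    await asyncio.sleep(0)  # yield to the loop so the task can run
    ran_after = list(events)
    await task
    return ran_before, ran_after


ran_before, ran_after = asyncio.run(main())
```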
It looks like all tests are (finally) passing except the downstream tests. For that, jupyter-server/jupyter_server#654 is required. I'll work on getting that merged and then ask for a final review here.
I've also added some documentation for pending kernels to this PR.
This PR evolved a bit. I've added documentation to this PR to define more clearly what a "pending" kernel is. A pending kernel is a kernel that is currently in the process of "starting" or "shutting down". The `MultiKernelManager` will block any subsequent actions that attempt to affect a pending kernel (e.g. "interrupt", "restart", "shutdown") by raising an exception. Remember, pending kernels are opt-in, so this feature is not enabled by default.
I'm not sure why the
I kicked the downstream tests in #731 (sorry @blink1073 😅 ) and see the exact same errors, so they are unrelated to this PR. This should be ready to merge. 🚀
The nbclient failures should be fixed by jupyter/nbclient#190.
…ore responsibility
…sn't start properly
2285fcd to da3e79d
nbclient v0.5.10 is on PyPI with the fix.
Thanks, @davidbrochart! I really appreciate it!
Waiting on jupyter-server/jupyter_server#662 to be merged and released to fix the downstream tests.
All green!
This was discussed extensively at the last Jupyter Server meeting: jupyter-server/team-compass#15 (comment) Merging away!
Just had a couple of suggestions and comments, but this looks good.
*Added in 7.1.0*

In scenarios where an kernel takes a long time to start (e.g. kernels running remotely), it can be advantageous to immediately return the kernel's model and ID from key methods like ``.start_kernel()`` and ``.shutdown_kernel()``. The kernel will continue its task without blocking other managerial actions.
Suggested change:
- In scenarios where an kernel takes a long time to start (e.g. kernels running remotely), it can be advantageous to immediately return the kernel's model and ID from key methods like ``.start_kernel()`` and ``.shutdown_kernel()``. The kernel will continue its task without blocking other managerial actions.
+ In scenarios where a kernel takes a long time to start (e.g. kernels running remotely), it can be advantageous to immediately return the kernel's model and ID from key methods like ``.start_kernel()`` and ``.shutdown_kernel()``. The kernel will continue its task without blocking other managerial actions.
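A toy sketch of the behavior this documentation paragraph describes, not the jupyter_client implementation: `start_kernel` returns an ID right away while startup continues in a background task, and a per-kernel future resolves once startup completes. `ToyManager`, `_launch`, the `ready` mapping, and the short sleep standing in for a slow kernel are all assumptions made for illustration.

```python
import asyncio
import uuid


class ToyManager:
    """Hypothetical stand-in for a manager with pending kernels enabled."""

    def __init__(self):
        self.ready = {}  # kernel_id -> future resolved when startup ends

    async def _launch(self, kernel_id):
        await asyncio.sleep(0.01)  # pretend the kernel is slow to start
        return kernel_id

    async def start_kernel(self):
        kernel_id = str(uuid.uuid4())
        # Schedule startup in the background and return immediately.
        self.ready[kernel_id] = asyncio.ensure_future(self._launch(kernel_id))
        return kernel_id


async def main():
    manager = ToyManager()
    kernel_id = await manager.start_kernel()
    pending_on_return = not manager.ready[kernel_id].done()
    await manager.ready[kernel_id]  # wait for startup to finish
    return pending_on_return, manager.ready[kernel_id].done()


pending_on_return, ready_after_wait = asyncio.run(main())
```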
@property
def _starting_kernels(self):
    """A shim for backwards compatibility."""
    return self._pending_kernels
Should this be marked as deprecated?
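If the shim were deprecated, one possible shape is sketched below using the standard `warnings` module. `Manager` is a hypothetical container class and the warning text is an assumption, not wording from the PR.

```python
import warnings


class Manager:
    """Hypothetical container holding the pending-kernel mapping."""

    def __init__(self):
        self._pending_kernels = {}

    @property
    def _starting_kernels(self):
        """A shim for backwards compatibility (deprecated in this sketch)."""
        warnings.warn(
            "_starting_kernels is deprecated; use _pending_kernels instead",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not the shim
        )
        return self._pending_kernels


manager = Manager()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is not filtered
    same_mapping = manager._starting_kernels is manager._pending_kernels
```

Existing callers keep working because the shim still returns the same mapping; they only see a warning on access.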
if self._using_pending_kernels() and kernel_id in self._pending_kernels:
    raise RuntimeError("Kernel is in a pending state. Cannot shutdown.")
# If the kernel is still starting, wait for it to be ready.
elif kernel_id in self._starting_kernels:
Is the use of `_starting_kernels` intentional? Technically, this could also contain kernels that are shutting down now, but only when pending kernels are enabled - so I suspect this was for hinting that, in this case, the kernel can only be starting.
Perhaps we could be more explicit (and save an existence check [nit]) via:
if kernel_id in self._pending_kernels:
    if self._using_pending_kernels():
        raise RuntimeError("Kernel is in a pending state. Cannot shutdown.")
    else:  # kernel is still starting, wait for its startup
        kernel = self._pending_kernels[kernel_id]
        try:
            await kernel
        except Exception:
            self.remove_kernel(kernel_id)
try:
    await self._starting_kernels[kernel_id]
    await kernel
except Exception:
    self.remove_kernel(kernel_id)
Do we want to update the `_pending_kernels` list as well here?
Hmm - sorry, it looks like this was merged while I was reviewing. The comments aren't earth-shattering so I'll leave it to you to determine if they have any merit. Thanks for all the work on this @Zsailer - good stuff!
Looks like the The check is defined here: Pinning to Reporting here for reference.
Two changes in this PR, summarized below. This work is a follow-up to #712.

Make `shutdown_kernel` a pending state

Makes `shutdown_kernel` also show the kernel in a pending state. Since the KernelManager is still managing a process while it's shutting down, which might take a long time, I think this should show as a pending state too.

Make `KernelManager` only responsible for reporting kernel pending state

Also, after working with pending kernels a bit, I believe it makes the most sense to make the `KernelManager` responsible for reporting the kernel's pending state, while the layer that sits above the KernelManager, e.g. `MultiKernelManager`, responsible for reacting to that state.

This essentially means removing all `self.ready` checks in the individual KernelManager. For example,

jupyter_client/jupyter_client/manager.py
Lines 455 to 457 in 3082366

should be removed; rather, a MultiKernelManager whose `use_pending_kernels` attribute is `True` would determine if shutdown is a passthrough, etc.

Another example is

jupyter_client/jupyter_client/manager.py
Lines 506 to 507 in 3082366

The MultiKernelManager would be responsible for handling what to do when a kernel restart happens while a kernel is in a pending state.
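The split of responsibilities described above can be sketched roughly as follows. This is a toy model under stated assumptions, not the code in `manager.py`: `ToyKernelManager` only exposes a `ready` future and performs no checks of its own, while `ToyMultiKernelManager` decides how to react based on `use_pending_kernels`. Both class names and the `kernels` mapping are hypothetical. The error message is taken from the diff in this PR.

```python
import asyncio


class ToyKernelManager:
    """Reports pending state through `ready`; performs no checks itself."""

    def __init__(self):
        self.ready = asyncio.get_running_loop().create_future()
        self.shut_down = False

    async def shutdown_kernel(self):
        self.shut_down = True  # no self.ready check here


class ToyMultiKernelManager:
    """Reacts to the pending state reported by each kernel manager."""

    def __init__(self, use_pending_kernels):
        self.use_pending_kernels = use_pending_kernels
        self.kernels = {}

    async def shutdown_kernel(self, kernel_id):
        km = self.kernels[kernel_id]
        if not km.ready.done():
            if self.use_pending_kernels:
                raise RuntimeError("Kernel is in a pending state. Cannot shutdown.")
            await km.ready  # otherwise wait for startup to finish
        await km.shutdown_kernel()


async def main():
    # Opt-in mode: acting on a pending kernel raises.
    strict = ToyMultiKernelManager(use_pending_kernels=True)
    strict.kernels["a"] = ToyKernelManager()
    try:
        await strict.shutdown_kernel("a")
        raised = False
    except RuntimeError:
        raised = True

    # Default mode: the call waits until the kernel reports ready.
    lenient = ToyMultiKernelManager(use_pending_kernels=False)
    km = lenient.kernels["b"] = ToyKernelManager()
    asyncio.get_running_loop().call_later(0.01, km.ready.set_result, None)
    await lenient.shutdown_kernel("b")
    return raised, km.shut_down


raised, shut_down = asyncio.run(main())
```

Keeping the per-kernel class free of state checks means the policy for pending kernels lives in one place, the layer that owns the collection of kernels.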