Kernel sometimes dies with notebook executor #2387

Open
Eric-Arellano opened this issue Nov 25, 2024 · 3 comments · Fixed by #2464

Comments

@Eric-Arellano
Collaborator

For example https://github.com/Qiskit/documentation/actions/runs/12015838807/job/33494742907

task: <Task finished name='Task-35' coro=<execute_notebook() done, defined at /home/runner/work/documentation/documentation/scripts/nb-tester/qiskit_docs_notebook_tester/__init__.py:253> exception=DeadKernelError('Kernel died')>
Traceback (most recent call last):
  File "/home/runner/work/documentation/documentation/scripts/nb-tester/qiskit_docs_notebook_tester/__init__.py", line 268, in execute_notebook
    nb = await _execute_notebook(path, config, working_directory.name)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/work/documentation/documentation/scripts/nb-tester/qiskit_docs_notebook_tester/__init__.py", line 346, in _execute_notebook
    await notebook_client.async_execute()
  File "/home/runner/work/documentation/documentation/.tox/py311/lib/python3.11/site-packages/nbclient/client.py", line 709, in async_execute
    await self.async_execute_cell(
@frankharkins
Member

frankharkins commented Dec 3, 2024

I've seen this a few more times while refactoring the notebook tester. We can't tell from the logs which notebook is responsible, because the error contains no identifying information and the notebooks all run asynchronously. We can narrow it down if jobs fail while running only a subset of notebooks.
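One way to make the failing notebook show up in the logs is to wrap each asynchronous run so that any exception is re-raised with the notebook path attached. The following is a minimal sketch of that idea, not the notebook tester's actual code: `NotebookExecutionError`, `run_labelled`, `run_all`, and `fake_execute` are hypothetical names, and `fake_execute` only stands in for the real executor.

```python
import asyncio


class NotebookExecutionError(Exception):
    """Hypothetical wrapper that records which notebook failed."""

    def __init__(self, path, cause):
        super().__init__(f"{path}: {type(cause).__name__}: {cause}")
        self.path = path


async def run_labelled(path, execute):
    """Await execute(path) and re-raise any failure tagged with the path."""
    try:
        return await execute(path)
    except Exception as err:
        raise NotebookExecutionError(path, err) from err


async def run_all(paths, execute):
    """Run every notebook concurrently and return only the labelled failures."""
    results = await asyncio.gather(
        *(run_labelled(path, execute) for path in paths),
        return_exceptions=True,  # collect failures instead of stopping at the first
    )
    return [r for r in results if isinstance(r, NotebookExecutionError)]


async def fake_execute(path):
    # Stand-in for the real executor (an assumption for illustration only).
    await asyncio.sleep(0)
    if path == "b.ipynb":
        raise RuntimeError("Kernel died")
    return path


failures = asyncio.run(run_all(["a.ipynb", "b.ipynb", "c.ipynb"], fake_execute))
for failure in failures:
    print(failure)
```

Because `gather` is called with `return_exceptions=True`, one dead kernel does not hide failures in the other notebooks, and each message starts with the path that caused it.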

In the past, this kind of thing has often been related to Aer. For example, Qiskit/qiskit-aer#2232 might be related.

@frankharkins
Member

I managed to reproduce a similar problem locally when I added more notebooks to the script.

zmq.error.ZMQError: Too many open files

I fixed this locally by raising my open-file limit to 6000 (ulimit -n 6000). Hopefully we can set this in our action too.
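The same limit can also be raised from inside a Python process with the standard-library `resource` module, which may be useful where running `ulimit` in the shell is awkward. This is a rough sketch under stated assumptions, not what #2464 does: `raise_open_file_limit` is a hypothetical helper, the default of 6000 just mirrors the value above, and the module is only available on Unix-like systems.

```python
import resource


def raise_open_file_limit(target=6000):
    """Raise the soft open-file limit towards `target` and return the new soft limit.

    Hypothetical helper: the soft limit can never exceed the hard limit,
    so `target` is capped at the hard limit when one is set.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    # Only ever raise the limit; leave it alone if it is already high enough.
    if soft != resource.RLIM_INFINITY and soft < target:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


print(raise_open_file_limit())
```

The new limit applies to the calling process and is inherited by child processes it starts afterwards, so it would need to run before any kernels are launched.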

@Eric-Arellano
Collaborator Author

@frankharkins let's keep this open until we've gone a few weeks without seeing it, to confirm that #2464 fixed the issue.

@Eric-Arellano Eric-Arellano reopened this Dec 16, 2024
@Eric-Arellano Eric-Arellano removed the status in Docs Planning Dec 16, 2024