Cell-level caching #89
Comments
Heya, well, the key problem (and for jupyter/nbclient#248) is: what would you cache?
a = 1
b = 2
c = a + b
You can't run from cell 3 unless you've cached (and reloaded) the variables. I don't know of an easy way to do this robustly.
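To make the problem above concrete, here is a minimal sketch using only the standard library. It treats each cell as a string run with `exec`, and it assumes the user variables can be pickled; the `run` helper and the snapshot format are hypothetical and not part of this project or nbclient.

```python
import pickle

cells = ["a = 1", "b = 2", "c = a + b"]

def run(cell_sources, namespace):
    """Execute cell sources in order inside the given namespace dict."""
    for src in cell_sources:
        exec(src, namespace)
    return namespace

# Running only cell 3 in a fresh namespace fails: a and b were never defined.
try:
    run(cells[2:], {})
    skipped_ok = True
except NameError:
    skipped_ok = False

# Run cells 1-2 once, then snapshot the user variables
# (hypothetical cache format; exec adds "__builtins__", which we drop).
ns = run(cells[:2], {})
snapshot = pickle.dumps({k: v for k, v in ns.items() if k != "__builtins__"})

# Later: restore the snapshot and run only cell 3.
restored = pickle.loads(snapshot)
result = run(cells[2:], restored)["c"]
```

This works for plain integers, but `pickle` cannot serialize things like open file handles or generators, which is where the robustness concern comes from.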
Yeah, sorry for the (duplicate?) issue. I looked into nbclient and it seemed like this might need to be done in nbclient, though I'm not totally sure. I didn't see a method to skip execution of cached cells.
Yeah, that's true. I gave it some thought today, and it might be done by serializing the entire kernel state with dill. It has a function for that:
To make the caching efficient, we could build something like a checkpoint system: a checkpoint made after every nth cell would serialize the entire state. Each checkpoint would have a hash computed from the source code of all cells up to that cell. If any of that source code changed, the cache would be invalidated. Further optimizations could keep the cache from being so memory hungry, such as:
The main things that imo need to be tested:
I'm very interested in your opinion about these suggestions. I could also potentially help with some of the tasks.
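The checkpoint hashing described above could look roughly like the sketch below. It covers only the hashing and invalidation part, not the state serialization. The function names and the `checkpoints` dict layout are assumptions made for illustration.

```python
import hashlib

def prefix_hashes(cell_sources):
    """Return one hash per cell, each covering the sources of all cells up to it."""
    h = hashlib.sha256()
    out = []
    for src in cell_sources:
        h.update(src.encode("utf-8"))
        h.update(b"\x00")  # separator so cell boundaries affect the hash
        out.append(h.copy().hexdigest())
    return out

def latest_valid_checkpoint(cell_sources, checkpoints):
    """checkpoints maps a cell index to the prefix hash stored at that checkpoint.

    Return the highest index whose stored hash still matches, or None.
    """
    current = prefix_hashes(cell_sources)
    valid = [i for i, digest in checkpoints.items()
             if i < len(current) and current[i] == digest]
    return max(valid, default=None)

# Usage: a checkpoint after every 2nd cell, then an edit to cell 3 (index 2).
old = ["a = 1", "b = 2", "c = a + b", "print(c)"]
hashes = prefix_hashes(old)
checkpoints = {1: hashes[1], 3: hashes[3]}
new = ["a = 1", "b = 2", "c = a * b", "print(c)"]
resume_from = latest_valid_checkpoint(new, checkpoints)
```

Here the checkpoint after index 1 is still valid, so execution could resume from the state saved there. The checkpoint after index 3 is invalidated because its prefix includes the edited cell.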
@chrisjsewell Will you accept a PR if someone manages to come up with a good solution? (probably taking some inspiration from knitr)
Heya, yeh, definitely interested, thanks
Context
I work with notebooks that are expensive to compute.
If I change one code cell at the end of the notebook, I don't expect the entire cache to be invalidated and the whole notebook to need recomputing.
I want only the dependent cells to be recomputed.
Proposal
Assumption: Notebooks are executed from top to bottom (I come from Quarto). If we work directly with Jupyter notebooks, imho we don't need caching like this. Jupyter does it pretty well on its own. I don't know if this package attempts to cover this case as well.
I'd propose a cell-level cache.
We would remember each cell individually.
If a cell's source code or output changes, we would recompute only that cell and all cells that come after it.
This would greatly improve performance when prototyping a notebook because we would only recompute dependent cells.
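As a rough sketch of this rule, assuming top-to-bottom execution and comparing source code only (not outputs), the set of cells to recompute could be found like this. `cell_key` and `cells_to_recompute` are hypothetical helpers, not an existing API.

```python
import hashlib

def cell_key(source):
    """Cache key for a single cell, derived from its source code only."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()

def cells_to_recompute(cached_keys, cell_sources):
    """Return indices of the first changed (or new) cell and every cell after it."""
    for i, src in enumerate(cell_sources):
        if i >= len(cached_keys) or cached_keys[i] != cell_key(src):
            return list(range(i, len(cell_sources)))
    return []  # nothing changed, the whole cache is still valid

# Usage: keys remembered from the previous run.
previous = ["import time", "data = list(range(10))", "total = sum(data)", "print(total)"]
cached = [cell_key(s) for s in previous]

edited_last = previous[:3] + ["print(total * 2)"]
edited_first = ["import os"] + previous[1:]

rerun_last = cells_to_recompute(cached, edited_last)
rerun_first = cells_to_recompute(cached, edited_first)
rerun_none = cells_to_recompute(cached, previous)
```

Editing only the last cell reruns just that cell, while editing the first cell reruns everything. Resuming from a middle cell would still require the kernel state from just before it to be restored somehow.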
I assume there are like 100 problems that I don't see. If you see any, please fill me in. It's also possible that this is a Quarto problem and I misjudged the scope of this project.
Tasks and updates
Later if the proposed solution is viable.